Smith-Waterman in CUDA functioning. Each thread gathers four E and H values from the load buffer (first read operation) and four values from the profile (second read operation: the four values are packaged in a single unsigned integer of the query-profile). The Smith-Waterman algorithm is then applied to these data and the results are saved in the storing buffer (a single write operation). The alignment also requires two supplementary values: an f_north and an h_north. In this case, there is no need to save an entire column, but only two temporary numbers updated at each cell computation.
Manavski and Valle BMC Bioinformatics 2008 9(Suppl 2):S10 doi:10.1186/1471-2105-9-S2-S10