George L. Coulouris (2010). BASALT: Blocked Alignment Score Approximation in Linear Time. http://coulouris.org/basalt/
BASALT is a scalable, bandwidth-bound algorithm to detect regions of similarity in large nucleotide sequences.
Currently, the best observed speedup of the GPU run over the CPU run is around 46x (59.3 ms versus 2752.4 ms on a Tesla C1060 system):
    USE_ZEROCOPY=0
    BASALT_THREADS_PER_BLOCK=64
    BASALT_BV_SIZE=2048
    BASALT_NUM_QUERY_TILES=512
    BASALT_NUM_SUBJECT_TILES=512
    device 0 = Tesla C1060
    gpu processing time : 59.300999 (ms)
    memory bandwidth : 67.452490 (GB/s)
    cpu processing time : 2752.378906 (ms)
    Test PASSED
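
The speedup figure follows directly from the two timings in the log above: CPU time divided by GPU time. The short Python sketch below reproduces that arithmetic. It is not part of BASALT; the `parse_metrics` helper and the assumption that metric lines look like `label : number (unit)` are illustrative only, based on the three metric lines shown here.

```python
import re

# Metric lines copied from the log above.
LOG = """\
gpu processing time : 59.300999 (ms)
memory bandwidth : 67.452490 (GB/s)
cpu processing time : 2752.378906 (ms)
"""


def parse_metrics(log):
    """Return {label: value} for lines shaped like 'label : number (unit)'.

    Hypothetical helper: the line format is inferred from the sample log,
    not taken from any BASALT documentation.
    """
    pattern = re.compile(r"^\s*(.+?)\s*:\s*([0-9.]+)\s*\((.+?)\)\s*$")
    metrics = {}
    for line in log.splitlines():
        match = pattern.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


metrics = parse_metrics(LOG)

# Speedup is the ratio of CPU time to GPU time (both in milliseconds).
speedup = metrics["cpu processing time"] / metrics["gpu processing time"]
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 46.4x"
```

The ratio 2752.378906 / 59.300999 is about 46.4, which matches the "around 46x" quoted above.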