NSF Computing Innovation Fellow
Dept. of Computer Science
University of California, Davis
January 7-11, 2012
Hyderabad, India
Abstract: Sequence comparison is one of the most fundamental computational problems in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix (such as BLOSUM62), which holds the scores for aligning each pair of residues, and report an alignment score for the given sequence pair. This research addresses the problem of accurately estimating the statistical significance of a pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific. Specifically, we develop sequence-specific strategies and algorithms for hardware acceleration of pairwise sequence alignment in conjunction with statistical significance estimation. Pairwise statistical significance has been shown to give better retrieval accuracy than the database statistical significance reported by popular database search programs like BLAST and PSI-BLAST. We present a ‘flexible array’ hardware architecture that provides a scalable systolic array suitable for both long and short sequences. Results on the XtremeData XD1000 FPGA platform show a speed-up by a factor of more than 200.
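To make the idea of a pairwise alignment score and its sequence-specific significance concrete, here is a minimal software sketch in Python. It is not the method or the hardware design from this work. It assumes a simple match/mismatch score in place of BLOSUM62, a linear gap penalty, and an empirical p-value obtained by shuffling one sequence, where the original research may fit a score distribution instead. The names sw_score and pairwise_p_value and all parameter values are hypothetical.

```python
import random


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Assumption: match/mismatch scoring stands in for a real substitution
    matrix such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_p_value(a, b, shuffles=200, seed=0):
    """Empirical significance of the score for this specific sequence pair.

    Assumption: the null distribution is sampled by shuffling b, which keeps
    its length and composition; the p-value uses the (hits + 1) / (n + 1)
    estimator so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(residues)
        if sw_score(a, "".join(residues)) >= observed:
            hits += 1
    return observed, (hits + 1) / (shuffles + 1)
```

Each shuffle requires a full alignment, so one significance estimate costs hundreds of alignments. Within a single alignment, the cells on one anti-diagonal of the table do not depend on each other, which is the property a systolic array can exploit to compute them in parallel.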