In Lab 7 we ask you to do two things:
- ToyBLAST. You will refine your toy BLAST program. If you didn't get everything working in the Lab 6 part of that program, continue working on that as well as the new parts. If you absolutely cannot get Lab 6 finished and working, and hence can't continue with Lab 7, let us know, and we will give you some parts of the code for Lab 6. But it is best if you do it all yourself.
- Multiple Sequence Alignment. You will learn how to use one of the most popular global multiple alignment programs, ClustalX, and (optionally) you will use a homegrown multiple alignment program, star.pl, that we will discuss in class.
In addition, if you have not already done so, please read the third set of notes on Perl, distributed about two weeks ago. Please let me know if you find any errors or ambiguities in those notes.
What you need to turn in are your answers to the questions and exercises below. For Perl programs, use the script command to record a listing of the program and a session showing how it runs on data.
Your first task is to walk through the practical tutorial on ClustalX, found at ClustalX Tutorial.
Download the file aligned globins, which shows a polished, hand-optimized multiple alignment of many globin sequences.
Download the file packed globins, which has the same sequences as the aligned globins file, but with the spaces and names removed.
Now adapt the packed globins data so that it can be used as input to ClustalX, and use ClustalX to compute a multiple alignment of those sequences. How does it compare to the original one in the aligned globins file?
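One way to adapt the data is to convert it to FASTA format, which ClustalX accepts as input. The following is only a minimal sketch under assumptions you should check against the actual file: it assumes the packed file holds one sequence per line, and since the names were removed it makes up the labels seq1, seq2, and so on. The function name packed_to_fasta and the example sequences are ours, not part of the lab files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the packed input has one sequence per line and no names.
# The labels seq1, seq2, ... are made up here, because FASTA needs a
# header line for every sequence.
sub packed_to_fasta {
    my ($text, $width) = @_;
    $width ||= 60;                      # residues per output line
    my ($fasta, $n) = ('', 0);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+//g;              # drop stray blanks and carriage returns
        next if $line eq '';            # skip empty lines
        $n++;
        $fasta .= ">seq$n\n";
        # Emit the sequence in chunks of at most $width characters.
        $fasta .= "$1\n" while $line =~ /(.{1,$width})/g;
    }
    return $fasta;
}

# Tiny made-up example (not real globin data).
print packed_to_fasta("ACDEFGHIK\nLMNPQ\n", 5);
```

To use it on the real data, you would read the packed globins file into a string, pass it to packed_to_fasta, and save the result to a file that you then load into ClustalX.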
Now download the program star.pl, which we will use to multiply align the sequences.
You will also need the weight matrix weight.txt when you run star.pl. Download that matrix from weight.txt.
Run this multiple alignment program on the packed globins.
The multiple alignment will have to be cut out from among the other output; save it to a file. How does it compare, just by eyeballing the alignments, to the multiple alignment we started from in the aligned globins file, and to the alignment produced by ClustalX? Take note of the ratio of the optimal pairwise scores to the induced scores reported by star.pl.
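To make the idea of an induced score concrete: every pair of rows in a multiple alignment induces a pairwise alignment of those two sequences, obtained by dropping the columns where both rows have a gap. The sketch below sums the induced pairwise distances over all pairs of rows. It assumes a unit-cost scheme (match 0, mismatch 1, character against gap 1) and '-' as the gap character; these are illustrative choices, not the values in weight.txt and not necessarily what star.pl computes. The function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical unit-cost scheme (NOT the values in weight.txt):
# match = 0, mismatch = 1, a character against a gap = 1.
sub induced_pair_distance {
    my ($row_a, $row_b) = @_;
    die "rows differ in length\n" unless length($row_a) == length($row_b);
    my $dist = 0;
    for my $i (0 .. length($row_a) - 1) {
        my $x = substr($row_a, $i, 1);
        my $y = substr($row_b, $i, 1);
        next if $x eq '-' && $y eq '-';   # column drops out of the induced alignment
        $dist++ if $x ne $y;              # mismatch, or a character against a gap
    }
    return $dist;
}

# Sum the induced distances over every pair of rows in the alignment.
sub sum_of_induced_distances {
    my @rows = @_;
    my $total = 0;
    for my $i (0 .. $#rows - 1) {
        for my $j ($i + 1 .. $#rows) {
            $total += induced_pair_distance($rows[$i], $rows[$j]);
        }
    }
    return $total;
}

# Made-up three-row alignment, just to exercise the functions.
my @alignment = ('AC-GT', 'ACAGT', 'A--GT');
print sum_of_induced_distances(@alignment), "\n";
```

Comparing such a total with the sum of the optimal pairwise scores for the same sequences gives a ratio of the kind star.pl reports, although the program's own numbers depend on its weight matrix.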
In the star.pl program, you are initially asked for a center number and told to use the mini center first. After the first multiple alignment is found, the program gives you the option to specify another center and find the resulting multiple alignment. Do this a couple of times, picking centers other than the mini center. Each time, see how well the resulting multiple alignment matches the original alignment in the aligned globins file, and note the ratio of optimal to induced scores. Record these ratios, and state your conclusions.
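The handout does not define the mini center. A natural reading, which we assume here, is the one used in the center star method: the sequence whose pairwise distances to all the other sequences sum to the smallest total. The sketch below picks such a center using unit-cost edit distance as a stand-in for the weighted scores that star.pl reads from weight.txt. The function names and the toy sequences are hypothetical, and star.pl itself may well do this differently.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Unit-cost edit distance, a stand-in for the weighted scores in weight.txt.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));        # distances from the empty prefix of $s
    for my $i (1 .. length($s)) {
        my @cur = ($i);                  # distance from prefix of $s to empty string
        for my $j (1 .. length($t)) {
            my $sub = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j - 1] + $sub,   # match or mismatch
                           $prev[$j] + 1,          # gap in $t
                           $cur[$j - 1] + 1);      # gap in $s
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return (index, total) for the sequence with the smallest summed
# distance to all other sequences; ties go to the earliest sequence.
sub pick_center {
    my @seqs = @_;
    my ($best, $best_total);
    for my $c (0 .. $#seqs) {
        my $total = 0;
        for my $i (0 .. $#seqs) {
            $total += edit_distance($seqs[$c], $seqs[$i]) unless $i == $c;
        }
        ($best, $best_total) = ($c, $total)
            if !defined $best_total || $total < $best_total;
    }
    return ($best, $best_total);
}

# Made-up sequences, numbered from 1 in the printed output.
my @seqs = ('ACGT', 'ACGA', 'TCGA');
my ($center, $total) = pick_center(@seqs);
printf "center = sequence %d, total distance = %d\n", $center + 1, $total;
```

Choosing a different center changes which pairwise alignments are fixed at their optimum, which is one reason to expect the ratio of optimal to induced scores to vary as you try other centers.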
How does this program work?