Special Topics in CS: Bioinformatics
Homework 4
Exploring Sequences from the SARS Virus
We will use information from the SARS genome as a vehicle for experimenting with two pairwise alignment programs. More general information about the SARS virus is available at the following sites:
Commentary from The Lancet
Images and summary
Information from the World Health Organization (WHO)
Science publishes genome
SARS reference
Information from the Centers for Disease Control and Prevention (CDC)
Homework questions (Due October 1, 2003):
- The complete genomes of several isolates of the SARS virus are available at NCBI. Read the general information about the SARS virus at NCBI entitled Severe Acute Respiratory Syndrome (SARS) and use this information to answer the following questions. When was the first case of SARS reported? Where were the first cases? Where was the first genome of SARS sequenced?
- We will use alignment tools to compare some proteins from SARS and other coronaviruses. These viruses have very small genomes that code for a small number of proteins. One type of protein is a structural "spike" protein. SARS viruses have been isolated from a number of sources (each called an isolate). At NCBI find the entry for the spike protein with Accession number AAP50485. The entry contains a lot of information about the protein. What was the isolation source of the virus this protein comes from?
The protein sequence is listed at the bottom of the entry in the GenBank sequence format, which is easy for people to read. Most sequence analysis software requires a format that is easier for a computer to process. A commonly used one is the FASTA format, which is described on pages 31-32 of your text. You can request that NCBI change the format to FASTA by replacing the "default" entry for "Display" at the top of the page with FASTA and then clicking on the "Display" button. Make sure you can display the spike protein sequence in FASTA format. Include the FASTA-formatted sequence for the protein in your homework.
Now you can copy this sequence (including the line beginning with ">") and paste it into an alignment program. For this first question, we will use the pairwise alignment software from Michigan State:
Michigan State Sequence Analysis Server. Use the GAP global alignment program to align the sequences for the spike proteins with the following Accession numbers: AAP50485 and NP_828851. Make sure that you select "protein" rather than the "DNA" default. How similar are these proteins? Elaborate.
- Use global alignment again to align a spike protein from SARS (NP_828851) with the spike protein from porcine epidemic diarrhea virus (NP_839968). Describe your results. Does global alignment provide an appropriate method for comparing these two proteins? Why or why not?
- Repeat the previous question using the SIM local alignment tool from Michigan State. Note that SIM will give several different alignments. You can ignore all but the first when you answer these questions. How do the results using SIM compare with those from the global alignment?
- What is the default gap penalty that is used by SIM? What is the default amino acid substitution matrix?
- Researchers reported that the structural proteins (such as the spike proteins) of coronaviruses from different sources were less similar than the enzymatic proteins. A helicase is an enzyme that separates the strands of a double-stranded nucleic acid such as DNA. Many viruses use a helicase to "nick" the DNA of their host and start their replication process. Use local alignment to compare a SARS helicase (NP_828870) and a porcine epidemic diarrhea virus helicase (NP_839966). Describe the results.
- Compare the two helicases again with SIM using different amino acid substitution matrices. What sort of differences do you observe when you change matrices? Which matrix do you think is most appropriate? Why?
- Compare the two helicases using global alignment and vary the gap penalties. Describe the differences that you see.
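If you want to see what the global and local alignment programs in the questions above are computing, the following is a minimal Python sketch. It is not the GAP or SIM implementation and will not reproduce their scores. The function names `read_fasta` and `alignment_score` are hypothetical, and the scoring (+1 for a match, -1 for a mismatch, and a fixed penalty of -2 per gap position) is an assumed stand-in for a real amino acid substitution matrix and for the gap penalties the web tools use. The sketch reads FASTA text and then computes the best global score (Needleman-Wunsch style) or the best local score (Smith-Waterman style) with the same dynamic programming loop.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    The header is the line after '>' and the sequence lines that
    follow it are joined into one string.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of sequences a and b.

    Assumed toy scoring: match/mismatch values instead of a
    substitution matrix, and a linear penalty of `gap` per gap position.
    local=False gives a global score, local=True a local score.
    """
    # Row 0: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    # Made-up toy sequences, not real SARS data.
    toy = ">short toy\nAAAA\n>long toy\nTTAA\nAATT\n"
    (_, s1), (_, s2) = read_fasta(toy)
    print("global:", alignment_score(s1, s2))
    print("local: ", alignment_score(s1, s2, local=True))
```

On the toy pair, the global score is pulled down by the gaps needed to span the longer sequence, while the local score reflects only the shared region. Keep this contrast in mind when you compare your GAP and SIM results, and try changing the `gap` argument to see how the global score responds.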
Due:
Submit a hard copy of a report containing answers to the questions above by the beginning of class on Wednesday, October 1, 2003. The homework should be typed in a word processor. You will probably want to paste alignment results from the web pages into the homework. Do NOT submit this homework via email. You should answer the questions completely in carefully written English.