Question


Section C – Mutation of SARS-CoV-2 Over Time

All viruses accumulate mutations over time, and different classes of viruses have different mutation rates based on the fidelity with which their genomes are replicated. Viral mutations can affect the accuracy of tests, the efficacy of vaccines, and virulence (the ability to spread, the seriousness of illness, etc.). In this section you will compare a number of genomes sequenced from clinical samples during the first several months of the COVID-19 outbreak to the RefSeq genome for SARS-CoV-2. The goal is to identify the number of mutations that have occurred and calculate an approximate mutation rate. (Note: In reality, mutation rates are not calculated this way; the math is much more complex. Viral mutation rates are usually calculated from controlled experiments in tissue culture in the lab, using algorithms that account for many factors, including types of mutation, selection, etc. Calculations more similar to what you will be doing can be done using extensive phylogenies, calculating time since divergence, etc. Here you will get a rough idea of how frequently the SARS-CoV-2 virus is mutating, based on simple comparisons and a few simple assumptions, while continuing to work with GenBank and BLAST.)

For this analysis, you will again use the NCBI virus database. The earliest samples in the database were collected from patients in late December 2019 in China. As previously mentioned, one of these has been adopted as the “RefSeq” genome, the standard that others are compared to. Although an exact collection date is not listed, you will use 12/23/2019, the earliest date in the database. You will compare its sequence to others from different dates and locations in the first few months of the pandemic. You are going to assume that this is the virus from which all the others arose (in reality there is a more ancestral strain that was never sequenced, as evidenced by phylogeny, but this assumption will work for our rough estimate purposes).
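Because the comparison depends on when each sample was collected, it can help to compute how much time separates a sample from the reference. The sketch below uses Python's standard `datetime` module and takes 12/23/2019 as the reference collection date, as assumed above; the function name `days_since_reference` and the example sample date are hypothetical illustrations, not values taken from the database.

```python
from datetime import date

# Reference collection date assumed in this exercise (earliest date in the database).
REFERENCE_DATE = date(2019, 12, 23)

def days_since_reference(collection_date: date) -> int:
    """Return the number of days from the reference collection date to a sample's collection date."""
    return (collection_date - REFERENCE_DATE).days

# Hypothetical sample collected on 3/23/2020 (2020 is a leap year, so February has 29 days).
print(days_since_reference(date(2020, 3, 23)))  # 91
```

Subtracting two `date` objects gives a `timedelta`, so leap years and month lengths are handled automatically rather than by hand.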

There are some things to be aware of when considering the data. The database contains both whole-genome sequences and smaller partial gene sequences. We will be comparing whole-genome sequences, but even among those, the total number of bases reported differs because of differences at the ends outside the genes (untranslated regions or artifacts of amplification and sequencing). We will therefore use the length of the shorter of the two sequences being compared as the number of bases that were (presumably) aligned between them. Also, the sequences are listed by release date (when the sequence was put in the database), but we are more interested in the collection date (when the sample was taken from the patient), which is a separate column and can also be seen by clicking on the accession number.
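The two counting rules used here, treating each difference as a substitution and taking the shorter genome's length as the number of aligned bases, can be sketched as follows. This is a simplified illustration, not how BLAST reports results: it assumes the two sequences have already been aligned position by position (for example, copied from a pairwise alignment), treats `-` as a gap that is skipped, and does not handle ambiguous bases. The function names and the example sequences and lengths are hypothetical.

```python
def count_substitutions(aligned_a: str, aligned_b: str) -> int:
    """Count positions where both aligned sequences have a base and the bases differ.

    Assumes the two strings are already aligned to the same length; '-' marks a gap,
    and gap positions are skipped rather than counted as substitutions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-" and a != b
    )

def aligned_length(length_a: int, length_b: int) -> int:
    """Use the shorter genome length as the (presumed) number of aligned nucleotides."""
    return min(length_a, length_b)

# Toy example with made-up sequences: one substitution (C vs T) and one gap position.
ref = "ATGCATGC-A"
sample = "ATGTATGCAA"
print(count_substitutions(ref, sample))  # 1
print(aligned_length(29900, 29850))      # 29850 (illustrative lengths)
```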

Remember that the mutation rate for viruses is calculated as substitutions/nucleotide/cell infection. You are going to estimate the mutation rate by measuring two things and making some assumptions about mutations and transmission.

The number of substitutions will be the number of differences between the two genomes. The number of nucleotides will be the number of bases that aligned between them (we will use the size of the shorter of the two genomes). For cell infections, we are going to make a large assumption about viral transmission: in their place, we will estimate transmission events (how many times the virus jumped from one person to another).
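Putting these pieces together, the rough estimate is substitutions divided by (aligned nucleotides × transmission events), with transmission events standing in for cell infections. A minimal sketch, in which the function name and the example numbers are hypothetical placeholders rather than values from the database:

```python
def estimate_mutation_rate(substitutions: int, nucleotides: int, transmission_events: float) -> float:
    """Rough mutation rate in substitutions per nucleotide per transmission event.

    Transmission events are used as a stand-in for cell infections, as in this exercise.
    """
    if nucleotides <= 0 or transmission_events <= 0:
        raise ValueError("nucleotides and transmission_events must be positive")
    return substitutions / (nucleotides * transmission_events)

# Hypothetical inputs: 10 differences over 29,850 aligned bases across 50 transmission events.
rate = estimate_mutation_rate(10, 29850, 50)
print(f"{rate:.2e}")  # 6.70e-06
```

How the number of transmission events is chosen is the large assumption described above; the function simply takes it as an input.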

Why are transmission events not the same as cell infections? Which number is expected to be higher, transmission events or cell infections? Will this lead to an over- or an underestimation of mutation rate in our calculations?

Transmission events are not the same as cell infections. A transmission event is the virus passing from one person to another, which depends on contact between people. A cell infection is a single round of the virus entering a host cell, replicating, and releasing new virus particles inside one individual, which depends on conditions in the host's cells and on that person's immune system. After the virus reaches a new person, it must go through many rounds of cell infection before enough virus is produced to be passed on, so cell infections are expected to far outnumber transmission events. This holds even when the infected person shows no symptoms: the virus can still replicate in their cells and be transmitted through contact.

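Because we divide the observed substitutions by the much smaller number of transmission events instead of by the number of cell infections, this difference will lead to an overestimation of the mutation rate per cell infection: the substitutions actually accumulated over many more replication cycles than the number we divide by.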
