Did you just hear about a virus mutating?

First time?

Is it the year 2027/2033/2041 and the latest influenza/coronavirus/paramyxovirus is sweeping the globe? Are worrying reports about how the virus has mutated starting to crop up on your news feed? In this blog post I will try to explain why scary claims about viruses mutating are more likely than not to be overstated or even outright false.

What is a mutation?

I am not aware of any TV series, films or comic books where the word mutation isn’t synonymous with radical change. Shooting flames/ice/whatever out of your hands as a human is a pretty radical change from not shooting anything out of your hands, but the reality of what the word “mutation” means in a scientific context is far more mundane and much, much closer to not shooting anything out of your hands.

Let’s begin with the simplest example: you have a picture which you wish to photocopy for a friend, so you go and do that. The photocopy comes out imperfect - the image is less crisp and the corners are blurry, but overall you can still recognise what’s in the picture. The photocopy isn’t perfect because the process of copying isn’t perfect.

Every time your cells divide, whether to renew tissues like skin or to produce your reproductive cells, the first thing that happens is that the genome inside the cell gets copied, so that both daughter cells resulting from cell division contain a copy of the recipe for how to make the proteins that make up you. And you guessed it - the process of copying is imperfect here too.

The enzymes (proteins that catalyse reactions) that your cells use to copy your genome make a replication error roughly once every 100 million sites (e.g. instead of an A the enzyme inserted a G during copying). The total length of your genome is about 3 billion sites, so you can already guess that every time your genome gets copied a number of these errors (around 30, going by these figures) are incorporated into the daughter cells’ genomes. Congratulations, all of your cells are mutants!
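To put a rough number on that, here is a minimal sketch of the arithmetic in Python. It only uses the two round figures quoted above; the function and constant names are my own and purely illustrative.

```python
# Round figures quoted in the text, not precise measurements.
GENOME_LENGTH_SITES = 3_000_000_000  # about 3 billion sites
SITES_PER_ERROR = 100_000_000        # about one error every 100 million sites


def expected_errors_per_copy(genome_length: int, sites_per_error: int) -> float:
    """Average number of copying errors introduced in one genome copy."""
    return genome_length / sites_per_error


errors = expected_errors_per_copy(GENOME_LENGTH_SITES, SITES_PER_ERROR)
print(f"Expected errors per genome copy: {errors:.0f}")
```

With these figures every copy of the genome picks up a few dozen new mutations on average, which is why no two of your cells are likely to be exactly identical.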

What’s different about viruses mutating?

The viruses most likely to make it to the news are emerging pathogens (because they are not constantly with us), and RNA viruses are particularly overrepresented amongst viruses that jump between different species. RNA viruses use a slightly different genetic material to encode their genomes: instead of DNA they use RNA, which requires markedly different (but still distantly related) enzymes to copy it. Unlike the enzymes that replicate genomic DNA (DNA polymerases), the RNA polymerases of these viruses do not (in the vast majority of cases) proofread what they just copied. Without proofreading, the error rate of RNA viruses becomes orders of magnitude higher, with mutations happening once every 10,000 to 100,000 sites.

As Eddie Holmes put it in his book The Evolution and Emergence of RNA Viruses, the high mutation rates of RNA viruses are possibly their most important defining characteristic. Having a high mutation rate means that RNA viruses cannot afford to have a long genome with a rich protein repertoire, because every replication cycle would introduce more mutations that break things than mutations that are neutral or beneficial. The evolutionary corner you’re backed into this way is necessarily having a small genome - if the mutation rate is high, the optimal solution is to not give the polymerase a lot of chances to introduce errors.
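The trade-off between mutation rate and genome length can be sketched numerically. The Python below is a back-of-the-envelope illustration, not a model of any real virus: it assumes every site is copied independently with the same error rate, and the genome lengths are hypothetical values I picked just to show the trend. Only the two error rates come from the text above.

```python
import math


def expected_errors(genome_length: int, error_rate_per_site: float) -> float:
    """Average number of copying errors in one genome copy."""
    return genome_length * error_rate_per_site


def probability_error_free(genome_length: int, error_rate_per_site: float) -> float:
    """Chance that a whole genome is copied without a single error,
    assuming every site is copied independently (a simplification)."""
    return (1.0 - error_rate_per_site) ** genome_length


# Error rates from the text: one mutation every 10,000 to 100,000 sites.
RNA_VIRUS_ERROR_RATES = (1 / 10_000, 1 / 100_000)

# Hypothetical genome lengths in sites, chosen only for illustration.
EXAMPLE_GENOME_LENGTHS = (10_000, 100_000, 1_000_000)

for rate in RNA_VIRUS_ERROR_RATES:
    for length in EXAMPLE_GENOME_LENGTHS:
        print(
            f"rate={rate:g} length={length:>9,} "
            f"errors/copy={expected_errors(length, rate):8.1f} "
            f"P(error-free)={probability_error_free(length, rate):.3g}"
        )
```

At one error per 10,000 sites, a genome of 10,000 sites already averages about one mutation per copy, and only a bit over a third of copies come out error-free. Make the genome a hundred times longer and an error-free copy essentially never happens, which is the corner the text describes.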

Constraints of viral evolution

As you can imagine, viral infections are a terrible thing for the host, and natural selection has therefore yielded remarkable and diverse defenses against viral infection. Because vertebrates tend to live longer on average than our non-vertebrate cousins, natural selection has endowed us with the adaptive immune system, which is basically our very own lab bench where we recreate the process of evolution using our own genomes.

Why mutations don’t affect how things behave (that much)

A very basic introduction to molecular biology

Copying your genome involves unzipping your double-stranded helix and using the sequence of nucleotides (four types of molecules found in DNA that are referred to by letters of the alphabet - A, C, T, and G) on either strand to determine what the full replicated double helix should look like. It can be done because every nucleotide at a particular position (referred to as a site) can only pair with one partner - an A can only pair with a T and a C can only pair with a G. If one strand has an A at position 1, then the same position in the second strand must have a T.
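The pairing rule is simple enough to write down as a lookup table. Here is a small Python sketch that rebuilds the partner strand position by position; the names are my own, and it deliberately ignores the fact that the two strands of real DNA run in opposite directions.

```python
# Each nucleotide can pair with exactly one partner.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the nucleotide that sits opposite each site of the given strand."""
    return "".join(PAIRING[nucleotide] for nucleotide in strand)


template = "ATCGGA"
copy = partner_strand(template)
print(template)
print(copy)
```

Because the pairing is one-to-one, applying the rule twice gives back the original strand, which is exactly why either strand alone is enough to reconstruct the full double helix.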

A minute fraction of your genome - a little over 1% - encodes recipes for making proteins. Proteins are a bit like DNA - they are a linked sequence of basic building block molecules (amino acids), only instead of having 4 different kinds of nucleotides to occupy a given site like DNA, proteins are built from 20 different kinds of amino acids. The 20 amino acids vary in their chemical properties and the backbone on which they sit is free to fold whichever way it likes (unlike the double helix of DNA). In fact most proteins need to fold into a particular shape such that in some portions of them there will be a 3D region where amino acids with the right chemical properties can coordinate other molecules so as to make chemical reactions happen more efficiently.

How does one encode 20 amino acids in proteins from just 4 nucleotides in DNA? The solution is simple - each amino acid is encoded by a sequence of three nucleotides (called a codon). If you are using three nucleotides to encode a single amino acid you end up with a surplus of information, because there are a total of 64 possible triplets of nucleotides (4 at the first site multiplied by 4 at the second and multiplied by 4 at the third, or 4^3) but only 20 amino acids that you can use. Rather than using just one codon for each of the 20 amino acids and ignoring the rest, evolution ended up introducing redundancy into the system - one to six codons (depending on the amino acid) code for the same amino acid, and three codons are usually used to denote the end of the protein (the “stop” codons).
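The counting can be checked directly. The Python sketch below enumerates all triplets and pairs them with the standard genetic code, written as a 64-letter string of one-letter amino acid codes with the bases in T, C, A, G order and "*" marking stop codons. The variable names are my own, and I use DNA letters (T rather than U) to match the rest of this post.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"

# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 * 4 * 4 possible triplets, in the same order as the string above.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# How many codons point at each amino acid (or at "stop")?
codons_per_symbol = Counter(CODON_TABLE.values())

print(f"Possible codons: {len(CODON_TABLE)}")
print(f"Stop codons: {codons_per_symbol['*']}")
for symbol, count in sorted(codons_per_symbol.items(), key=lambda item: item[1]):
    print(symbol, count)
```

Running it shows the redundancy at a glance: the 64 codons are shared out between 20 amino acids and the stop signal, so most amino acids can be spelled in more than one way.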