Part 2: the paper with ambigrams and mathematicians
Published:
This story will be a bit shorter than the others, largely because I don’t feel like I played a huge role in formulating the central argument of the Dudas et al. (2021) paper. I did provide some of the crucial data and actually ended up being first author because mathematicians/physicists (i.e. all of my co-authors on this paper) list authors alphabetically. How did I end up on a mathematician-led paper? Great question: it all has to do with ambigrammaticity.
Background
In the previous blog post I briefly mentioned Culex narnavirus 1 (CxNV1), which caught my eye because of its two entirely overlapping open reading frames (ORFs) running in opposite directions across the length of (what we thought to be) the genome. One of them is a recognisable RNA-dependent RNA polymerase (RdRp), while its evil reverse twin ORF (per tradition) doesn’t resemble anything on NCBI. This bizarre arrangement intrigued Michael Wilkinson, a Chan Zuckerberg Biohub-affiliated mathematician, resulting in a paper that coined the term “ambigrammatic” for heavily overlapping ORFs running in opposite directions and showed that codons in ambigrammatic ORFs must be aligned: codon positions 1, 2, and 3 in the forward ORF are positions 3, 2, and 1 in the reverse ORF, and not 4, 3, 2 or 5, 4, 3. Another neat feature of this finding is that it explains how anything (even you!) can evolve ambigrammatic ORFs: you take a normal ORF and then remove stop codons from its complementary sequence, which you can do with synonymous changes alone.
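To make the "synonymous changes alone" point concrete, here is a minimal sketch in Python. It assumes the standard genetic code and a forward ORF whose length is a multiple of three. The function names and the toy sequence are mine, made up for illustration, not anything from the paper. With codons aligned, a stop in the reverse ORF (TAA, TAG, TGA) corresponds to a forward codon of TTA, CTA or TCA, and each of those has a synonymous alternative that no longer reads as a stop on the other strand.

```python
from itertools import product

# Standard genetic code (an assumption for this sketch), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Forward codons whose reverse complement is a stop codon, each paired with
# a synonymous replacement whose reverse complement is a sense codon.
SYNONYMOUS_FIX = {
    "TTA": "TTG",  # Leu -> Leu; reverse strand TAA (stop) -> CAA (Gln)
    "CTA": "CTG",  # Leu -> Leu; reverse strand TAG (stop) -> CAG (Gln)
    "TCA": "TCG",  # Ser -> Ser; reverse strand TGA (stop) -> CGA (Arg)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a sequence in frame 1, ignoring any trailing partial codon."""
    end = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, end, 3))


def make_ambigrammatic(orf):
    """Remove codon-aligned reverse-strand stops using synonymous changes only."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(SYNONYMOUS_FIX.get(codon, codon) for codon in codons)


# Toy forward ORF (hypothetical): Met-Leu-Leu-Ser-Gly
toy_orf = "ATGTTACTATCAGGT"
fixed_orf = make_ambigrammatic(toy_orf)

print(translate(toy_orf), translate(reverse_complement(toy_orf)))
print(translate(fixed_orf), translate(reverse_complement(fixed_orf)))
```

The forward protein is identical before and after, while the three stops on the reverse strand disappear. Because the sequence length is a multiple of three, translating the reverse complement in frame 1 reads exactly the codon-aligned reverse frame described above.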
As I mentioned in the other post on Batson et al. (2021), we had identified CxNV1 as having a previously unrecognised segment. In a case of onomatopoeia applied to the term narnavirus, this new segment was called Robin. Some other work done by Hanna Retallack showed that 1) predictably, reverse ORFs are probably not translated and 2) CxNV1 is happy to exist without Robin. Michael and team wanted to continue working with CxNV1 and ask how we could tell what Robin might be, but proceeding any further required biology knowledge they did not possess. Around November 2020 is when I was invited to join the party.
My contribution(s)
The initial hypothesis Michael and co were going with was that none of Robin’s ORFs were translated. I was immediately suspicious of this, having designated the forward and reverse directions of Robin on the basis of conservation (the ORF with fewer amino acid changes being designated forward), which to me implied constraint on a functional protein. This argument eventually escalated into the conservative dN/dS analysis that made it into the paper. It was conservative because we know CxNV1 can recombine, meaning phylogenetic methods of estimating dN/dS could be compromised, so I chose to compute dN/dS based only on unique changes seen in the alignment. Predictably, RdRp had a very low dN/dS and its evil twin ORF a very high dN/dS. Qualitatively, Robin showed similar results, just less extreme.
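For a rough feel of what a tree-free dN/dS looks like, here is a simplified sketch. It is not the code or the exact procedure from the paper, and it rests on several assumptions of mine: the standard genetic code, a codon-aligned nucleotide alignment, a majority-rule consensus as the reference, and Nei-Gojobori-style site counting in which changes to stop codons count as non-synonymous. Each distinct single-nucleotide codon change away from the consensus is counted once, no matter how many sequences carry it, so no phylogeny is needed.

```python
from collections import Counter
from itertools import product

# Standard genetic code (an assumption for this sketch), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_columns(alignment):
    """Yield the list of codons found at each codon position of the alignment."""
    length = len(alignment[0])
    for i in range(0, length - length % 3, 3):
        yield [seq[i:i + 3] for seq in alignment]


def site_counts(codon):
    """Return (synonymous, non-synonymous) site counts for a single codon."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1 / 3
    return syn, 3.0 - syn


def unique_change_dnds(alignment):
    """Return (dN, dS) counting each distinct change from the consensus once."""
    syn_changes = nonsyn_changes = 0
    syn_sites = nonsyn_sites = 0.0
    for column in codon_columns(alignment):
        # Skip codons with gaps or ambiguity codes.
        valid = [codon for codon in column if codon in CODON_TABLE]
        if not valid:
            continue
        consensus = Counter(valid).most_common(1)[0][0]
        s, n = site_counts(consensus)
        syn_sites += s
        nonsyn_sites += n
        for variant in set(valid) - {consensus}:
            # Only single-nucleotide changes are unambiguous to classify.
            if sum(a != b for a, b in zip(variant, consensus)) != 1:
                continue
            if CODON_TABLE[variant] == CODON_TABLE[consensus]:
                syn_changes += 1
            else:
                nonsyn_changes += 1
    dn = nonsyn_changes / nonsyn_sites if nonsyn_sites else float("nan")
    ds = syn_changes / syn_sites if syn_sites else float("nan")
    return dn, ds


# Toy alignment (hypothetical): two synonymous changes and one non-synonymous.
toy_alignment = ["ATGCTGAAA", "ATGCTAAAA", "ATGCTGAGA", "ATGTTGAAA"]
dn, ds = unique_change_dnds(toy_alignment)
print(dn, ds, dn / ds)
```

A ratio well below 1 suggests the reading frame is under purifying selection, which is the pattern expected of a translated, functional protein. For an ambigrammatic pair you would run the same function on the alignment and on its reverse complement to compare the two ORFs.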
In pursuit of another question - whether Robin was a newly evolved segment of CxNV1 specifically or a previously unrecognised genomic feature of many narnaviruses related to CxNV1 - we had to find more Robins in other narnaviruses. How do you find something that doesn’t resemble anything else on public databases? Where do you start looking? At the time (and probably to date) the only reasonable choice was going to be Zhejiang mosquito virus 3 - ZhMV3. Found in four separate mosquito samples at the time (and some more since), it was the only narnavirus that came to mind to apply our co-occurrence technique to. All the samples it was found in were pooled individuals, so the sequence data were messy, with many contigs co-occurring with the ambigrammatic ZhMV3 RdRp, but we had a few reasonable expectations of a true segment: no BLAST hits, not too long, and ambigrammatic. Soon enough we had a candidate. With a quick Pebblescout check just now I still think it’s a good candidate.
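The filtering logic can be sketched roughly like this. Everything here is hypothetical and for illustration only: the `Contig` record, the sample and contig names, the 5000 nt length cutoff, and the strict rule that a candidate must appear in exactly the same samples as the RdRp. With pooled samples the real data are far messier than that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    """Hypothetical summary of one contig cluster found in a sample."""
    name: str
    length: int
    has_blast_hit: bool
    is_ambigrammatic: bool


def candidate_segments(samples, rdrp_name, max_length=5000):
    """Return names of contigs that co-occur with the RdRp and fit expectations.

    samples maps a sample name to the list of Contig records found in it.
    max_length is an arbitrary illustrative cutoff, not a value from the paper.
    """
    rdrp_samples = {
        sample for sample, contigs in samples.items()
        if any(contig.name == rdrp_name for contig in contigs)
    }
    seen_in = {}   # contig name -> set of samples it was found in
    records = {}   # contig name -> its Contig record
    for sample, contigs in samples.items():
        for contig in contigs:
            seen_in.setdefault(contig.name, set()).add(sample)
            records[contig.name] = contig
    return sorted(
        name for name, where in seen_in.items()
        if name != rdrp_name
        and where == rdrp_samples            # strict co-occurrence with the RdRp
        and not records[name].has_blast_hit  # resembles nothing known
        and records[name].length <= max_length
        and records[name].is_ambigrammatic
    )


# Toy data (entirely made up).
rdrp = Contig("ZhMV3_RdRp", 3000, True, True)
contig_7 = Contig("contig_7", 900, False, True)
contig_12 = Contig("contig_12", 700, False, False)
host_rrna = Contig("host_rRNA", 1800, True, False)
toy_samples = {
    "pool_A": [rdrp, contig_7, host_rrna],
    "pool_B": [rdrp, contig_7, contig_12],
    "pool_C": [host_rrna, contig_12],
}
candidates = candidate_segments(toy_samples, "ZhMV3_RdRp")
print(candidates)
```

In practice you would probably relax the exact-match rule to a co-occurrence score, since a segment can be missed in a shallow sample, but the idea of intersecting presence patterns and then applying the three expectations stays the same.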
With the power of replication on our side, repeating the same dN/dS analysis on ZhMV3 RdRp and Robin showed a similar pattern of constraint, so I think it’s still reasonable to expect that narna Robin segments code for a translated protein. Michael and co contributed their own arguments to the manuscript and we had something decent to publish.
Concluding
So what does this CxNV1 (and now ZhMV3) story tell us? I have three main takeaways from this project. First, I don’t think it’d be too controversial to say that our knowledge of segmented RNA virus groups (sometimes even knowing they’re segmented) is seriously compromised. We’ve been getting by (note - just getting by) quantifying RdRp diversity based on primary sequence similarity and the occasional hidden Markov model (HMM) because it’s convenient and informative of some things, but in doing so we are potentially missing out on genes that could be the difference between life and death for the host. We should do better, and I think the field is slowly coming around to doing individual host transcriptomes in recognition of this method’s power and long-term utility for other groups.
Second, I think there’s a lot to be said about the power of natural selection. It still fascinates me that we can look into the non-random survival of randomly perturbed strings of nucleotides and find (testable) meaning in them. We can make well-reasoned guesses about whether a sequence is translated, sometimes about which portions of it are likely to be doing something important, and occasionally all of this without having any prior knowledge about a given sequence.
And finally - what a world of mysteries! What’s up with narnavirus ambigrammaticity in the first place? Currently we guess reverse ORFs facilitate some sort of interaction with host ribosomes. What about Robin? What does that do? Is it suppressing the host immune response? Does it form particles? Does it fuse cells and allow these viruses to transmit that way? Is it associated with ambigrammaticity? Does it do something we’ve not even thought about? Mysteries abound but we’re slowly chipping away at them.