Welcome to behind the scenes

I’ve decided to launch my website with a short “behind the scenes” look at the most recent paper on MERS-CoV, which has recently been published in eLife.

Motivation

The goal of the mers-structure project was to understand MERS-CoV epidemiology. It grew out of a combination of things: a strong argument about MERS-CoV epidemiology, contentious findings by other groups (from both case-based and sequence-based studies), wanting to learn BEAST2 (specifically Tim Vaughan’s structured coalescent implementation), and seeing a publication niche that wasn’t occupied. The timing could have been ever so slightly better, but when we started, a sufficiently large number of MERS-CoV genomes sequenced from camels were already available on GenBank.

Progress

Like many other projects I’ve worked on, this MERS study went through a number of research digressions, bursts of activity, and periods of inactivity. It started towards the end of summer 2016 and took well over a year from start to publication. During 2016 I was helping out with a review on Ebola virus evolution, still finishing up the big Ebola study, and got involved with the Zika in Florida study, so I wasn’t exactly wasting my time. In addition, structured coalescent models mix sloooooooow, so I usually ended up setting up runs that would take weeks and going away to work on something else. I still think that the slow approach to doing projects is the way to go (with some exceptions), because it allows you to flesh out ideas, cover your bases, and get the project to a point where you’re happy with it. My influenza B study took a similarly long time and is still one of my favourite projects.

Reviews

We submitted the MERS-CoV manuscript to eLife in early September 2017 instead of August because of my uncanny ability to disappear to Lithuania for holidays when I’m most needed. Reviews took a while, but were with us by mid-October and were very positive. Erik Volz and Christophe Fraser were two of the three reviewers and suggested a number of improvements which we were actually happy (rather than reluctant) to implement. Thanks to the reviewers’ comments we ended up with extra figures of MERS-CoV trees reconstructed with the classic CTMC approach and with the structured coalescent with enforced equal deme sizes (neatly demonstrating where our inference power was coming from), and we also used different statistics in our ABC-like approach for R0 inference.

What I’ve learned

One of the most valuable things I’ve adopted during this project is the set of IPython cell and line magics available in Jupyter notebooks. The ability to call other programs from inside the notebook environment adds another layer of reproducibility, much like XMLs do for BEAST. It sure beats my previous approach of keeping a text editor open with the commands I use frequently.
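To give a flavour of what calling another program from a notebook looks like, here is a minimal sketch in plain Python. It is not one of the commands used in the project: the helper name `run_and_capture` and the toy command are made up for illustration. In a notebook the same thing is usually written with the `!` shell escape (for example `lines = !some_command`) or a `%%bash` cell magic, neither of which is valid syntax outside IPython, so the sketch uses `subprocess` instead.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run an external program and return its stdout as a list of lines.

    Hypothetical helper: roughly what `lines = !command` gives you in IPython.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Toy command (an assumption for the example): ask the current Python
# interpreter to print a line, standing in for any external program.
lines = run_and_capture([sys.executable, "-c", "print('hello from a subprocess')"])
print(lines)
```

Either way the command lives next to the analysis that uses its output, which is where the reproducibility benefit comes from.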

The structured coalescent and its approximations are very promising approaches to phylogenetic inference for specific problems and data situations. I’m aware of at least one promising approach under development in Tanja Stadler’s group. An indirect benefit of having to work with multitype trees (phylogenies with single-child nodes) during this project is that baltic had to be modified to deal with the different data structure.
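For readers who haven’t met multitype trees, here is a small sketch of why single-child nodes matter. This is not baltic’s API or data model: the `Node` class, the `count_type_changes` function, and the toy tree are all hypothetical. The assumed convention is that a single-child node sits on a branch and marks the point where the lineage’s type (deme) changes from the node’s own deme to its child’s deme.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    deme: str                                        # type of the lineage at this node
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None                       # set for sampled tips only


def count_type_changes(node, counts=None):
    """Count type changes, keyed by (from_deme, to_deme).

    Under the convention assumed here, every node with exactly one child
    marks one type change along a branch.
    """
    if counts is None:
        counts = Counter()
    if len(node.children) == 1:
        counts[(node.deme, node.children[0].deme)] += 1
    for child in node.children:
        count_type_changes(child, counts)
    return counts


# Toy tree: a camel root with one camel tip and one lineage that jumps
# into humans before splitting into two human tips.
human_clade = Node("human", [Node("human", name="human_A"),
                             Node("human", name="human_B")])
jump = Node("camel", [human_clade])                  # single-child node
root = Node("camel", [Node("camel", name="camel_A"), jump])

changes = count_type_changes(root)
print(changes)
```

Tree code that assumes every internal node has at least two children will trip over the `jump` node, which is the kind of change a tree library has to accommodate.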

Especially where inference is involved, someone else will probably have done it already, and done it better. Largely because my background is more evolutionary than epidemiological, I wasn’t aware (enough) of packages like PhyDyn, which we could have used from the start to do R0 inference properly instead of via the rather clunky ABC-like Monte Carlo approach.
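To show what an ABC-like Monte Carlo approach means in general terms, here is a generic rejection-sampling sketch. It is emphatically not the procedure from the paper: the Poisson offspring distribution, the uniform prior on R0, the choice of mean cluster size as the summary statistic, and every number below are assumptions picked to keep the example short.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cluster_size(r0, rng, max_size=200):
    """Total cases from one introduction; each case infects Poisson(r0) others."""
    active, total = 1, 1
    while active > 0 and total < max_size:
        active = sum(poisson(r0, rng) for _ in range(active))
        total += active
    return min(total, max_size)


def abc_rejection(observed_mean, n_clusters, n_draws, tolerance, seed=0):
    """Keep prior draws of R0 whose simulated mean cluster size is close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        r0 = rng.uniform(0.1, 1.2)                   # assumed prior
        sizes = [simulate_cluster_size(r0, rng) for _ in range(n_clusters)]
        if abs(sum(sizes) / n_clusters - observed_mean) <= tolerance:
            accepted.append(r0)
    return accepted


# Made-up target: a mean cluster size of 2, which for a subcritical
# Poisson branching process corresponds to R0 = 0.5.
accepted = abc_rejection(observed_mean=2.0, n_clusters=30,
                         n_draws=500, tolerance=0.3, seed=1)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), round(posterior_mean, 2))
```

The clunkiness is visible even here: most simulations are thrown away, and the answer depends on the summary statistic and tolerance you happen to pick.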

Julia might actually be as fast as advertised (under some conditions). I’ve previously expressed skepticism about Julia’s speed, and I wouldn’t be surprised if my Python kung fu isn’t up to scratch when it comes to hardcore heavy-lifting NumPy-based computing, but the simulation code I wrote in Julia was both easy to write and ran pretty fast.