My first steps in metagenomics: a story in three parts


It’s been over two years since I wrote one of these blog posts, but I swear I have a good reason. In addition to the SARS-CoV-2 pandemic derailing many things along the way, I’ve been waiting for the last of three papers focused on a particular dataset I got to work with to come out, so that I could tell a cohesive story. This will be a series of three blog posts, one per paper, covering how each came about, what our process and main findings were, and what I think they tell us. But first - some background on all three.


In July of 2018 I moved back to Europe after my two-and-a-bit year stint in Seattle, which I wrote about before. Later that same year my now-spouse and I moved to Gothenburg, Sweden, where my partner was hired as a postdoc in the extended Antonelli lab. Before I left Seattle, however, I went on a little trip to California in search of remote employment opportunities - I had learned from the talented Sidney Bell, a PhD student in the Bedford lab at the time, that Chan Zuckerberg Biohub was looking for people to help out with a variety of ongoing projects. During a call with my future co-author Joshua Batson I was told about three or so projects Biohub was working on at the time that were up for grabs, but my interest was piqued by the very first one he mentioned.

It was metagenomics. Metagenomics had been a secret passion of mine during my PhD and I had already had a small taste of it while in Darren Obbard’s orbit in Edinburgh - mainly through discussions over coffee and beers, but also by helping him out here and there. Getting a chance to do it full-time was a dream come true and, having read a number of revolutionary metagenomics papers during my PhD, I definitely had some ideas of my own. It was time to put them to the test.

The California mosquito dataset

By the time I showed up, Chan Zuckerberg Biohub had been sitting on a dataset of 148 individually RNA-sequenced mosquitoes that were caught in California in 2017. The initial idea, from what I gathered, was that this would be a pilot dataset meant to see if it’s possible to detect human-infecting pathogens (like West Nile virus) in mosquitoes (or their bloodmeal hosts) before human cases start appearing. While it failed to do so, I think the things we’ve learned along the way offer the same sort of glimpse into the future as the 2013-2016 West African Ebola virus epidemic did - research is moving in a direction that is both very promising and undeniably better than what’s been done before. I feel like the Californian mosquito dataset was a small landmark in metagenomics, and the ripples it’s made so far certainly seem to confirm my suspicions.

The three papers

The three papers I’ve alluded to are:

  • Batson et al. (2021) is the major study describing the Californian mosquito dataset. Being a part of it was one of the more pleasant experiences of my career, though the study suffered greatly - directly and indirectly - because of the COVID-19 pandemic.
  • Dudas et al. (2021) is a small study, largely led by a team of Biohub and Biohub-affiliated mathematicians, that started because of the Californian mosquito dataset. Though my contribution to it was fairly modest, I think it highlights a few interesting questions in metagenomics.
  • Dudas & Batson (2023) is a more extensive study that was long overdue and that served as a springboard for my lab at Vilnius University’s Life Sciences Center. As I describe it, it’s both a love letter and a mission statement.

Get yourself a cuppa and let’s jump right in, shall we? Which blog post shall we start with? Batson et al. (2021), Dudas et al. (2021), or Dudas & Batson (2023)?