COVID-19 Patient Zero: Data Analysis Identifies the “Mother” of All SARS-CoV-2 Genomes

Temple researchers have identified the first genome to transmit the coronavirus.

In the field of molecular epidemiology, the worldwide scientific community has been sleuthing to solve the riddle of the early history of SARS-CoV-2.

Since the first SARS-CoV-2 virus infection was detected in December 2019, tens of thousands of its genomes have been sequenced worldwide, revealing that the coronavirus is mutating, albeit slowly, at a rate of 25 mutations per genome per year.
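
To put that figure in perspective, the short Python calculation below converts the per-genome rate into a per-site rate and into the number of mutations a single lineage would accumulate over a few months. It is a back-of-the-envelope sketch that assumes a constant rate and a genome length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference sequence, NC_045512.2), a value the article itself does not state. The function names are illustrative only.

```python
# Back-of-the-envelope conversion of the reported mutation rate.
# Assumptions: the rate is constant over time, and the genome length is
# 29,903 nt (the Wuhan-Hu-1 reference, NC_045512.2), not a figure from the article.
MUTATIONS_PER_GENOME_PER_YEAR = 25
GENOME_LENGTH_NT = 29_903


def per_site_rate(per_genome_rate: float, genome_length: int) -> float:
    """Substitutions per nucleotide site per year."""
    return per_genome_rate / genome_length


def expected_mutations(per_genome_rate: float, months: float) -> float:
    """Expected number of mutations gained by one lineage over `months` months."""
    return per_genome_rate * months / 12


rate = per_site_rate(MUTATIONS_PER_GENOME_PER_YEAR, GENOME_LENGTH_NT)
print(f"{rate:.2e} substitutions per site per year")
print(f"{expected_mutations(MUTATIONS_PER_GENOME_PER_YEAR, 6):.1f} mutations in six months")
```

At this rate a single lineage gains roughly one mutation every two weeks (365 / 25 ≈ 15 days).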

But despite major efforts, no one to date has identified the first case of human transmission, or “patient zero,” of the COVID-19 pandemic. Finding such a case is necessary to better understand how the virus may first have jumped from its animal host to infect humans, as well as how the SARS-CoV-2 viral genome has mutated over time and spread globally.

“The SARS-CoV-2 virus is carrying an RNA genome that has already infected more than 35 million people across the world,” said Sudhir Kumar, director of the Institute for Genomics and Evolutionary Medicine, Temple University. “We need to find this common ancestor, which we call the progenitor genome.”

This progenitor genome is the mother of all SARS-CoV-2 coronaviruses infecting people today.

In the absence of patient zero, Kumar and his Temple University research team now may have found the next best thing to aid the worldwide molecular epidemiology detective work. “We set out to reconstruct the genome of the progenitor by using a big dataset of coronavirus genomes obtained from infected individuals,” said Sayaka Miura, a senior author of the study.

They identified the “mother” of all SARS-CoV-2 genomes, whose early offspring strains subsequently mutated and spread to dominate the global pandemic. “We have now reconstructed the progenitor genome and mapped where and when the earliest mutations happened,” said Kumar, the corresponding author of a preprint study.

In doing so, their work has provided new insights into the early mutational history of SARS-CoV-2. For example, the study reports that a mutation in the SARS-CoV-2 spike protein (D614G), often implicated in increased infectivity and spread, occurred after many other mutations, weeks after COVID-19 started. “It is nearly always found alongside many other protein mutations, so its role in increased infectivity remains difficult to establish,” said Sergei Pond, a senior co-author of the study.

Besides their findings on SARS-CoV-2’s early history, Kumar’s group has developed mutational fingerprints to quickly recognize the strains and sub-strains infecting an individual or colonizing a global region.

Order to a pandemic

To identify the progenitor genome, they used a mutational order analysis technique, which relies on a clonal analysis of mutant strains and on the frequency with which pairs of mutations appear together in SARS-CoV-2 genomes.
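
The core idea of ordering mutations by how often they co-occur can be sketched in a few lines of Python. This is a simplified stand-in written for illustration, not the Temple team’s implementation. It assumes each mutation arose only once and was never lost (a clonal model with no recombination), that every mutation is recorded as a derived allele, and it uses a hypothetical 95% support threshold to tolerate sequencing errors. The function name `mutation_tree` and the toy labels `m1` to `m5` are invented.

```python
from collections import Counter
from itertools import permutations


def mutation_tree(genomes, min_support=0.95):
    """Infer a parent for each mutation from pairwise co-occurrence.

    genomes: list of sets, each holding the derived mutations one genome carries.
    Returns {mutation: parent or None}; mutations with no parent are inferred
    to be the earliest, and a genome carrying none of them matches the progenitor.
    """
    counts = Counter(m for g in genomes for m in g)
    both = Counter()  # both[(a, b)]: number of genomes carrying a and b together
    for g in genomes:
        for a, b in permutations(g, 2):
            both[(a, b)] += 1

    def precedes(a, b):
        # Clonal assumption: if b arose on a background already carrying a,
        # (nearly) every genome with b also carries a, and a is at least as common.
        if both[(a, b)] / counts[b] < min_support:
            return False
        if counts[a] != counts[b]:
            return counts[a] > counts[b]
        return a < b  # always seen together: order unresolvable, tie broken by label

    muts = sorted(counts)
    ancestors = {b: {a for a in muts if a != b and precedes(a, b)} for b in muts}
    # The parent is the most recent ancestor: the one with the most ancestors of its own.
    return {
        m: max(ancestors[m], key=lambda a: (len(ancestors[a]), a)) if ancestors[m] else None
        for m in muts
    }


# Toy data with hypothetical mutation labels (not real SARS-CoV-2 variants).
toy = [set(), {"m1"}, {"m1", "m2"}, {"m1", "m2"},
       {"m3"}, {"m3", "m4"}, {"m3", "m4"}, {"m3", "m4", "m5"}]
print(mutation_tree(toy))
# {'m1': None, 'm2': 'm1', 'm3': None, 'm4': 'm3', 'm5': 'm4'}
```

On the toy data, `m1` and `m3` have no inferred predecessor, so they are the earliest mutations, and the genome carrying no mutations plays the role of the progenitor. Mutations that always appear together cannot be ordered from frequencies alone; the sketch breaks such ties by label only so that its output is deterministic.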

First, Kumar’s team sifted through data on almost 30,000 complete genomes of SARS-CoV-2, the virus that causes COVID-19. Altogether, they analyzed 29,681 genomes, each containing at least 28,000 bases of sequence data. These genomes were sampled between 24 December 2019 and 7 July 2020 and represent 97 countries and regions worldwide.
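
A dataset-selection step like the one described can be expressed as a simple filter. In the sketch below, the record layout (`GenomeRecord`) and the sample records are hypothetical, and counting only unambiguous A/C/G/T calls toward the 28,000-base minimum is an assumption about how “sequence data” is measured; the article does not specify it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GenomeRecord:
    # Hypothetical record layout; real repositories use their own metadata formats.
    accession: str
    sequence: str
    collected: date
    country: str


MIN_BASES = 28_000                                  # minimum called bases per genome
START, END = date(2019, 12, 24), date(2020, 7, 7)   # sampling window, inclusive


def called_bases(seq: str) -> int:
    """Count unambiguous nucleotides (A, C, G, T); N's and gaps are not counted."""
    upper = seq.upper()
    return sum(upper.count(base) for base in "ACGT")


def passes_filter(rec: GenomeRecord) -> bool:
    """Keep genomes with enough called bases that were sampled inside the window."""
    return called_bases(rec.sequence) >= MIN_BASES and START <= rec.collected <= END


records = [
    GenomeRecord("hypothetical-1", "A" * 29_000, date(2020, 3, 1), "country-A"),
    GenomeRecord("hypothetical-2", "A" * 27_000 + "N" * 2_000, date(2020, 3, 1), "country-B"),
    GenomeRecord("hypothetical-3", "C" * 29_000, date(2020, 8, 1), "country-C"),
]
kept = [r for r in records if passes_filter(r)]
print([r.accession for r in kept], len({r.country for r in kept}))
```

Here the second record fails because its 2,000 N's do not count as sequence data, and the third fails because it was sampled after the window closed. Counting distinct `country` values among the kept records gives a tally comparable to the article’s 97 countries and regions.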

Many previous attempts to analyze such large datasets failed because of “the focus on building an evolutionary tree of SARS-CoV-2,” said Kumar. “This coronavirus evolves too slow, the number of genomes to analyze is too large, and the data quality of genomes is highly variable. I immediately saw parallels between the properties of these genetic data from coronavirus with the genetic data from the clonal spread of another nefarious disease, cancer.”

Kumar’s group has developed and investigated many techniques for analyzing genetic data from tumors in cancer patients. They adapted and extended those techniques to build a trail of mutations that automatically traces back to the progenitor. “Basically, the genome before the first mutation was that of the progenitor,” said Kumar. “The mutation tracking approach is beautiful and predicts a phylogeny of ‘major strains’ of SARS-CoV-2. It is a great example of how big data coupled with biologically-informed data mining reveals important patterns.”
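
Kumar’s description of the progenitor as the genome before the first mutation suggests a simple operation: once the order of mutations along a lineage is known, undoing them from latest to earliest recovers the ancestral sequence. The sketch below assumes that the order and the ancestral base at each site are already known, for example from a mutational order analysis. The `Mutation` type, the `revert_to_progenitor` function, and the 12-base toy sequence are hypothetical.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    position: int   # 1-based genome coordinate
    ancestral: str  # base before the mutation
    derived: str    # base after the mutation


def revert_to_progenitor(genome: str, lineage: list[Mutation]) -> str:
    """Undo each mutation on a genome's lineage, latest first.

    `lineage` lists mutations from earliest to latest; once all are reverted,
    what remains is the sequence as it was before the first mutation.
    """
    seq = list(genome)
    for m in reversed(lineage):
        i = m.position - 1
        if seq[i] != m.derived:
            raise ValueError(f"position {m.position}: expected {m.derived}, found {seq[i]}")
        seq[i] = m.ancestral
    return "".join(seq)


# Toy 12-base "genome" and a hypothetical two-step lineage.
observed = "ACGTTCGAACGT"
lineage = [Mutation(3, "A", "G"), Mutation(8, "T", "A")]
print(revert_to_progenitor(observed, lineage))  # ACATTCGTACGT
```

Reverting latest first only matters when the same site mutates more than once along a lineage; in that case each intermediate base is restored in turn. The check on `derived` catches a lineage that does not match the genome being traced back.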
