Viruses 11, 979 (2019). Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. We thank originating laboratories at South China Agricultural University (Y. Shen, L. Xiao and W. Chen; no. The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist properties (with respect to their ability to infect a range of mammalian cells) that facilitated its jump to humans and may do so again. pangoLEARN: store of the trained model for pangolin to access. The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. Specifically, progenitors of the RaTG13/SARS-CoV-2 lineage appear to have recombined with the Hong Kong clade (with inferred breakpoints at 11.9 and 20.8 kb) to form the CoVZXC21/CoVZC45 lineage. NTD, N-terminal domain; CTD, C-terminal domain. performed codon usage analysis. However, on closer inspection, the relative divergences in the phylogenetic tree (Fig. Zhou et al.2 concluded from the genetic proximity of SARS-CoV-2 to RaTG13 that a bat origin for the current COVID-19 outbreak is probable. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. For the HCoV-OC43, MERS-CoV and SARS datasets we specified flexible skygrid coalescent tree priors. The origins we present in Fig. These authors contributed equally: Maciej F. Boni, Philippe Lemey.
Influenza viruses reassort17 but they do not undergo homologous recombination within RNA segments18,19, meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. = 0.00075 and one with a mean of 0.00024 and s.d. Alternatively, combining 3SEQ-inferred breakpoints, GARD-inferred breakpoints and the necessity of PI signals for inferring recombination, we can use the 9.9-kb region spanning nucleotides 11,885–21,753 (NRR2) as a putative non-recombining region; this approach is breakpoint-conservative because it is conservative in identifying breakpoints but not conservative in identifying non-recombining regions. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses43,44,52, slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal.
In this approach, we considered a breakpoint as supported only if it had three types of statistical support: from (1) mosaic signals identified by 3SEQ, (2) PI signals identified by building trees around 3SEQ's breakpoints and (3) the GARD algorithm35, which identifies breakpoints by identifying PI signals across proposed breakpoints. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. is funded by the MRC (no. Bayesian evaluation of temporal signal in measurably evolving populations. All custom code used in the manuscript is available at https://github.com/plemey/SARSCoV2origins. Since experts have suggested that pangolins may be the reservoir species for COVID-19, the scaly anteater has been catapulted into headlines, news reports, and conversations, and some are calling COVID-19 "the revenge of the . In region A, we removed subregion A1 (nt positions 3,872–4,716 within region A) and subregion A4 (nt 1,642–2,113) because both showed PI signals with other subregions of region A. PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628. J. Virol. 84, 3134–3146 (2010). & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the Spike protein. 4 we compare these divergence time estimates to those obtained using the MERS-CoV-centred rate priors for NRR1, NRR2 and NRA3. These shy, quirky but cute mammals are one of the most heavily trafficked yet least understood animals in the world.
In the variable-loop region, RaTG13 diverges considerably with the TMRCA, now outside that of SARS-CoV-2 and the Pangolin Guangdong 2019 ancestor, suggesting that RaTG13 has acquired this region from a more divergent and undetected bat lineage. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. Region B is 5,525 nt long. = 0.00025. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. 82, 4807–4811 (2008). The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges57. As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates43,44,52, but this is beyond the scope of the current study. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk9,10.
The canine viral genome was excluded from the Bayesian phylogenetic analyses because temporal signal analyses (see below) indicated that it was an outlier. Bioinformatics 22, 2688–2690 (2006). Phylogenetic Assignment of Named Global Outbreak LINeages (pangolin). The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance. Split diversity in constrained conservation prioritization using integer linear programming. Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. Unfortunately, a response that would achieve containment was not possible. Without better sampling, however, it is impossible to estimate whether or how many of these additional lineages exist. 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. Below, we report divergence time estimates based on the HCoV-OC43-centred rate prior for NRR1, NRR2 and NRA3 and summarize corresponding estimates for the MERS-CoV-centred rate priors in Extended Data Fig. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 95% credible interval bars are shown for all internal node ages. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. After removal of A1 and A4, we named the new region A′. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Concurrent evidence also proposed pangolins as a potential intermediate species for SARS-CoV-2 emergence and suggested them as a potential reservoir species11,12,13. 23, 1891–1901 (2006). Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. Aiewsakun, P. & Katzourakis, A.
Time-dependent rate phenomenon in viruses. 1a-c), has the third-highest number of confirmed COVID-19 cases in the state of So. While pangolins could be acting as intermediate hosts for bat viruses to get into humans (they develop severe respiratory disease38 and commonly come into contact with people through trafficking), there is no evidence that pangolin infection is a requirement for bat viruses to cross into humans. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin . When the first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on 10 January 2020 (GMT) on Virological.org by a consortium led by Zhang6, it enabled immediate analyses of its ancestry. Nucleotide positions for phylogenetic inference are 147–695, 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree) and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). acknowledges support by the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen (nos. Using the most conservative approach to identification of a non-recombinant genomic region (NRR1), SARS-CoV-2 forms a sister lineage with RaTG13, with genetically related cousin lineages of coronavirus sampled in pangolins in Guangdong and Guangxi provinces (Fig.
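The nucleotide ranges listed above for each tree suggest how a subalignment could be cut out of a full genome alignment before tree building. The following is a minimal sketch, assuming positions are 1-based and inclusive and that the alignment is held as a dictionary of equal-length strings; `extract_subalignment` is a hypothetical helper written for illustration and is not taken from the authors' code.

```python
from typing import Dict, List, Tuple


def extract_subalignment(
    alignment: Dict[str, str], ranges: List[Tuple[int, int]]
) -> Dict[str, str]:
    """Concatenate the given 1-based inclusive column ranges for every sequence."""
    for start, end in ranges:
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {start}-{end}")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return {
        name: "".join(seq[start - 1:end] for start, end in ranges)
        for name, seq in alignment.items()
    }


# Toy alignment (assumed data, not real genomes).
toy = {"virus_a": "ACGTACGTAC", "virus_b": "ACGTTCGTAA"}
sub = extract_subalignment(toy, [(1, 3), (7, 8)])
```

With real coordinates, the ranges for the first tree would be passed as `[(147, 695), (962, 1686)]`.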
Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA. Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium. Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China. State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China. Department of Biology, University of Texas Arlington, Arlington, TX, USA. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK. MRC-University of Glasgow Centre for Virus Research, Glasgow, UK. Extensive diversity of coronaviruses in bats from China. This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD. Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length. 1 Phylogenetic relationships in the C-terminal domain (CTD). Using the most conservative approach (NRR1), the divergence time estimate for SARS-CoV-2 and RaTG13 is 1969 (95% HPD: 1930–2000), while that between SARS-CoV and its most closely related bat sequence is 1962 (95% HPD: 1932–1988); see Fig. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic14,15,16. Sliding window analysis of changes in the patterns of sequence similarity between human SARS-CoV-2, and pangolin and bat coronaviruses as described further in Fig.
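The four-step construction of breakpoint-free regions (BFRs) described above (collect breakpoints, take the complement, group contiguous positions, sort by length) can be sketched in a few lines. This is an illustrative sketch only: it assumes each breakpoint is reported as a 1-based inclusive interval of candidate positions, and the function name `breakpoint_free_regions` is hypothetical rather than part of 3SEQ or the authors' code.

```python
from typing import List, Tuple


def breakpoint_free_regions(
    breakpoints: List[Tuple[int, int]], genome_length: int
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive regions free of breakpoints, longest first."""
    # Step 1: collect every position covered by any breakpoint interval.
    is_breakpoint = [False] * (genome_length + 1)  # index 0 is unused
    for start, end in breakpoints:
        for pos in range(max(1, start), min(genome_length, end) + 1):
            is_breakpoint[pos] = True

    # Steps 2 and 3: walk the complement and group contiguous positions.
    regions: List[Tuple[int, int]] = []
    region_start = None
    for pos in range(1, genome_length + 1):
        if not is_breakpoint[pos]:
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            regions.append((region_start, pos - 1))
            region_start = None
    if region_start is not None:
        regions.append((region_start, genome_length))

    # Step 4: sort the regions by length, longest first.
    regions.sort(key=lambda r: r[1] - r[0] + 1, reverse=True)
    return regions


# Toy example on a 12-nt "genome" (assumed data).
toy_bfrs = breakpoint_free_regions([(5, 6), (10, 10)], genome_length=12)
```

A boolean array is used here for clarity; for many long genomes, merging sorted intervals would avoid touching every position.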
These differences reflect the fact that rate estimates can vary considerably with the timescale of measurement, a frequently observed phenomenon in viruses known as time-dependent evolutionary rates41,43,44. The authors declare no competing interests. 36, 1793–1803 (2019). BFRs were concatenated if no phylogenetic incongruence signal could be identified between them. Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events19. The command line tool is open source software available under the GNU General Public License v3.0. Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). Our results indicate the presence of a single lineage circulating in bats with properties that allowed it to infect human cells, as previously described for bat sarbecoviruses related to the first SARS-CoV lineage29,30,31. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses5,7. Nature 503, 535–538 (2013). Because 3SEQ is the most statistically powerful of the mosaic methods61, we used it to identify the best-supported breakpoint history for each potential child (recombinant) sequence in the dataset. In this study, we report the case of a child with severe combined immu presenting a prolonged severe acute respiratory syndrome coronavirus 2 infection. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm.
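The masking step of the third approach, in which minor recombinant regions are replaced with gaps so that they are treated as unobserved characters, can be illustrated as follows. This is a sketch under stated assumptions: regions are given as 1-based inclusive coordinates per genome, and `mask_regions` is a hypothetical helper, not the procedure as implemented by the authors.

```python
from typing import List, Tuple


def mask_regions(
    sequence: str, regions: List[Tuple[int, int]], gap: str = "-"
) -> str:
    """Replace each 1-based inclusive region of `sequence` with gap characters."""
    chars = list(sequence)
    for start, end in regions:
        # Clamp to the sequence so out-of-range coordinates are ignored.
        for i in range(max(start - 1, 0), min(end, len(chars))):
            chars[i] = gap
    return "".join(chars)


# Toy example (assumed data): mask positions 3-5 of an 8-nt sequence.
masked = mask_regions("ACGTACGT", [(3, 5)])
```

Because the sequence length is unchanged, the masked genome stays aligned with the rest of the dataset.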
Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Phylogenetic trees and exact breakpoints for all ten BFRs are shown in Supplementary Figs. To examine temporal signal in the sequenced data, we plotted root-to-tip divergence against sampling time using TempEst39 v.1.5.3 based on a maximum likelihood tree. Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. Results and discussion. Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real-time (4). We considered (1) the possibility that BFRs could be combined into larger non-recombinant regions and (2) the possibility of further recombination within each BFR. If the latter still identified non-negligible recombination signal, we removed additional genomes that were identified as major contributors to the remaining signal. Calibration of priors can be performed using other coronaviruses (SARS-CoV, MERS-CoV and HCoV-OC43), but estimated rates vary with the timescale of sample collection. This produced non-recombining alignment NRA3, which included 63 of the 68 genomes. Based on the identified breakpoints in each genome, only the major non-recombinant region is kept in each genome while other regions are masked. Bioinformatics 30, 1312–1313 (2014). Holmes, E.
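The root-to-tip check for temporal signal mentioned above amounts to an ordinary least-squares regression of divergence on sampling time: the slope approximates the evolutionary rate, the x-intercept approximates the date of the root, and a weak fit suggests little temporal signal. The sketch below is a generic illustration with made-up numbers; it is not TempEst's implementation, and `root_to_tip_regression` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def root_to_tip_regression(
    dates: List[float], divergences: List[float]
) -> Tuple[float, float, float, Optional[float]]:
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (slope, intercept, r_squared, root_date), where root_date is the
    x-intercept and is None when the slope is not positive.
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope > 0 else None
    return slope, intercept, r_squared, root_date


# Made-up, perfectly clock-like data: 0.001 substitutions per site per year.
rate, icpt, r2, root = root_to_tip_regression(
    [2000.0, 2002.0, 2004.0], [0.010, 0.012, 0.014]
)
```

A dataset without temporal signal would instead show a slope near zero or negative and a low R², which is the situation that motivates informative rate priors.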
C., Dudas, G., Rambaut, A. Boxes show 95% HPD credible intervals. Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. The histogram allows for the identification of non-recombining regions (NRRs) by revealing regions with no breakpoints. Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. performed S recombination analysis. 82, 1819–1826 (2008). & Boni, M. F. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. 6, eabb9153 (2020). The virus then. 190, 2088–2095 (2004). Visual exploration using TempEst39 indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig. Smuggled pangolins were carrying viruses closely related to the one sweeping the world, say scientists. A distinct name is needed for the new coronavirus. 24, 490–502 (2016). Nature 579, 270–273 (2020). 27) receptors and its RBD being genetically closer to a pangolin virus than to RaTG13 (refs. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others11,22.
In our analyses of the sarbecovirus datasets, we incorporated the uncertainty of the sampling dates when exact dates were not available. These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. Virological.org http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331 (2020). Coronavirus: Pangolins found to carry related strains. Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification.
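The consensus rule stated above, keeping only recombination signals detected by more than two of the six methods, is a simple vote count. The sketch below assumes the detections have already been collected into a mapping from an event label to the methods that reported it; the event labels and the function `consensus_signals` are hypothetical and do not reflect RDP5's output format.

```python
from typing import Dict, Iterable, List

# The six methods named in the text.
METHODS = {"RDP", "GENECONV", "MaxChi", "Bootscan", "SisScan", "3SEQ"}


def consensus_signals(
    detections: Dict[str, Iterable[str]], min_methods: int = 3
) -> List[str]:
    """Return events supported by at least `min_methods` distinct methods.

    "More than two methods" corresponds to the default min_methods=3.
    """
    kept = []
    for event, methods in detections.items():
        supporting = set(methods) & METHODS  # ignore duplicates and unknown names
        if len(supporting) >= min_methods:
            kept.append(event)
    return sorted(kept)


# Hypothetical detections for three candidate events.
example = {
    "event_1": ["RDP", "GENECONV", "3SEQ"],
    "event_2": ["MaxChi", "Bootscan"],
    "event_3": ["RDP", "RDP", "SisScan"],
}
supported = consensus_signals(example)
```

Counting distinct methods, rather than raw reports, keeps a single method that fires twice on the same event from inflating its support.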
GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. It is available as a command line tool and a web application. In December 2019, a cluster of pneumonia cases epidemiologically linked to an open-air live animal market in the city of Wuhan (Hubei Province), China1,2 led local health officials to issue an epidemiological alert to the Chinese Center for Disease Control and Prevention and the World Health Organization's (WHO) China Country Office. This genotype pattern led to the creation of a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage, which was renamed B.1.640.1. Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus. The web application was developed by the Centre for Genomic Pathogen Surveillance. Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? 91, 1058–1062 (2010). The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. Genetics 172, 2665–2681 (2006). Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. Conducting analogous analyses of codon usage bias as Ji et al. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses22,33; specifying all past events in their correct temporal order34 is challenging and not shown here.
Our most conservative approach attempted to ensure that putative NRRs had no mosaic or phylogenetic incongruence signals. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68 sequences are available in Supplementary Table 4. Host ecology determines the dispersal patterns of a plant virus. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. Concatenated region A′BC is NRR1. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. The boxplots show divergence time estimates (posterior medians) for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV virus (blue) from their most closely related bat virus. Region A has been shortened to A′ (5,017 nt) based on potential recombination signals within the region. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig.
One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. & Bedford, T. MERS-CoV spillover at the camel–human interface. Using a third consensus-based approach for identifying recombinant regions in individual sequences, with six different recombination detection methods in RDP5 (ref. Researchers in the UK had just set the scientific world . We thank all authors who have kindly deposited and shared genome data on GISAID. The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding . 35, 247–251 (2018). Region C showed no PI signals within it. SARS-CoV-2 and RaTG13 are also exceptions because they were sampled from Hubei and Yunnan, respectively. SARS-CoV-2 is an appropriate name for the new coronavirus. We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. eLife 7, e31257 (2018). is funded by The National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. Researchers have found that SARS-CoV-2 in humans shares about 90.3% of its genome sequence with a coronavirus found in pangolins (Cyranoski, 2020). First, we took an approach that relies on identification of mosaic regions (via 3SEQ14 v.1.7) that are also supported by PI signals19. Nature 579, 265–269 (2020).
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. (2020) with additional (and higher quality) snake coding sequence data and several miscellaneous eukaryotes with low genomic GC content failed to find any meaningful clustering of the SARS-CoV-2 with snake genomes (a). Sequences are colour-coded by province according to the map. Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. & Andersen, K. G. Pandemics: spend on surveillance, not prediction. This dataset comprises an updated version of that used in Hon et al.15 and includes a cluster of genomes sampled in late 2003 and early 2004, but the evolutionary rate estimate without this cluster (0.00175 substitutions per site per year (0.00117, 0.00229)) is consistent with the complete dataset (0.00169 substitutions per site per year (0.00131, 0.00205)). Schierup, M. H. & Hein, J. Recombination and the molecular clock. Pangolin relies on a novel algorithm called pangoLEARN. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. Bioinformatics 28, 3248–3256 (2012). In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A′. Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. The plots are based on maximum likelihood tree reconstructions with a root position that maximises the residual mean squared for the regression of root-to-tip divergence and sampling time. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus.
There is a 90% genome sequence match between SARS-CoV-2 and a coronavirus in pangolins. 21, 255–265 (2004). BEAST inferences made use of the BEAGLE v.3 library68 for efficient likelihood computations. & Li, X. Cross-species transmission of the newly identified coronavirus 2019-nCoV. collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. 92, 433–440 (2020). The genetic distances between SARS-CoV-2 and Pangolin Guangdong 2019 are consistent across all regions except the N-terminal domain, implying that a recombination event between these two sequences in this region is unlikely. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003). Uncertainty measures are shown in Extended Data Fig. We extracted a similar number (n = 35) of genomes from a MERS-CoV dataset analysed by Dudas et al.59 using the phylogenetic diversity analyser tool60 (v.0.5). 30, 2196–2203 (2020). A deep dive into the genetics of the novel coronavirus shows it seems to have spent some time infecting both bats and pangolins before it jumped into humans, researchers said. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. Due to the absence of temporal signal in the sarbecovirus datasets, we used informative prior distributions on the evolutionary rate to estimate divergence dates. & Andersen, K. G. The evolution of Ebola virus: insights from the 2013–2016 epidemic. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. Intraspecies diversity of SARS-like coronaviruses in Rhinolophus sinicus and its implications for the origin of SARS coronaviruses in humans.
Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humans (a polybasic cleavage site insertion in the S protein) has not yet been seen in another close bat relative of the SARS-CoV-2 virus. Nevertheless, the viral population is largely spatially structured according to provinces in the south and southeast on one lineage, and provinces in the centre, east and northeast on another (Fig.