2020-08-26
Fast whole-genome phylogeny of the COVID-19 virus SARS-CoV-2 by compression
Publication
We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression. This is a new, fast, alignment-free method called the “normalized compression distance” (NCD) method. It discovers all effective similarities based on Kolmogorov complexity. Since Kolmogorov complexity is incomputable, we approximate it with a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC4. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We address the question of whether pangolins are involved in the origin of the SARS-CoV-2 virus. The compression method is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny.
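The NCD between two sequences x and y is commonly defined as NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}, where C(s) is the compressed length of s and xy is the concatenation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses Python's standard `lzma` module as a stand-in for zpaq, and the helpers `compressed_size`, `ncd`, `ncd_matrix`, `nearest`, `random_dna` and `mutate` are hypothetical names introduced here. It computes pairwise distances and the nearest neighbour of a sequence. Building a phylogenetic tree from the distance matrix is omitted.

```python
import lzma
import random


def compressed_size(data: bytes) -> int:
    """Compressed length C(data) in bytes.

    Assumption: LZMA stands in for zpaq. Its default preset (6) uses an
    8 MiB dictionary, large enough to see all of x when compressing x + y
    for virus-sized genomes (about 30 kb each).
    """
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(seqs: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """NCD for every ordered pair of distinct sequence names."""
    names = list(seqs)
    return {(a, b): ncd(seqs[a], seqs[b]) for a in names for b in names if a != b}


def nearest(name: str, seqs: dict[str, bytes]) -> str:
    """Name of the sequence with the smallest NCD to seqs[name]."""
    others = [n for n in seqs if n != name]
    return min(others, key=lambda n: ncd(seqs[name], seqs[n]))


def random_dna(n: int, rng: random.Random) -> bytes:
    """Hypothetical toy data: n uniformly random bases."""
    return "".join(rng.choice("ACGT") for _ in range(n)).encode()


def mutate(seq: bytes, rate: float, rng: random.Random) -> bytes:
    """Hypothetical toy data: resample each base with probability `rate`."""
    out = bytearray(seq)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = ord(rng.choice("ACGT"))
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    a = random_dna(5000, rng)
    seqs = {"a": a, "b": mutate(a, 0.01, rng), "c": random_dna(5000, rng)}
    for (p, q), d in sorted(ncd_matrix(seqs).items()):
        print(f"NCD({p}, {q}) = {d:.3f}")
    print("nearest to a:", nearest("a", seqs))
```

The toy sequences are synthetic, not real genomes: "b" is a lightly mutated copy of "a" and "c" is unrelated, so "b" should come out as the nearest neighbour of "a". The compressor must be able to reference all of x while encoding y in the concatenation xy. A compressor with a small window (for example zlib's 32 KB window) would stop detecting the shared content once xy becomes long, which is one reason a strong compressor matters for whole-genome NCD.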
Additional Metadata | |
---|---|
DOI | doi.org/10.1101/2020.07.22.216242 |
Organisation | Algorithms and Complexity |
Citation | Cilibrasi, R., & Vitányi, P. (2020). Fast whole-genome phylogeny of the COVID-19 virus SARS-CoV-2 by compression. doi:10.1101/2020.07.22.216242 |