What is it about?

In this article, we used genomic data compressors to analyse and identify viral genomes. We measured the compressibility of 12,163 complete reference genomes from 9,605 viral taxa retrieved from the NCBI database. By varying the compressor configuration, we can quantify both the repetitions and the inverted repetitions within a genome. This quantification reveals genomic characteristics of each group (type of genome, realm, genus, etc.). In addition, minimal bi-directional complexity profiles describe the structural organisation of a genome, which is fundamental for understanding viral function. Finally, the measures obtained in this study allowed us to identify organisms with extremely high accuracy.
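To make the idea of compressibility concrete, here is a minimal sketch in Python. It assumes a general-purpose compressor (LZMA from the standard library) as a stand-in for the specialised genomic compressors used in the study, and it assumes the common normalisation of dividing the compressed size in bits by the 2 bits per base of a plain four-letter encoding. The function names and the synthetic sequences are illustrative only and are not taken from the paper.

```python
import lzma
import random

# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def normalized_compression(seq: str) -> float:
    """Compressed size in bits over the 2 bits/base of a plain 4-letter encoding.

    Values near (or slightly above) 1 mean the compressor found little
    redundancy; lower values mean more redundancy. LZMA is only a stand-in
    (an assumption of this sketch), not the compressor used in the study.
    """
    compressed_bits = 8 * len(lzma.compress(seq.encode("ascii"), preset=9))
    return compressed_bits / (2 * len(seq))


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
half = "".join(rng.choices("ACGT", k=10_000))

random_seq = "".join(rng.choices("ACGT", k=20_000))
direct_repeat = half + half                        # copy on the same strand
inverted_repeat = half + reverse_complement(half)  # copy on the opposite strand

nc_random = normalized_compression(random_seq)
nc_direct = normalized_compression(direct_repeat)
nc_inverted = normalized_compression(inverted_repeat)

print(f"random sequence:  {nc_random:.3f}")
print(f"direct repeat:    {nc_direct:.3f}")
print(f"inverted repeat:  {nc_inverted:.3f}")
```

In this toy setting the direct repeat compresses far better than the random sequence, while the inverted repeat does not, because a general-purpose compressor never looks at the opposite strand. A compressor configuration that also considers the reverse complement would close that gap, so comparing results with and without such a configuration is one way to separate inverted repetitions from ordinary ones.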

Why is it important?

This study produced several relevant findings:
⁘ On average, dsDNA viruses are the most redundant (least complex) relative to their size, and ssDNA viruses are the least redundant. In contrast, dsRNA viruses show lower redundancy than ssRNA viruses.
⁘ We found indications that some viruses that infect extremophiles are more redundant and possess more inverted repeats (IRs), suggesting an adaptation that stabilises the genome in these environments.
⁘ An in-depth analysis of the human herpesviruses indicated that higher compressibility and a greater abundance of inversions in herpesviruses might be associated with viral genome integration.
⁘ Minimal bi-directional complexity profiles can provide a structural description of the viral genome.
⁘ We could automatically and accurately distinguish between viral genomes at different taxonomic levels without directly comparing sequences. This method is fast and could be extremely useful in metagenomics.
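The last point, identification without direct sequence comparison, can be illustrated with a toy sketch: each genome is reduced to a handful of numbers, and only those numbers are compared. Everything below is an assumption made for illustration: the three features (log length, GC content, a zlib-based compression ratio), the nearest-centroid rule, and the two synthetic groups are hypothetical and are not the features, classifier, or data of the study.

```python
import math
import random
import zlib


def features(seq: str) -> tuple:
    """Reduce a genome to three numbers: log10 length, GC content, compression ratio."""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    bits = 8 * len(zlib.compress(seq.encode("ascii"), 9))
    return (math.log10(len(seq)), gc, bits / (2 * len(seq)))


def random_dna(rng: random.Random, length: int, gc: float) -> str:
    """Draw a random sequence with the requested expected GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def make_genome(rng: random.Random, label: str) -> str:
    """Synthetic stand-ins for two groups (hypothetical, not real viral data)."""
    if label == "group_a":
        return random_dna(rng, 3000, 0.7)       # short, GC-rich, non-repetitive
    return random_dna(rng, 300, 0.3) * 20       # longer, AT-rich, highly repetitive


def centroid(rows: list) -> tuple:
    """Average each feature over the training genomes of one group."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def predict(centroids: dict, feat: tuple) -> str:
    """Assign the label whose centroid is closest in feature space."""
    return min(centroids, key=lambda label: math.dist(centroids[label], feat))


rng = random.Random(1)  # fixed seed so the toy data is reproducible
labels = ("group_a", "group_b")

# "Training": five synthetic genomes per group, summarised by their centroid.
centroids = {
    label: centroid([features(make_genome(rng, label)) for _ in range(5)])
    for label in labels
}

# "Identification": a new genome is labelled from its features alone.
predictions = {
    label: predict(centroids, features(make_genome(rng, label)))
    for label in labels
}
print(predictions)
```

No alignment or pairwise sequence comparison takes place at prediction time: a new genome costs one compression run plus a few distance computations, which is why this kind of approach can be fast on large metagenomic collections.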

Perspectives

Although Kolmogorov complexity and its approximations are not trendy right now, I hope people see their potential and possible applications in fields as relevant as virology and metagenomics. Furthermore, compression can play a crucial role in the structural analysis of genomes and in organism identification, and I hope readers realise this while reading the paper.

Jorge Miguel Silva
Universidade de Aveiro

Read the Original

This page is a summary of: The complexity landscape of viral genomes, GigaScience, January 2022, Oxford University Press (OUP),
DOI: 10.1093/gigascience/giac079.