CoV2K: A Knowledge Base of SARS-CoV-2 Variant Impacts

Ruba Al Khalaf, Tommaso Alfonsi, Stefano Ceri, Anna Bernasconi
DOI: https://doi.org/10.1007/978-3-030-75018-3_18
2021-01-01
Abstract: Despite the current relevance of the topic, there is no universally recognized knowledge base about SARS-CoV-2 variants; viral sequences deposited at recognized repositories are still very few, and the process of tracking new variants is not coordinated. CoV2K is a manually curated knowledge base providing an organized collection of information about SARS-CoV-2 variants, extracted from the scientific literature. It features a taxonomy of variant impacts, organized into three main categories (protein stability, epidemiology, and immunology) and including levels for these effects (higher, lower, null) that result from a coherent interpretation of research articles. CoV2K is integrated with ViruSurf, hosted at Politecnico di Milano; ViruSurf is globally the largest database of curated viral sequences and variants, integrated from deposition repositories such as COG-UK, GenBank, and GISAID. Thanks to this integration, variants documented in CoV2K can be analyzed and searched over large volumes of nucleotide and amino acid sequences, e.g., for co-occurrence and impact agreement; the paper sketches some of the data analysis tests that are currently under development.
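To make the taxonomy and the co-occurrence analysis mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the CoV2K schema or the ViruSurf API: the class names, the `effect` field, the placeholder variant labels (`VAR_A`, `VAR_B`, `VAR_C`), and both helper functions are assumptions made purely for illustration. Only the three categories and the three levels come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class Category(Enum):
    # The three main impact categories named in the abstract.
    PROTEIN_STABILITY = "protein stability"
    EPIDEMIOLOGY = "epidemiology"
    IMMUNOLOGY = "immunology"


class Level(Enum):
    # The three effect levels named in the abstract.
    HIGHER = "higher"
    LOWER = "lower"
    NULL = "null"


@dataclass(frozen=True)
class Impact:
    # Hypothetical record layout; the field names are assumptions.
    variant: str        # placeholder variant label, e.g. "VAR_A"
    category: Category
    effect: str         # placeholder effect name within the category
    level: Level


def impacts_agree(a: Impact, b: Impact) -> bool:
    """Two impacts agree if they report the same level for the same effect."""
    return (a.category, a.effect, a.level) == (b.category, b.effect, b.level)


def co_occurrence(sequences):
    """Count how often each unordered pair of variants shares a sequence."""
    counts = Counter()
    for variants in sequences:
        # Deduplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(variants)), 2):
            counts[pair] += 1
    return counts


# Placeholder data: each inner list is the set of variants in one sequence.
sequences = [
    ["VAR_A", "VAR_B"],
    ["VAR_A", "VAR_B", "VAR_C"],
    ["VAR_C"],
]
counts = co_occurrence(sequences)

imp_a = Impact("VAR_A", Category.IMMUNOLOGY, "effect_x", Level.HIGHER)
imp_b = Impact("VAR_B", Category.IMMUNOLOGY, "effect_x", Level.HIGHER)
imp_c = Impact("VAR_C", Category.IMMUNOLOGY, "effect_x", Level.LOWER)
```

In this sketch, frequently co-occurring pairs found by `co_occurrence` could then be checked with `impacts_agree` to see whether their documented impacts point in the same direction; how the paper actually defines these tests is not specified in the abstract.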