Current Bioinformatics

Author(s): Ke Dong and Jingyang Gao*

DOI: 10.2174/0115748936291075240409080924

Validating the Distinctiveness of the Omicron Lineage within the SARS-CoV-2 based on Protein Language Models

Pages: 257-265 (9 pages)


Abstract

Introduction: Five variants of concern (VOCs) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been identified: Alpha, Beta, Gamma, Delta, and Omicron. This study uses a protein language model to explore the mutations of the Omicron lineage and how they differ from those of other lineages.

Methods: The SARS-CoV-2 wild-type sequence was input into the protein language model ESM-1v (Evolutionary Scale Modeling-1v) to obtain a score for every possible amino acid substitution at each position. From these scores, the study calculated the overall trend in the scores of newly emerging mutations for each variant of concern.
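ESM-1v is commonly used to score a substitution as log P(mutant) - log P(wild type), using the probabilities the model assigns to each residue at that position; reading those probabilities from a single pass over the wild-type sequence is the "wild-type marginal" variant. The abstract does not state exactly which scoring strategy the authors used, so the sketch below is only an illustration under that assumption. The helpers `uniform_log_probs`, `mutation_score`, `score_all_substitutions`, and `mean_score` are hypothetical, and `uniform_log_probs` is a toy stand-in for real model output, not a call into ESM-1v.

```python
import math

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def uniform_log_probs(wt_seq, wt_prob=0.81):
    """Hypothetical stand-in for model output: one dict per position mapping
    each amino acid to a log-probability. The wild-type residue gets wt_prob
    and the remaining mass is split evenly over the other 19 residues."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in wt_seq
    ]


def mutation_score(log_probs, wt_seq, pos, mut_aa):
    """Log-likelihood ratio log P(mut) - log P(wt) at 0-based position pos.
    Negative values mean the model prefers the wild-type residue."""
    return log_probs[pos][mut_aa] - log_probs[pos][wt_seq[pos]]


def score_all_substitutions(log_probs, wt_seq):
    """Score every single-residue substitution, keyed like 'F2L' (1-based)."""
    scores = {}
    for pos, wt in enumerate(wt_seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                scores[f"{wt}{pos + 1}{aa}"] = mutation_score(log_probs, wt_seq, pos, aa)
    return scores


def mean_score(scores, mutations):
    """Average score over one variant's mutations, as a simple 'overall trend'."""
    return sum(scores[m] for m in mutations) / len(mutations)


wt = "MFV"  # toy 3-residue sequence, not the real SARS-CoV-2 sequence
scores = score_all_substitutions(uniform_log_probs(wt), wt)
print(len(scores))  # 57 (3 positions x 19 substitutions)
print(round(scores["F2L"], 3))  # -4.394, i.e. ln(0.01 / 0.81)
```

With a real model, only `uniform_log_probs` would change: the per-position log-probabilities would come from ESM-1v, and `mean_score` would be applied to the mutation set defining each variant.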

Results: Even when the ratio of unobserved to observed mutations is 4:15, Omicron still generates a large number of newly emerging mutations. Both the overall score and the overall ranking of the Omicron family's mutations are low.
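The abstract does not say how the overall score and overall ranking are defined. One plausible reading, sketched here purely as an assumption, ranks each mutation of a lineage against the scores of all candidate substitutions and averages both the raw scores and the resulting percentile ranks. The `percentile_rank` and `lineage_summary` helpers and the example numbers are hypothetical, not data or code from the paper.

```python
from statistics import mean


def percentile_rank(all_scores, score):
    """Fraction of all candidate substitutions scoring <= score.
    Values near 0 mean the mutation ranks low among all possibilities."""
    return sum(1 for s in all_scores if s <= score) / len(all_scores)


def lineage_summary(all_scores, lineage_scores):
    """Return (mean score, mean percentile rank) for one lineage's mutations."""
    ranks = [percentile_rank(all_scores, s) for s in lineage_scores]
    return mean(lineage_scores), mean(ranks)


# Illustrative numbers only, not results from the paper.
all_scores = [-6.0, -4.0, -3.0, -2.0, -1.0, -0.5, 0.0, 0.5]
print(lineage_summary(all_scores, [-4.0, -6.0]))  # (-5.0, 0.1875)
print(lineage_summary(all_scores, [0.0, 0.5]))    # (0.25, 0.9375)
```

Under this reading, a lineage whose mutations are both low-scoring and low-ranked would look like the first call rather than the second.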

Conclusion: Amino acid mutations in the Omicron lineage differ from those in other lineages. These findings deepen understanding of the spatial distribution of amino acid mutations in the spike protein and of the overall trends in newly emerging mutations across different variants of concern. They also provide insights for simulating the evolution of the Omicron lineage.

Keywords: Protein language models, SARS-CoV-2, Omicron, VOC, ESM-1v, mutation.