A novel SARS-CoV-2 viral sequence bioinformatic pipeline has found genetic evidence that the viral 3′ untranslated region (UTR) is evolving and generating increased viral diversity
Author
dc.contributor.author
Farkas, Carlos
Author
dc.contributor.author
Mella, Andy
Author
dc.contributor.author
Turgeon, Maxime
Author
dc.contributor.author
Haigh, Jody J.
Admission date
dc.date.accessioned
2021-12-02T14:43:10Z
Available date
dc.date.available
2021-12-02T14:43:10Z
Publication date
dc.date.issued
2021
Item citation
dc.identifier.citation
Frontiers in Microbiology June 2021 Volume 12 Article 665041
es_ES
Identifier
dc.identifier.other
10.3389/fmicb.2021.665041
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/183018
Abstract
dc.description.abstract
An unprecedented amount of SARS-CoV-2 sequencing has been performed; however,
novel bioinformatic tools to cope with and process these large datasets are needed.
Here, we have devised a bioinformatic pipeline that inputs SARS-CoV-2 genome
sequencing in FASTA/FASTQ format and outputs a single Variant Call Format (VCF) file that
can be processed to obtain variant annotations and perform downstream population
genetic testing. As proof of concept, we have analyzed over 229,000 SARS-CoV-2
viral sequences up until November 30, 2020. We have identified over 39,000 variants
worldwide with increased polymorphisms, spanning the ORF3a gene as well as the
3′ untranslated (UTR) regions, specifically in the conserved stem loop region of
SARS-CoV-2, which is accumulating greater observed viral diversity relative to chance variation.
Our analysis pipeline has also discovered the existence of SARS-CoV-2 hypermutation
at low frequency (in less than 2% of genomes), likely arising through host immune
responses and not due to sequencing errors. Among annotated nonsense variants
with a population frequency over 1%, recurrent inactivation of the ORF8 gene was
found. This was found to be present in the newly identified B.1.1.7 SARS-CoV-2 lineage
that originated in the United Kingdom. Almost all VOC-containing genomes possess
one stop codon in the ORF8 gene (Q27*); however, 13% of these genomes also contain
another stop codon (K68*), suggesting that ORF8 loss does not interfere with
SARS-CoV-2 spread and may play a role in its increased virulence. We have developed this
computational pipeline to assist researchers in the rapid analysis and characterization of
SARS-CoV-2 variation.
es_ES
Sponsor
dc.description.sponsorship
Supercomputing infrastructure of the NLHPC ECM02
Research Manitoba
CancerCare MB Research Foundation
es_ES
Language
dc.language.iso
en
es_ES
Publisher
dc.publisher
Frontiers Media
es_ES
Type of license
dc.rights
Attribution-NonCommercial-NoDerivs 3.0 United States