Background: RNA editing enriches post-transcriptional sequence changes. Currently detecting RNA editing sites is mostly based on the Sanger sequencing platform and second-generation sequencing. However, detection with Sanger sequencing is limited by the disturbing background peaks using the direct sequencing method and the clone number using the clone sequencing method, while second- generation sequencing detection is constrained by its short read.
Objective: We aimed to design a pipeline that can accurately detect RNA editing sites for full-length long-read amplicons to meet the requirement when focusing on a few specific genes of interest.
Methods: We developed a novel high-throughput RNA editing sites detection pipeline based on the PacBio circular consensus sequences sequencing which is accurate with high-throughput and long-read coverage. We tested the pipeline on cytosolic malate dehydrogenase in the hard-shelled mussel Mytilus coruscus and further validated it using direct Sanger sequencing.
Results: Data generated from the PacBio circular consensus sequences (CCS) amplicons in three mussels were first filtered by quality and then selected by open reading frame. After filtering, 225-2047 sequences of the three mussels, respectively, were used to identify RNA editing sites. With corresponding genomic DNA sequences, we extracted 227-799 candidate RNA editing sites excluding heterozygous sites. We further figured out 7-11 final RESs using a new error model specially designed for RNA editing site detection. The resulting RNA editing sites all agree with the validation using the Sanger sequencing.
Conclusion: We report a near-zero error rate method in identifying RNA editing sites of long-read amplicons with the use of PacBio CCS sequencing.