A Joint Probabilistic Model in DNA Sequences

Page: [234 - 240] Pages: 7


Abstract

Background: Most existing methods for comparing and analyzing DNA sequences rely on multiple sequence alignment (MSA) algorithms. However, the computation time required for MSA is usually very long, which makes it impractical to analyze a large group of long DNA sequences.

Objective: Here we propose a novel computational method to quickly characterize and compare DNA sequences.

Method: We construct a new 2-dimensional (2D) graphical representation of DNA sequences based on the mathematical concept of joint probability. Each dinucleotide is assigned the product of the signed probabilities of its two nucleotides, an assignment that is entirely independent of the species studied.
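
As a rough illustration of the idea in the Method paragraph, the sketch below maps a sequence to 2D points by assigning each dinucleotide the product of the signed probabilities of its two nucleotides. The abstract does not specify the sign convention, how the probabilities are obtained, or how the curve is drawn, so everything here is an assumption made for illustration: the `SIGN` table (purines positive, pyrimidines negative), the use of within-sequence relative frequencies as probabilities, the running-sum y-coordinate, and the function names `signed_probabilities` and `dinucleotide_curve`.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sign convention (NOT given in the abstract):
# purines (A, G) positive, pyrimidines (C, T) negative.
SIGN = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def signed_probabilities(bases):
    """Return sign * relative frequency for each nucleotide.

    Assumption: probabilities are estimated from the sequence itself.
    """
    counts = Counter(bases)
    total = sum(counts[b] for b in SIGN)
    if total == 0:
        return {b: 0.0 for b in SIGN}
    return {b: SIGN[b] * counts[b] / total for b in SIGN}


def dinucleotide_curve(seq):
    """Map a DNA sequence to 2D points (i, y_i).

    Each dinucleotide gets the product of the signed probabilities of its
    two nucleotides; y_i is the running sum of those values (an assumed
    way of turning the values into a curve).
    """
    # Keep only unambiguous bases; other symbols (e.g. N) are skipped.
    bases = [b for b in seq.upper() if b in SIGN]
    p = signed_probabilities(bases)
    values = [p[a] * p[b] for a, b in zip(bases, bases[1:])]
    return list(enumerate(accumulate(values), start=1))


if __name__ == "__main__":
    for point in dinucleotide_curve("AACG"):
        print(point)
```

Under these assumptions, two sequences could be compared by computing a distance between descriptors derived from their curves, without aligning them first; the specific descriptor the authors use is not stated in the abstract.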

Results: We perform similarity/dissimilarity analyses on three real DNA data sets: the first exon of the beta-globin gene of eleven animal species, the ribulose bisphosphate carboxylase small chain (rbcS) gene of eleven species of flowering plants, and the mitochondrial genome sequences of eleven mammal species.

Conclusion: Our results agree with existing biological analyses in the literature. We also compare our approach with MSA algorithms and find it much quicker and more effective.

Keywords: Graphical representation, dinucleotide, numerical characterization, similarity/dissimilarity, phylogenetic tree, flowering plants.
