Mini-Reviews in Organic Chemistry

Author(s): Zhao-Hui Qi, Meng-Zhe Jin, Jia-Shuo Wang and Su-Li Li

DOI: 10.2174/1570193X13666151218191633

Novel DNA Sequence Comparison Method Based on Markov Chain and Information Entropy

Page: [524 - 533] Pages: 10


Abstract

The comparison of DNA sequences is a fundamental topic in computational biology and bioinformatics, helping researchers infer previously ambiguous structure, function, and evolutionary relationships. In this article, we provide a novel DNA sequence comparison scheme that constructs feature vectors based on Markov chains and information entropy. A new measure, calculated as the entropy of a K-string’s four one-step transition probabilities, is used to compose the feature vector that characterizes a DNA sequence. We also introduce a novel concept, named the K-string list, to address the computational burden caused by the exponential growth in complexity as K increases in the traditional K-string model. The proposed scheme allows us to conduct similarity research and phylogenetic analysis on two real datasets, the first exon of 11 species’

Keywords: DNA sequence comparison, entropy, feature vector, K-string list, Markov model, phylogenetic analysis.
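
The measure described in the abstract, the entropy of a K-string's four one-step transition probabilities, can be illustrated with a minimal sketch. It assumes the four transitions are to the bases A, C, G, and T, that probabilities are estimated from raw counts in the sequence, and that entropy is Shannon entropy in bits; the abstract does not confirm these details. The names `transition_entropies` and `feature_vector` are hypothetical, and the sketch enumerates all 4^K K-strings rather than reproducing the K-string list, whose construction the abstract does not specify.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumption: the four one-step transitions go to these bases


def transition_entropies(seq, k):
    """Map each K-string seen in seq to the entropy of its next-base distribution."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for i in range(len(seq) - k):
        window = seq[i:i + k + 1]  # a K-string plus the base that follows it
        if any(base not in ALPHABET for base in window):
            continue  # skip windows with ambiguous symbols such as N
        counts[window[:k]][window[k]] += 1

    entropies = {}
    for kmer, next_counts in counts.items():
        total = sum(next_counts.values())
        # Zero-count transitions contribute nothing to the entropy sum.
        probs = [c / total for c in next_counts.values() if c > 0]
        entropies[kmer] = -sum(p * math.log2(p) for p in probs)
    return entropies


def feature_vector(seq, k):
    """Entropy per K-string in lexicographic order; unseen K-strings get 0.0 (an assumption)."""
    ent = transition_entropies(seq, k)
    return [ent.get("".join(kmer), 0.0) for kmer in product(ALPHABET, repeat=k)]


if __name__ == "__main__":
    print(feature_vector("ACGTACGGTCA", 2))
```

Enumerating every K-string makes the vector length grow as 4^K, which is the exponential cost the abstract says the K-string list is meant to address.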
