Current Bioinformatics

Author(s): Emad S. Hassan*, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary and Fathi E. Abd El‑Samie

DOI: 10.2174/0115748936287244240117065325

DownloadDownload PDF Flyer Cite As
Improved Hybrid Approach for Enhancing Protein-Coding Regions Identification in DNA Sequences

Page: [208 - 228] Pages: 21

  • * (Excluding Mailing and Handling)

Abstract

Introduction: Identifying and predicting protein-coding regions within DNA sequences play a pivotal role in genomic research. This paper introduces an approach for identifying proteincoding regions in DNA sequences by employing a hybrid methodology that combines digital bandpass filtering with wavelet transform and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences.

Methods: This research work showcases the utility of Haar and Daubechies wavelet transforms, both non-parametric and parametric spectral estimation techniques, and the deployment of a digital bandpass filter for detecting peaks in exon regions. Additionally, the application of the Electron-Ion Interaction Potential (EIIP) method for converting symbolic DNA sequences into numerical values and the utilization of Sum-of-Sinusoids (SoS) mathematical model with optimized parameters further enrich the toolbox for DNA sequence analysis, ensuring the success of the proposed approach in modeling DNA sequences, optimally, and accurately identifying genes.

Results: The outcomes of this approach showcase a substantial enhancement in identification accuracy for protein-coding regions. In terms of peak location detection, the application of Haar and Daubechies wavelet transforms enhances the accuracy of peak localization by approximately (0.01, 3-5 dB). When employing non-parametric and parametric spectral estimation techniques, there is an improvement in peak localization by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy, when compared with existing ones.

Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques.

Keywords: Bioinformatics, protein coding regions, digital signal processing, wavelet transforms, sequence analysis, spectral estimation.