Deep-BSC: Predicting Raw DNA Binding Pattern in Arabidopsis Thaliana

Page: [457 - 465] Pages: 9

Abstract

Background: With the rapid development of sequencing methods in recent years, binding sites have been systematically identified in projects such as NestedMICA and MEME. Predicting DNA motifs with higher accuracy and precision remains an important task for bioinformaticians. Nevertheless, experimental approaches are still time-consuming for large data sets, making computational identification of binding sites indispensable.

Objective: To facilitate the identification of binding sites, we propose a deep learning architecture, named Deep-BSC (Deep-Learning Binary Search Classification), that predicts binding sites in a raw DNA sequence with higher precision and accuracy.

Methods: The proposed architecture relies purely on the raw DNA sequence and uses a convolutional neural network (CNN) to predict protein binding sites. We trained the deep learning model on binding sites at the nucleotide level. The DNA sequence of A. thaliana was used in this study because it is a model plant.
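
As a rough illustration of how a CNN can operate on raw DNA, the sketch below one-hot encodes a sequence, slides a single convolutional filter along it, and applies global max-pooling to obtain one score per sequence. It is a minimal, dependency-free sketch and not the Deep-BSC architecture: the filter, the toy motif "TATA", and the example sequences are all assumptions chosen for illustration.

```python
# Illustrative sketch only; the filter and sequences are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values, ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        activations.append(max(total, 0.0))  # ReLU
    return activations


def global_max_pool(activations):
    """Keep only the strongest filter response, wherever it occurs."""
    return max(activations)


# Hypothetical filter whose weights are the one-hot pattern of a toy motif.
kernel = one_hot("TATA")

with_motif = "GGCTATAAGC"      # contains the toy motif
without_motif = "GGCCGCGAGC"   # does not contain it

score_pos = global_max_pool(conv1d(one_hot(with_motif), kernel))
score_neg = global_max_pool(conv1d(one_hot(without_motif), kernel))
print(score_pos, score_neg)
```

In a trained network the filter weights would be learned from labeled sequences rather than set by hand, and many filters would be applied in parallel.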

Results: The results demonstrate the effectiveness and efficiency of our deep learning method in classifying binding sites against random sequences. We constructed CNNs with different layers and filters to show the usefulness of the max-pooling technique in the proposed method. To make our approach interpretable, we further visualized binding sites in a saliency map and successfully identified similar motifs in the raw sequence. The proposed computational framework is time- and resource-efficient.
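
The idea behind a saliency map can be sketched for a toy model consisting of one convolutional filter followed by global max-pooling. For such a model the gradient of the score with respect to the one-hot input is simply the filter weights placed at the best-scoring window, and the saliency of each nucleotide is the largest absolute gradient across the four channels. The filter weights and the sequence below are hypothetical, and a real network would obtain these gradients by automatic differentiation; this is not the authors' implementation.

```python
# Illustrative sketch only; the filter weights and sequence are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values, ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def best_window(encoded, kernel):
    """Return (start, score) of the top window: convolution plus global max-pool."""
    width = len(kernel)
    best_start, best_score = 0, float("-inf")
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + j][c] * kernel[j][c]
            for j in range(width)
            for c in range(4)
        )
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def saliency(seq, kernel):
    """Per-nucleotide saliency: max over channels of |d score / d input|."""
    start, _ = best_window(one_hot(seq), kernel)
    values = [0.0] * len(seq)
    for j, weights in enumerate(kernel):
        # Inside the winning window the gradient equals the filter weights;
        # outside it the gradient is zero.
        values[start + j] = max(abs(w) for w in weights)
    return values


# Hand-set filter (columns A, C, G, T) that prefers the toy motif "TATA".
kernel = [
    [-1.0, -1.0, -1.0, 2.0],
    [1.5, -1.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0, 2.0],
    [1.0, -0.5, -0.5, -0.5],
]

seq = "GGCTATAAGC"
sal = saliency(seq, kernel)
for base, value in zip(seq, sal):
    print(base, "#" * int(value * 2))  # crude text rendering of the map
```

Positions with non-zero saliency mark the window that drives the prediction, which is how a saliency map can point to a candidate motif in the raw sequence.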

Conclusion: Deep-BSC enables the identification of binding sites in DNA sequences via a highly accurate CNN. The proposed computational framework can also be applied to related problems, such as identifying operators, genomic repeats, DNA markers, and enzyme recognition sites, thereby promoting the use of the Deep-BSC method in the life sciences.

Keywords: Transcription factors (TFs), DNA binding motifs, Arabidopsis thaliana, convolutional neural network (CNN), computational biology, genomics.
