Integration and Querying of Heterogeneous Omics Semantic Annotations for Biomedical and Biomolecular Knowledge Discovery

Page: [41 - 58] Pages: 18

Abstract

Background: Exploring the functional aspects of the biological cell system has been a focus of research for many decades. Biologists and other researchers continue to work at unveiling these functional aspects in order to improve standards of health. Gaining such understanding requires critical analysis of omics data that is rapidly growing, heterogeneous and geographically dispersed. Currently, omics data is available in different types and formats through various data access interfaces. Applications that require offline, integrated data therefore face considerable problems of data heterogeneity and global dispersion.

Objective: To support such applications in particular, heterogeneous data must be collected, integrated and warehoused in a loosely coupled way, so that each molecular entity can be understood computationally, either independently or in association with other entities, within or across the various cellular aspects.

Methods: In this paper, we propose an omics data integration schema and its corresponding data warehouse system for integrating, warehousing and presenting heterogeneous and geographically dispersed omics entities according to the cellular functional aspects.

Results & Conclusion: This aspect-oriented data integration and warehousing, together with data access through graphical search, web services and application programming interfaces, makes the proposed integrated data schema and warehouse system more useful than other contemporary ones.

Keywords: Heterogeneity, warehouse, data integration, omics layer, molecular interaction, multitier architecture, data schema, omics annotations.
