Aim: To help researchers and practitioners uncover poorly understood functional aspects of the human cellular system through exploratory search over semantically integrated, heterogeneous, and geographically dispersed omics annotations.
Background: Improving human health is one of the motives that continually drives researchers and practitioners to uncover poorly understood aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing at an astronomical rate while becoming increasingly heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and extensible than syntax-based ones, but they still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities as semantic associations of the kind that exist naturally in the human cell.
Objective: To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and supports exploratory querying of the integrated data.
Methods: We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports the aspect-oriented representation of biological entities while addressing data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate to each other in a physical cell system.
Results: To demonstrate the significance of the proposed model, we developed a data warehouse and information retrieval system on a multi-layered, multi-modular software architecture that complies with the model. The results show that the model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects, within or across the associated omics layers.
Conclusion: Formal specifications help address data integration issues by providing a formal means of understanding omics data based on meaning rather than syntax.
Keywords: Formal specifications, semantic data schema, omics integration model, web data semantics, data heterogeneity, data warehouse, omics annotations, multi-layered architecture.