JXP4BIGI: a Generalized, Java XML-based Approach for Biological Information Gathering and Integration.
YH Huang, TY Ni, L Zhou, S Su
DOI: https://doi.org/10.1093/bioinformatics/btg327
IF: 5.8
2003-01-01
Bioinformatics
Abstract:
MOTIVATION: In the post-genomic era, biologists interested in systems biology often need to import data from public databases and build their own system-specific or subject-oriented databases to support complex analysis and knowledge discovery. To make analysis and data processing easier, customized and centralized databases are often created by extracting and integrating heterogeneous data retrieved from public databases. A generalized methodology is needed for accessing, extracting, transforming and integrating such heterogeneous data.
RESULTS: This paper presents a new data integration approach named JXP4BIGI (Java XML Page for Biological Information Gathering and Integration). The approach provides a system-independent framework that generalizes and streamlines the steps of accessing, extracting, transforming and integrating data retrieved from heterogeneous sources to build a customized data warehouse. It allows the data integrator of a biological database to define the desired bio-entities in XML templates (or Java XML pages). The integrator can then use embedded extended SQL statements to extract structured, semi-structured and unstructured data from public databases. Running the templates in the JXP4BIGI framework with a number of generalized wrappers efficiently extracts the required data from public databases. The data is then integrated to construct the bio-entities in XML format, without hard-coding the extraction logic for each data source. The constructed XML bio-entities can then be imported into either a relational database system or a native XML database system to build a biological data warehouse.
AVAILABILITY: JXP4BIGI has been integrated and tested with the IKBAR system (http://www.ikbar.org/) in two integration efforts. The first collected and integrated data for about 200 human genes related to cell death from HUGO, Ensembl and SWISS-PROT (Bairoch and Apweiler, 2000). The second covered about 700 Drosophila genes from FlyBase (FlyBase Consortium, 2002). The integrated data has been used in a comparative genomic analysis of x-ray-induced cell death. As explained in the paper, JXP4BIGI is middleware: a framework to be integrated with biological database applications, not stand-alone software for end users. A demonstration version is available at http://www.ikbar.org/jxp4bigi/demo.html.
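The RESULTS section separates two concerns. Templates name the fields a bio-entity should contain. Generalized wrappers pull those fields out of heterogeneous sources. The Java sketch below illustrates that separation only under stated assumptions. It is not the JXP4BIGI API, and it leaves out the extended SQL layer entirely. The names `Wrapper`, `LineCodeWrapper`, `buildEntity`, `escape` and the XML element names are hypothetical. The wrapper assumes a line-coded flat-file record loosely modeled on SWISS-PROT entries, and the "template" is reduced to a plain map from XML element names to source field codes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BioEntityDemo {

    /** Hypothetical generalized wrapper: turns one raw source record into named fields. */
    interface Wrapper {
        Map<String, String> extract(String rawRecord);
    }

    /**
     * Hypothetical wrapper for line-coded flat files: each line starts with a
     * two-letter code followed by whitespace and a value (e.g. "ID   ...").
     * Values of repeated codes are joined with a single space.
     */
    static class LineCodeWrapper implements Wrapper {
        @Override
        public Map<String, String> extract(String rawRecord) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String line : rawRecord.split("\n")) {
                if (line.length() < 2) continue; // skip blank or malformed lines
                String code = line.substring(0, 2);
                String value = line.substring(2).trim();
                fields.merge(code, value, (a, b) -> a + " " + b);
            }
            return fields;
        }
    }

    /** Escapes the five XML special characters in text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /**
     * Builds an XML bio-entity from a template mapping element names to source
     * field codes. Fields missing from the record are skipped.
     */
    static String buildEntity(String entityName, Map<String, String> template,
                              Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<" + entityName + ">");
        for (Map.Entry<String, String> e : template.entrySet()) {
            String value = fields.get(e.getValue());
            if (value != null) {
                xml.append('<').append(e.getKey()).append('>')
                   .append(escape(value))
                   .append("</").append(e.getKey()).append('>');
            }
        }
        return xml.append("</").append(entityName).append('>').toString();
    }

    public static void main(String[] args) {
        // A made-up record, not real database content.
        String record = String.join("\n",
                "ID   EXAMPLE_HUMAN",
                "DE   Example protein",
                "GN   EXMP1");
        // The "template": element name -> source field code (hypothetical mapping).
        Map<String, String> template = new LinkedHashMap<>();
        template.put("entryName", "ID");
        template.put("description", "DE");
        template.put("gene", "GN");
        Map<String, String> fields = new LineCodeWrapper().extract(record);
        System.out.println(buildEntity("protein", template, fields));
    }
}
```

Running `main` prints one `<protein>` element with `entryName`, `description` and `gene` children. The field mapping is kept in data rather than in code. This reflects the abstract's point that extraction logic need not be hard-coded per source. A real framework would also need wrappers for structured (e.g. relational) and unstructured sources, which this sketch does not attempt.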