A generic concept for large-scale microarray analysis dedicated to medical diagnostics

M Dugas, F Weninger, S Merk, A Kohlmann, T Haferlach
Abstract

Background: The development of diagnostic procedures based on microarray analysis confronts bioinformaticians and biomedical researchers with a variety of challenges. Microarrays generate huge amounts of data, many of the data processing steps are not yet clearly defined, and there are many clinical response variables that may not correspond to gene expression patterns.

Objectives: To design a generic concept for large-scale microarray experiments dedicated to medical diagnostics; to create a system capable of handling several thousand microarrays per analysis and more than 100 clinical response variables; to design a standardized workflow for quality control, data calibration, identification of differentially expressed genes and estimation of classification accuracy; and to provide a user-friendly interface that supports clinical researchers in the biomedical interpretation of results.

Methods: We designed a database structure suitable for storing microarray data and analysis results. We applied statistical procedures to identify differentially expressed genes and developed a technique to estimate the classification accuracy of gene patterns with confidence intervals.

Results: We implemented a Gene Analysis Management System (GAMS) based on this concept, using MySQL for data storage, R/Bioconductor for analysis, and PHP for a web-based front end for exploring microarray data and analysis results. The system was applied to large data sets from several medical disciplines, mainly oncology (approximately 2000 microarrays).

Conclusions: A systematic approach is necessary to obtain comprehensible results when analyzing microarray experiments in a medical diagnostics setting. Because of the complexity of the analysis, data processing (by bioinformaticians) and interactive exploration of results (by biomedical experts) should be separated.
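The abstract states that classification accuracy of gene patterns is reported with confidence intervals but does not say which classifier or interval method GAMS uses (GAMS itself is built on R/Bioconductor). Purely as an illustration of the general idea, the Python sketch below assumes a nearest-centroid classifier, leave-one-out cross-validation, and a Wilson score interval for the resulting binomial proportion. None of these choices is taken from the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(correct: int, total: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (here: classification accuracy).

    Assumption: the Wilson interval is chosen for illustration; the paper's own
    interval method is not specified in the abstract.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for a 95% interval
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def nearest_centroid_predict(train_x, train_y, x):
    """Hypothetical classifier: assign x to the class with the closest mean profile (Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def loocv_accuracy(samples, labels, level: float = 0.95):
    """Leave-one-out cross-validated accuracy plus a Wilson confidence interval.

    samples: list of expression profiles (one list of gene values per microarray)
    labels:  list of class labels, one per sample
    Returns (accuracy, lower_bound, upper_bound).
    """
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i, train on the rest, then predict the held-out sample.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    n = len(samples)
    low, high = wilson_interval(correct, n, level)
    return correct / n, low, high


if __name__ == "__main__":
    # Toy data: six samples, two genes, two well-separated classes (invented for illustration).
    samples = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(loocv_accuracy(samples, labels))
```

Even with perfect cross-validated accuracy on six samples, the 95% lower bound in this toy example is only about 0.61, which illustrates why reporting an interval rather than a single accuracy figure matters for small diagnostic cohorts.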