Annotating Diverse Scientific Data with HAScO

Paulo Pinheiro, Marcello Peixoto Bax, Henrique Santos, Sabbir M. Rashid, Zhicheng Liang, Yue Liu, Yarden Ne'eman, Jamie P. McCusker, Deborah L. McGuinness
2018-01-01
Abstract: Ontologies are widely used across many scientific fields, most notably in roles related to acquiring, preparing, integrating, and managing data resources. Data acquisition and preparation activities are often difficult to reuse because they tend to be domain dependent and also depend on how the data is acquired: through measurement, subject elicitation, and/or model generation. As a result, tools developed for preparing data from one scientific activity often cannot be easily adapted to prepare data from other scientific activities. We introduce the Human-Aware Science Ontology (HAScO), which integrates a collection of well-established science-related ontologies and aims to address issues related to data annotation for large data ecosystems, where data can come from diverse sources including sensors, lab results, and questionnaires. The work reported in this paper is based on our experience developing HAScO and using it to annotate data collections to facilitate data exploration and analysis for numerous scientific projects, three of which are described. Data files produced by scientific studies are processed to identify the objects they describe (a gene, for instance) and annotate them with the appropriate ontological terms. One benefit we realized of preserving scientific data provenance is that software platforms can support scientists in exploring and preparing data for analysis, since the meaning of the data and the interrelationships within it are explicit.