Chimera: a Virtual Data System for Representing, Querying, and Automating Data Derivation

I Foster, J Vockler, M Wilde, Y Zhao
DOI: https://doi.org/10.1109/ssdm.2002.1029704
2002-01-01
Abstract: Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog, for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "Data Grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, with promising results: the reconstruction of simulated collision event data from a high-energy physics experiment, and the search of digital sky survey data for galactic clusters.
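The idea of a catalog that records how each dataset is derived, and regenerates data on request, can be illustrated with a small sketch. This is not Chimera's virtual data language or its catalog schema; the names `Derivation`, `VirtualDataCatalog`, `register`, `provenance`, and `request` are hypothetical and chosen only for illustration. The sketch assumes that derivations form an acyclic graph and that procedures are plain in-process Python callables rather than Grid jobs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Derivation:
    """One recorded application of a procedure: inputs -> output (hypothetical type)."""
    output: str
    procedure: Callable[..., object]
    inputs: Tuple[str, ...] = ()


@dataclass
class VirtualDataCatalog:
    """Toy catalog mapping dataset names to the derivation that produces them."""
    derivations: Dict[str, Derivation] = field(default_factory=dict)
    materialized: Dict[str, object] = field(default_factory=dict)

    def register(self, derivation: Derivation) -> None:
        # Record how a dataset can be produced, without producing it yet.
        self.derivations[derivation.output] = derivation

    def provenance(self, name: str) -> List[str]:
        """List the datasets `name` depends on, dependencies first.

        Assumes the derivation graph is acyclic; a cycle would recurse forever.
        """
        order: List[str] = []

        def visit(dataset: str) -> None:
            if dataset in order:
                return
            derivation = self.derivations.get(dataset)
            if derivation is not None:
                for parent in derivation.inputs:
                    visit(parent)
            order.append(dataset)

        visit(name)
        return order

    def request(self, name: str) -> object:
        """Return `name`, computing it and any missing inputs on demand."""
        if name in self.materialized:
            return self.materialized[name]
        derivation = self.derivations.get(name)
        if derivation is None:
            raise KeyError(f"no data and no derivation recorded for {name!r}")
        args = [self.request(parent) for parent in derivation.inputs]
        result = derivation.procedure(*args)
        self.materialized[name] = result  # cache so later requests reuse it
        return result


# Example: "raw" is measured data; the other two datasets exist only virtually
# until they are requested.
catalog = VirtualDataCatalog()
catalog.materialized["raw"] = [3, 1, 2]
catalog.register(Derivation("sorted", sorted, ("raw",)))
catalog.register(Derivation("largest", lambda xs: xs[-1], ("sorted",)))
```

In this sketch, `catalog.request("largest")` computes `sorted` first and caches both results, while `catalog.provenance("largest")` answers the provenance question without running anything. A real system would additionally need to schedule the resulting computations on distributed resources, which the sketch leaves out.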