Human Proteome Organization Proteomics Standards Initiative: data standardization, a view on developments and policy.

L. Martens,S. Orchard,R. Apweiler,H. Hermjakob
IF: 7.381
2007-09-01
Molecular & Cellular Proteomics
Abstract: The MCP guidelines on the publication of the underlying data supporting proteomics papers (1) spearheaded a movement toward more structured sharing of (high quality) data. The original MCP guidelines discuss the criteria by which quality is judged. However, this is only one aspect of the process of publishing proteomics data. Upstream of the quality assignment are the use of standardized data formats and the fulfillment of minimal reporting requirements, such as those created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) (2), first led by Rolf Apweiler and currently chaired by Henning Hermjakob with Rudi Aebersold as co-chair. It is interesting to see how these three aspects interact: the standardized data formats allow the uniform and effortless reading of data generated in different laboratories, whereas the minimal reporting requirements ensure that sufficient information is made available to perform the quality assignment according to a defined set of criteria. Obviously, the undertaking of data sharing does not end with standards, reporting guidelines, or quality assignment, because the data must ultimately be made publicly available. Availability, however, should be partnered with accessibility, implying a limited number of locations that are well stocked with data and that offer powerful query abilities. These specific requirements are best satisfied by centralized data repositories such as the Global Proteome Machine Database (GPMDB) (3), Proteomics Identifications Database (PRIDE) (4), and PeptideAtlas (5). Development of these aspects of data sharing (standardized data exchange formats, minimal reporting requirements, quality criteria, and data repositories) is taking place in parallel, driven by the different parties involved: the journals, producers of standards, repository developers, and data providers.
The European Union-funded Proteomics Data Collection (ProDaC) grant is providing a unique opportunity to synchronize and tie together these parallel efforts by funding a comprehensive project that simultaneously supports the creation of standard data formats, the adaptation of repositories, and the implementation of standards-compliant pipelines for data submissions into the repositories from a wide array of laboratories worldwide. Most importantly, however, the ProDaC project also lists as a deliverable the established reuse and valorization of the assembled data in repositories, for example, the annotation of protein sequence databases such as UniProtKB/Swiss-Prot, a process where quality control also comes into play. What stands out in the above is that standards serve as essential tools to enable both the quality assessment of data and their subsequent reuse. Additionally, because standardization is a continuously moving target in a fast-evolving field like proteomics, consistent review and revision of these standards are necessary. At the same time, standards should evolve in well-defined and broadly spaced steps. The latter is essential to elicit broad implementation of standards; bluntly put, nobody will invest in writing software for a standard that will change in 3 months' time. Having established that development of standards is an essential part of the aim to provide high-quality and well-annotated data for reuse by the community, be that passive (via annotations that appear in sequence databases) or active (by downloading and reanalyzing data), it is clear that maintaining these efforts as a purely voluntary enterprise presents a suboptimal situation. The example set by the European Union's funding of the ProDaC grant could be extended to supply at least some targeted funding for these standards.
Interestingly, because standards are intrinsically meant to be universal, the (partial) funding of the development and maintenance of standards is a clear target for prototype collaborations in the field of proteomics between leading funding agencies across the globe.