The A4 project: physics data processing using the Google protocol buffer library

Johannes Ebke, Peter Waller
DOI: https://doi.org/10.1088/1742-6596/396/2/022012
2012-08-08
Abstract: We present a4, a High Energy Physics data format, processing toolset and analysis library that provides fast I/O of structured data using the Google protocol buffer library. The overall goal of a4 is to give physicists tools to work efficiently with billions of events, offering not only high speed but also automatic metadata handling, a set of UNIX-like tools to operate on a4 files, and powerful and fast histogramming capabilities. At present, a4 is an experimental project, but the authors have already used it in preparing physics publications. We give an overview of the individual modules of a4, provide examples of use, and supply a set of basic benchmarks. We compare a4 read performance with the common practice of storing unstructured data in ROOT trees. For the common case of storing a variable number of floating-point numbers per event, read speedups of up to a factor of six are observed.
Data Analysis, Statistics and Probability; Computational Physics