CompF5: End User Analysis Topical Group Report
Gavin S. Davies, Peter Onyisi, Amy Roberts
DOI: https://doi.org/10.48550/arXiv.2209.14984
2022-09-30
Abstract: This report summarizes the work of the Computational Frontier topical group on end user analysis for Snowmass 2021. End User Analysis refers to the extraction of physics results from reconstructed and simulated experimental data. High energy physics experiments produce systems that perform common reconstruction, calibration, and simulation tasks, resulting in shared data samples. These detailed data samples are then reduced to create a range of analysis samples, often optimized for a particular physics topic by a trigger or data object selection. End users (analyzers) then analyze those samples to produce physics results. Community discussions converged on categorizing the end user analysis ensemble into Analysis Ecosystems, Analysis Models, Dataset Bookkeeping and Formats, Collaborative Software, and Training. We present findings and recommendations based on these key areas that impact end user analysis.
Computational Physics, High Energy Physics - Experiment, High Energy Physics - Lattice, High Energy Physics - Phenomenology, High Energy Physics - Theory
What problem does this paper attempt to address?
This paper addresses the challenges and obstacles that End User Analysis faces in high-energy physics experiments. It focuses on the process of extracting physics results from reconstructed and simulated experimental data, a step that is crucial to every experiment. Producing these results is made harder by hidden information, lack of documentation, and dependence on expert intervention or training. These problems particularly affect scientists with limited software skills, those who are reluctant to take up others' time, and those who rank low on experts' priority lists.
Drawing on community discussions, letters of interest, white papers, and other public materials, the paper identifies obstacles in end-user analysis and proposes potential solutions. These obstacles include, but are not limited to:
1. **Ecosystem Dependence**: End-user analysis depends on complex software ecosystems, including libraries, languages, and data formats. Incompatibility between these components, or a lack of maintenance support, increases the difficulty of analysis.
2. **I/O-Intensive Nature of Data Processing**: Data analysis usually requires a large number of input/output operations, which are repeated many times over the lifetime of an experiment, especially as selections and algorithms are continually optimized.
3. **Limitations of Computational Resources**: Although dedicated facilities can analyze large-scale samples in parallel, end-user analysis is in many cases carried out on local computing clusters or even laptops, which limits its efficiency and scale.
4. **Diversity of Analysis Activities**: Analysis activities include tasks such as calibration, feature detection, limit setting, parameter estimation, and cross-section measurement, and each task may require different tools and technical support.
By identifying these problems, the paper aims to stimulate discussion within the community and encourage the development of solutions that improve the efficiency and accessibility of end-user analysis.