Arrays in Practice: An Empirical Study of Array Access Patterns on the JVM

Beatrice Åkerblom, Elias Castegren
DOI: https://doi.org/10.22152/programming-journal.org/2024/8/14
2024-03-05
Abstract: The array is a data structure used in a wide range of programs. Its compact storage and constant-time random access make it highly efficient, but arbitrary indexing complicates the analysis of code containing array accesses. Such analyses are important for compiler optimisations such as bounds check elimination. The aim of this work is to gain a better understanding of how arrays are used in real-world programs. While previous work has applied static analyses to understand how arrays are accessed and used, we take a dynamic approach. We empirically examine various characteristics of array usage by instrumenting programs to log all array accesses, allowing for analysis of array sizes, element types, where arrays are accessed from, and to what extent sequences of array accesses form recognizable patterns. The programs in the study were collected from the Renaissance benchmark suite, all running on the Java Virtual Machine.
Programming Languages
What problem does this paper attempt to address?
The paper primarily explores the usage of arrays and their access patterns in practical applications, and proposes a method to analyze array access characteristics in programs running on the Java Virtual Machine (JVM). The goal of the study is to better understand the actual usage of arrays in real-world programs. Specifically, the paper addresses the following key issues:

1. **Characteristics of Arrays**: The study examines the characteristics of arrays that are created and used, including array size, the type of data stored, and the portions of the array that are actually accessed.
2. **Sources of Array Access**: Data was collected on where arrays are accessed from, including the classes that access the arrays and the identity of the threads performing the access.
3. **Whether Arrays are Accessed in a Regular Manner**: By analyzing the distribution of all accesses to a single array, the study identifies existing patterns, compares the access distributions between different arrays, identifies arrays with the same or unique access patterns, and measures the proportion of arrays traversed in a regular manner.

To achieve these goals, the researchers adopted a dynamic approach rather than static analysis. This method allows for precise tracking of all array accesses without any approximation or abstraction. They instrumented programs in the Renaissance benchmark suite, recorded all array access behaviors, and then analyzed this data to extract information about array usage.

The study's findings indicate that:

- Most arrays are small in size and are accessed by only one or two classes, typically by a single thread.
- On average, more than 69.8% of array access patterns consist of simple traversals.
- Over 53.8% of all array accesses occur in identifiable simple sequential traversals or constant sequences.

This research not only provides insights into the usage of arrays in actual programs but also lays the foundation for future runtime implementations and compiler optimizations.
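
To make the notion of "simple sequential traversals or constant sequences" concrete, the following is a minimal sketch in Java of how a logged sequence of indices for one array might be classified. It is an illustration only: the class name `AccessPatternSketch`, the `Pattern` categories, and the rule that a traversal means a step of exactly +1 or -1 are assumptions made here, not the definitions or tooling used in the paper.

```java
// Hypothetical sketch: classify the sequence of indices at which one array
// was accessed. The categories and the unit-stride rule are assumptions
// made for illustration; the paper's own definitions may differ.
class AccessPatternSketch {
    enum Pattern { CONSTANT, SEQUENTIAL_FORWARD, SEQUENTIAL_BACKWARD, OTHER }

    static Pattern classify(int[] indices) {
        // A single access (or none) is too short to show a pattern.
        if (indices.length < 2) {
            return Pattern.OTHER;
        }
        boolean constant = true;
        boolean forward = true;
        boolean backward = true;
        for (int i = 1; i < indices.length; i++) {
            int step = indices[i] - indices[i - 1];
            constant &= step == 0;   // same index every time
            forward &= step == 1;    // index grows by one each access
            backward &= step == -1;  // index shrinks by one each access
        }
        if (constant) return Pattern.CONSTANT;
        if (forward) return Pattern.SEQUENTIAL_FORWARD;
        if (backward) return Pattern.SEQUENTIAL_BACKWARD;
        return Pattern.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {0, 1, 2, 3})); // SEQUENTIAL_FORWARD
        System.out.println(classify(new int[] {5, 5, 5}));    // CONSTANT
        System.out.println(classify(new int[] {0, 2, 1}));    // OTHER
    }
}
```

A dynamic analysis of the kind described above would apply such a check to the recorded access log of each array; a real classifier would likely also need to handle partial traversals and interleaved accesses from several threads, which this sketch ignores.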