Abstract: This study collects GPU rendering programs and analyzes their characteristics in order to construct a benchmark dataset that reflects the behavior of GPU rendering workloads and provides a reference for designing the next generation of graphics processors. The research framework has four parts: GPU rendering program integration, data collection, program analysis, and similarity analysis. In the program integration and data collection phase, 1000 GPU rendering programs were collected from open-source repositories, and 100 representative programs were selected as the initial benchmark dataset. The program analysis phase covers instruction-level, thread-level, and memory-level analysis, and applies five machine learning algorithms for importance ranking. Finally, Pearson similarity analysis was used to eliminate rendering programs with high similarity, and the final GPU rendering program benchmark dataset was selected for comprehensiveness and representativeness. The experimental results show that, because rendering programs must load and process texture and geometry data, their average global memory access efficiency is generally lower than the averages of the Rodinia and Parboil benchmarks. GPU occupancy is related to the computationally intensive tasks in rendering programs. Stream processor and warp (thread bundle) execution efficiency are affected by branch statements and conditional tests: common operations such as lighting calculations and texture sampling require branching, which reduces execution efficiency. Bandwidth utilization improves because rendering programs use data caching and reuse to reduce frequent memory accesses and data transfers to main memory. Furthermore, this study used multiple machine learning methods to rank the importance of 160 characteristics of 100 rendering programs on four different NVIDIA GPUs. The individual methods differ in how robust and stable they are under different data distributions and characteristic relationships, so comparing the results of several methods reduces the bias inherent to any single one and makes the results more reliable. The contribution of this study is its analysis of the workload characteristics of rendering programs, which enables targeted performance optimization to improve their efficiency and quality. By comprehensively collecting GPU rendering program data and applying characteristic analysis and machine-learning-based importance ranking, the study provides reference guidelines for GPU design, which is important for advancing rendering technology.
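The similarity-elimination step can be illustrated with a minimal sketch. It assumes that each program is described by a vector of profiled characteristics and that a program is dropped when its Pearson correlation with an already-kept program exceeds a threshold. The threshold of 0.95, the greedy keep-first order, the function names, and the feature values are all hypothetical; the abstract does not state how the study implemented this step.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prune_similar(features, threshold=0.95):
    """Greedily keep programs that are not highly correlated with any kept one.

    `features` maps a program name to its characteristic vector.
    `threshold` is an assumed cut-off, not a value taken from the study.
    """
    kept = []
    for name, vec in features.items():
        if all(pearson(vec, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical characteristic vectors (e.g. normalized efficiency metrics).
features = {
    "prog_a": [0.62, 0.80, 0.35, 0.91],
    "prog_b": [0.63, 0.79, 0.36, 0.90],  # nearly the same profile as prog_a
    "prog_c": [0.20, 0.95, 0.70, 0.40],
}
print(prune_similar(features))  # prog_b is dropped as redundant with prog_a
```

A greedy pass depends on iteration order, so a real selection procedure would also need a rule for which member of a similar pair to keep; the abstract mentions comprehensiveness and representativeness as the selection criteria.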


RayBench: An Advanced NVIDIA-Centric GPU Rendering Benchmark Suite for Optimal Performance Analysis
