COOK Access Control on an embedded Volta GPU

Benjamin Lesage, Frédéric Boniol, Claire Pagetti
2024-06-20
Abstract: The last decade has seen the emergence of a new generation of multi-core platforms in response to advances in machine learning, and in particular Deep Neural Network (DNN) training and inference tasks. These platforms, like the JETSON AGX XAVIER, embed several cores and accelerators in a SWaP-efficient (Size, Weight and Power) package with a limited set of resources. However, concurrent applications tend to interfere on shared resources, resulting in high execution time variability for applications compared to their behaviour in isolation. Access control techniques aim to selectively restrict the flow of operations executed by a resource. To reduce the impact of interference on the JETSON Volta GPU, we specify and implement an access control technique that ensures each GPU operation executes in isolation, reducing its timing variability. We implement the controller using three different strategies and assess their complexity and impact on the application performance. Our evaluation shows the benefits of adding the access control: its transparency to applications, reduced timing variability, isolation between GPU operations, and small code complexity. However, the strategies may cause slowdowns for applications, even in isolation, although these remain reasonable.
Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the execution time variability caused by resource sharing among concurrent applications on an embedded Volta GPU, such as the one in the NVIDIA Jetson AGX Xavier platform. Specifically:

1. **Resource Competition and Interference**: On multi-core processor and accelerator platforms, applications compete for shared resources, which can lead to significant variations in their execution time, especially when they run in parallel. This interference threatens the predictability and security of the system, particularly in scenarios that require strict timing guarantees, such as avionics.
2. **Reducing Uncertainty in Execution Time**: To ensure that each GPU operation executes in isolation and so reduce its timing variability, the authors propose an access control technique. The technique mitigates interference by restricting the flow of operations, thereby improving the timing predictability of the system.
3. **Transparency and Compatibility**: The proposed access control method should integrate into existing applications as transparently as possible, without modifying the application code or requiring specific APIs, and should support multiple runtime environments. It should also minimize changes to the operating system kernel or to code provided by other vendors, to reduce maintenance costs.

### Main Contributions

- **Access Control Technique**: A time-based access control technique is proposed to ensure that operations on the GPU execute in isolation, thereby reducing the timing variations caused by interference.
- **Software Hook Implementation**: Software hooks are used to generate controllers from simple templates that modify the behavior of existing GPU routines. These hooks adapt easily to new or updated routines and are transparent to applications, which do not need to be modified.
- **Evaluation and Verification**: The complexity of three different access control strategies and their impact on application performance are evaluated experimentally. The results show that adding access control is transparent to applications, reduces timing variability, isolates GPU operations from one another, and has low code complexity. The strategies may slow applications down, even when they run in isolation, but the slowdowns are reported as reasonable.

In conclusion, the paper focuses on reducing interference among concurrent applications on an embedded GPU platform through access control, thereby improving the predictability and reliability of the system.
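
The hook-based idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: `gpu_launch` and `gpu_memcpy` are hypothetical stand-ins for GPU routines, and a single process-wide lock is assumed as the access policy, since this summary does not describe the paper's three strategies. The sketch only shows how a generic wrapper could serialize operations without changing the code that calls them.

```python
import functools
import threading
import time

# Assumption: one global lock models the access controller. The paper's
# actual strategies are not described in this summary.
_gpu_access = threading.Lock()

# Bookkeeping used only to observe how many operations overlap.
_stats_lock = threading.Lock()
_active = 0
max_active = 0


def controlled(routine):
    """Hook: wrap a routine so it runs only while holding the GPU lock."""
    @functools.wraps(routine)
    def wrapper(*args, **kwargs):
        with _gpu_access:
            return routine(*args, **kwargs)
    return wrapper


def _fake_gpu_work(duration):
    """Hypothetical stand-in for a GPU operation; records concurrency."""
    global _active, max_active
    with _stats_lock:
        _active += 1
        max_active = max(max_active, _active)
    time.sleep(duration)
    with _stats_lock:
        _active -= 1


@controlled
def gpu_launch(duration=0.01):
    """Hypothetical kernel launch, serialized by the hook."""
    _fake_gpu_work(duration)
    return "kernel done"


@controlled
def gpu_memcpy(duration=0.01):
    """Hypothetical memory copy, serialized by the hook."""
    _fake_gpu_work(duration)
    return "copy done"


def run_concurrent_apps(n_apps=4):
    """Run several 'applications' as threads, each issuing GPU operations."""
    def app():
        gpu_memcpy()
        gpu_launch()

    threads = [threading.Thread(target=app) for _ in range(n_apps)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Highest number of operations observed running at the same time.
    return max_active
```

Because the wrapper is applied with a decorator, the application function `app` calls the routines exactly as it would without the controller. The cost is the one the summary mentions: operations wait for one another, so an application may run more slowly than it would with unrestricted access.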