Obsidian: Cooperative State-Space Exploration for Performant Inference on Secure ML Accelerators

Sarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani, Mohit Tiwari
2024-09-04
Abstract: Trusted execution environments (TEEs) for machine learning accelerators are indispensable in secure and efficient ML inference. Optimizing workloads through state-space exploration for the accelerator architectures improves performance and energy consumption. However, such explorations are expensive and slow due to the large search space. Current research has to use fast analytical models that forgo critical hardware details and cross-layer opportunities unique to the hardware security primitives. While cycle-accurate models can theoretically reach better designs, their high runtime cost restricts them to a smaller state space. We present Obsidian, an optimization framework for finding the optimal mapping from ML kernels to a secure ML accelerator. Obsidian addresses the above challenge by exploring the state space using analytical and cycle-accurate models cooperatively. The two main exploration components are: (1) a secure accelerator analytical model that includes the effect of secure hardware while traversing the large mapping state space and produces the best m model mappings; (2) a compiler profiling step on a cycle-accurate model that captures runtime bottlenecks to further improve execution runtime, energy, and resource utilization and finds the optimal model mapping. We compare our results to a baseline secure accelerator comprising the state-of-the-art security schemes obtained from GuardNN [33] and Sesame [11]. The analytical model reduces the inference latency by 20.5% for a cloud deployment and 8.4% for an edge deployment, with an energy improvement of 24% and 19% respectively. The cycle-accurate model further reduces the latency by 9.1% for a cloud deployment and 12.2% for an edge deployment, with an energy improvement of 13.8% and 13.1% respectively.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to optimize workload mapping on secure machine-learning (ML) accelerators through state-space exploration, so as to improve performance and energy efficiency. Specifically, the paper proposes an optimization framework named Obsidian, which aims to find the best mapping from ML kernels to a secure ML accelerator, thereby reducing inference latency and improving resource utilization.

### Problem Background

1. **Importance of Trusted Execution Environments (TEEs)**:
   - TEEs are crucial for secure and efficient inference on machine-learning accelerators.
   - Optimizing workloads for the accelerator architecture through state-space exploration improves performance and energy consumption.
2. **Limitations of Existing Methods**:
   - Current research relies on fast analytical models, which omit key hardware details and the cross-layer opportunities unique to hardware security primitives.
   - Cycle-accurate models can theoretically reach better designs, but their high runtime cost restricts them to a smaller state space.

### Obsidian's Solution

Obsidian explores the state space by using analytical and cycle-accurate models cooperatively. The two components are:

1. **Secure Accelerator Analytical Model**:
   - It accounts for the effect of secure hardware while traversing the large mapping state space and produces the best m model mappings.
2. **Compiler Profiling Step**:
   - It profiles those candidates on a cycle-accurate model, captures runtime bottlenecks, further improves execution time, energy, and resource utilization, and finds the optimal model mapping.

### Main Contributions

- **Analytical Phase**: Efficiently searches the large state space using the analytical model.
- **Profiling Phase**: Resolves runtime bottlenecks through compiler profiling on the cycle-accurate model.
- **Co-exploration**: Combines the strengths of the analytical model and compiler profiling to find the optimal mapping.
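The cooperative idea described above (a cheap model prunes the full space to m candidates, an expensive model picks among them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `(tile, unroll)` mapping representation, both toy cost formulas, and the function names are hypothetical and are not taken from the paper or from Obsidian's implementation.

```python
import heapq
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical mapping representation: (tile_size, unroll_factor).
Mapping = Tuple[int, int]


def analytical_cost(mapping: Mapping) -> float:
    """Cheap latency estimate. Toy formula for illustration only."""
    tile, unroll = mapping
    return 1000.0 / (tile * unroll) + 0.5 * tile + 2.0 * unroll


def cycle_accurate_cost(mapping: Mapping) -> float:
    """Stand-in for an expensive simulation.

    Adds a contention penalty that the cheap model does not see,
    mimicking a runtime bottleneck only visible at cycle level.
    """
    tile, unroll = mapping
    contention = 15.0 if tile * unroll > 64 else 0.0
    return analytical_cost(mapping) + contention


def cooperative_search(
    space: Sequence[Mapping],
    cheap: Callable[[Mapping], float],
    expensive: Callable[[Mapping], float],
    m: int,
) -> Tuple[Mapping, List[Mapping]]:
    """Two-stage search: cheap model shortlists m mappings, expensive model picks one."""
    # Stage 1: score every mapping with the cheap model, keep the m best.
    shortlist = heapq.nsmallest(m, space, key=cheap)
    # Stage 2: evaluate only the shortlist with the expensive model.
    best = min(shortlist, key=expensive)
    return best, shortlist


if __name__ == "__main__":
    space = list(product([2, 4, 8, 16, 32], [1, 2, 4, 8]))
    best, shortlist = cooperative_search(
        space, analytical_cost, cycle_accurate_cost, m=5
    )
    print("shortlist:", shortlist)
    print("selected mapping:", best)
```

In this sketch the expensive model runs only m times rather than once per point in the space, which is the trade-off the two-stage design relies on. The choice of m controls how much the final answer can differ from the cheap model's own ranking.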
### Experimental Results

Compared with the baseline secure accelerator, Obsidian achieves the following:

- The analytical model reduces inference latency by 20.5% for a cloud deployment and 8.4% for an edge deployment, with energy improvements of 24% and 19% respectively.
- The cycle-accurate model further reduces latency by 9.1% for a cloud deployment and 12.2% for an edge deployment, with energy improvements of 13.8% and 13.1% respectively.

### Summary

Through cooperative state-space exploration, Obsidian improves both the latency and the energy efficiency of inference on secure ML accelerators, in cloud as well as edge deployments.