Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development

Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen
2024-04-16
Abstract: Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden their applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leaving a big gap in enabling emerging ones like Metal, Vulkan, and WebGPU.
Software Engineering, Machine Learning
### What problems does this paper attempt to solve?

This paper addresses the software engineering challenges of deploying machine learning (ML) models on emerging computing platforms, given the rapid evolution of large language models (LLMs) and the arrival of new computing platforms. Specifically:

1. **Limitations of traditional ML frameworks**: Existing ML frameworks are engineered mainly for CPU and CUDA platforms and offer insufficient support for emerging platforms such as Metal, Vulkan, and WebGPU, which leaves a large gap.
2. **Rapid development of models and platforms**: As LLMs evolve quickly and platforms such as Metal and WebGPU are introduced, deploying these models efficiently on the new platforms has become an urgent problem.
3. **Difficulties in testing and debugging**: The traditional bottom-up development method requires a large amount of manual testing and debugging. This work is time-consuming and labor-intensive, and it is hard to make hand-written test cases diverse and realistic. The tooling and ecosystem of emerging platforms are also relatively limited, which makes debugging harder still.
4. **Complexity of migrating computing tasks**: When migrating from a mature platform (such as CUDA) to an emerging platform (such as WebGPU), preserving both the correctness and the performance of computing tasks is a complex challenge.

To solve these problems, the authors propose TAPML (Top-Down Approach and tooling for ML), a top-down method and toolset for simplifying the deployment of ML systems on diverse platforms, with a particular focus on improving developer productivity while ensuring model reliability and efficiency. TAPML does this through automated unit-test generation, gradual migration of computing tasks, and a unified runtime environment.
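
The two ideas named above, generating unit tests automatically and migrating computation gradually, can be illustrated with a toy example. The following is a minimal sketch, not TAPML's actual implementation or API: the operator names, `carve_tests`, and `migrate` are all hypothetical, and plain Python functions stand in for kernels on a "source" (mature) and a "target" (emerging) platform. It records each operator's inputs and outputs during a reference run, then moves an operator to the target only if the target version reproduces the recorded outputs.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Op = Callable[[Vector], Vector]
TestCase = Tuple[str, Vector, Vector]  # (operator name, input, expected output)

# Hypothetical reference operators standing in for the mature source platform.
def ref_scale(x: Vector) -> Vector:
    return [2.0 * v for v in x]

def ref_relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def ref_normalize(x: Vector) -> Vector:
    total = sum(abs(v) for v in x) or 1.0
    return [v / total for v in x]

SOURCE_OPS: Dict[str, Op] = {
    "scale": ref_scale,
    "relu": ref_relu,
    "normalize": ref_normalize,
}
PIPELINE = ["scale", "relu", "normalize"]  # toy "model": ops applied in order

# Hypothetical target-platform operators: one correct, one buggy, one missing.
TARGET_OPS: Dict[str, Op] = {
    "scale": lambda x: [v + v for v in x],   # correct port
    "relu": lambda x: [abs(v) for v in x],   # buggy port (abs instead of max)
}

def run_pipeline(ops: Dict[str, Op], x: Vector, trace=None) -> Vector:
    """Run the toy model; optionally record every operator's input and output."""
    for name in PIPELINE:
        y = ops[name](x)
        if trace is not None:
            trace.append((name, list(x), list(y)))
        x = y
    return x

def carve_tests(x: Vector) -> List[TestCase]:
    """Derive per-operator test cases from a real run on the source platform."""
    trace: List[TestCase] = []
    run_pipeline(SOURCE_OPS, x, trace)
    return trace

def close(a: Vector, b: Vector, tol: float = 1e-6) -> bool:
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))

def migrate(target_ops: Dict[str, Op], tests: List[TestCase]):
    """Move operators to the target one at a time, keeping the source version
    of any operator that is missing or fails its recorded test cases."""
    hybrid = dict(SOURCE_OPS)
    migrated = []
    for name in PIPELINE:
        candidate = target_ops.get(name)
        if candidate is None:
            continue
        cases = [(i, o) for n, i, o in tests if n == name]
        if all(close(candidate(i), o) for i, o in cases):
            hybrid[name] = candidate
            migrated.append(name)
    return hybrid, migrated

if __name__ == "__main__":
    sample = [1.0, -2.0, 3.0]
    hybrid_ops, done = migrate(TARGET_OPS, carve_tests(sample))
    print("migrated:", done)
    print("hybrid output:", run_pipeline(hybrid_ops, sample))
```

In this sketch the pipeline stays runnable end to end at every step, because any operator that has not yet been ported, or whose port fails its recorded cases, keeps running on the source implementation. That is the property a gradual, migration-based approach relies on.
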
### Main contributions of TAPML

1. **Dimension**: The optimization and abstraction of ML computation have been studied in depth over the past decade. This paper instead focuses on the productivity challenges of deploying emerging ML systems and addresses them by improving software development methods.
2. **Methodology**: The paper proposes TAPML, a top-down method for deploying emerging ML models on emerging platforms. In contrast to the traditional bottom-up scheme, TAPML automates unit testing by carving test cases out of model executions on mature platforms, and it adopts a migration-based method that gradually moves computing tasks from the source platform to the target platform.
3. **Case study**: The paper presents a comprehensive case study that summarizes the experience of deploying 82 emerging models on 5 new platforms, intended to inform the development of future emerging ML systems.

### Summary

The core problem of this paper is the software engineering challenge of deploying machine learning models on emerging computing platforms, especially given the rapid development of large language models. With TAPML, a top-down method and toolset, the authors aim to improve developer productivity, ensure model reliability and efficiency, and accelerate the adoption of ML models on diverse platforms.