
### What problems does this paper attempt to solve?

This paper addresses the software engineering challenges of deploying machine learning (ML) models on emerging computing platforms, given how quickly both large language models (LLMs) and new platforms are evolving. Specifically:

1. **Limitations of traditional ML frameworks**: Existing ML frameworks are mainly optimized for CPU and CUDA platforms and offer insufficient support for emerging platforms such as Metal, Vulkan, and WebGPU, which leaves a large support gap.
2. **Rapid development of models and platforms**: As LLMs advance quickly and platforms such as Metal and WebGPU are introduced, deploying these models efficiently on the new platforms has become an urgent problem.
3. **Difficulties in testing and debugging**: The traditional bottom-up development method requires a large amount of manual testing and debugging. This is time-consuming and labor-intensive, and it makes it hard to ensure that test cases are diverse and realistic. In addition, tool support and the ecosystem for emerging platforms are relatively limited, which makes debugging even harder.
4. **Complexity of migrating computing tasks**: When migrating from a mature platform (such as CUDA) to an emerging platform (such as WebGPU), ensuring the correctness and performance of the computation is a complex challenge.

To solve these problems, the authors propose TAPML (Top-Down Approach and tooling for ML), a top-down method and toolset that aims to simplify the deployment of ML systems on diverse platforms, with a particular focus on improving developer productivity and ensuring model reliability and efficiency. TAPML achieves this through automated unit-test generation, gradual migration of computing tasks, and a unified runtime environment.
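The idea of generating unit tests from a mature platform and migrating operators one at a time can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not TAPML's actual API: the two-stage pipeline, the names `carve_tests` and `replay`, and the `softmax_ported` function standing in for an operator rewritten for a new platform are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Operator = Callable[[Vector], Vector]
CarvedTest = Tuple[str, Vector, Vector]  # (operator name, input, expected output)


def scale(xs: Vector) -> Vector:
    return [2.0 * x for x in xs]


def softmax_reference(xs: Vector) -> Vector:
    # Stand-in for the operator as it runs on the mature (source) platform.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_ported(xs: Vector) -> Vector:
    # Hypothetical port to a new platform: max-subtracted form of the same operator.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def carve_tests(pipeline: List[Tuple[str, Operator]], model_input: Vector) -> List[CarvedTest]:
    """Run the reference pipeline once, recording each operator's input and output."""
    tests: List[CarvedTest] = []
    current = model_input
    for name, op in pipeline:
        output = op(current)
        tests.append((name, list(current), list(output)))
        current = output
    return tests


def replay(tests: List[CarvedTest], ported: Dict[str, Operator], tol: float = 1e-9) -> List[str]:
    """Return names of ported operators whose output deviates from the recorded one."""
    failures: List[str] = []
    for name, recorded_input, expected in tests:
        if name not in ported:
            continue  # not migrated yet; this operator still runs on the source platform
        actual = ported[name](recorded_input)
        mismatch = len(actual) != len(expected) or any(
            abs(a - e) > tol for a, e in zip(actual, expected)
        )
        if mismatch:
            failures.append(name)
    return failures


reference_pipeline = [("scale", scale), ("softmax", softmax_reference)]
carved = carve_tests(reference_pipeline, [0.1, 0.5, -0.3])
failed = replay(carved, {"softmax": softmax_ported})
print("failing operators:", failed)
```

In this sketch only `softmax` has been migrated, so it is the only operator checked against its recorded input and output; adding more entries to the `ported` dictionary corresponds to migrating further operators step by step while the rest keep running on the source platform.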
### Main contributions of TAPML

1. **Dimension**: Although the optimization and abstraction of ML computation have been studied in depth over the past decade, this paper focuses on the productivity challenges of deploying emerging ML systems and addresses them by improving software development methods.
2. **Methodology**: The paper proposes TAPML, a top-down method for deploying emerging ML models on emerging platforms. Unlike the traditional bottom-up scheme, TAPML automates unit testing by carving test cases out of executions on mature platforms, and it adopts a migration-based approach that gradually moves computation from the source platform to the target platform.
3. **Case study**: The paper presents a comprehensive case study that summarizes the experience of deploying 82 emerging models on 5 new platforms, intended to inform the development of future emerging ML systems.

### Summary

The core problem this paper addresses is the set of software engineering challenges encountered when deploying machine learning models on emerging computing platforms, especially as large language models develop rapidly. By proposing TAPML, a top-down method and toolset, the authors aim to improve developer productivity, ensure model reliability and efficiency, and accelerate the adoption of ML models on diverse platforms.

Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development

An Orchestrated Empirical Study on Deep Learning Frameworks and Platforms

An Empirical Study Towards Characterizing Deep Learning Development and Deployment Across Different Frameworks and Platforms

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

LLMs as On-demand Customizable Service

Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges

An Empirical Study on Challenges for LLM Application Developers

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

Software Service Engineering in the Era of Large Language Models

Roadmap on Emerging Hardware and Technology for Machine Learning

LLM-based Frameworks for Power Engineering from Routine to Novel Tasks

Understanding LLMs: A Comprehensive Overview from Training to Inference

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

Scalable End-to-End ML Platforms: from AutoML to Self-serve

Petuum: A New Platform for Distributed Machine Learning on Big Data

Rethinking Machine Learning Development and Deployment for Edge Devices

Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact

ELMS: Elasticized Large Language Models On Mobile Devices

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey