Network Architecture and Technologies for Large Generative Models

Tang Hong (唐宏), Zhang Ning (张宁), Xu Xiaoqing (徐晓青), Wu Juan (武娟)
Abstract: Training large generative models demands network infrastructure that is ultra-large-scale, low-latency, high-bandwidth, and highly available. This paper investigates the technology development roadmap and implementation schemes of high-performance network infrastructure for large models. It is argued that, in commercial deployment, the network architecture should be customized and the transport protocol optimized according to the workloads and traffic patterns of the different training stages. Flow control/congestion control technologies, load balancing technologies, automated operation and maintenance solutions, and deterministic network transmission technologies for wide-area remote direct memory access (RDMA) are key directions for future research.
Engineering, Computer Science