Accelerating Large Language Model Training with In-Package Optical Links for Scale-Out Systems

D. Biswas, Y. Ban, Joyjit Kundu, Arindam Mallik, J. Ryckaert, N. Pantano, James Myers, Aakash Patel
DOI: https://doi.org/10.1109/ISVLSI61997.2024.00032
2024-07-01
Abstract: Training large language models (LLMs) on multiple GPUs poses a challenge to computing resources, and scale-out systems in particular face communication bottlenecks with traditional electrical interconnects. In-package optical interconnects enable high-bandwidth connectivity at scale, using dense wavelength division multiplexing, small pitch, and low-loss waveguides on silicon. Current developments in 3D integration and silicon photonics facilitate high-bandwidth links for inter-GPU communication. In this paper, we present a system technology co-optimization (STCO) study, considering 3D heterogeneous integration of electrical and photonic integrated circuits embedded on a silicon wafer comprising passive waveguides as transmission channels. We study the computation and communication trade-offs for training LLMs with different dataflow considerations, on an envisaged system framework. Considering a state-of-the-art baseline configuration comprising eight GPUs in a node with high-bandwidth electrical intra-node links and low-bandwidth optical inter-node links for multiple nodes, we scale both intra-node and inter-node communication bandwidth with high-speed Optical IO (OIO) at 4 Tbps/mm bandwidth density and the computation with advanced CFET (complementary field-effect transistor) technology. We achieve bandwidth improvements up to 1.5x with OIO and CMOS scaling, leading to 3.2x training time reduction over 1024 GPUs for model sizes on the order of GPT-3 175B.
Computer Science, Engineering