The Llama 3 Herd of Models
Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Gregoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel Kloumann, Ishan Misra, Ivan Evtimov, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, et al. (434 additional authors not shown)
2024-08-15
Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe that this approach performs competitively with the state of the art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released, as they are still under development.
Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition