Warehouse-scale video acceleration: co-design and deployment in the wild
Parthasarathy Ranganathan,Daniel Stodolsky,Jeff Calow,Jeremy Dorfman,Marisabel Guevara,Clinton Wills Smullen,Aki Kuusela,Raghu Balasubramanian,Sandeep Bhatia,Prakash Chauhan,Anna Cheung,In Suk Chong,Niranjani Dasharathi,Jia Feng,Brian Fosco,Samuel Foss,Ben Gelb,Sara J. Gwin,Yoshiaki Hase,Da-ke He,C. Richard Ho,Roy W. Huffman,Elisha Indupalli,Indira Jayaram,Poonacha Kongetira,Cho Mon Kyaw,Aaron Laursen,Yuan Li,Fong Lou,Kyle A. Lucke,JP Maaninen,Ramon Macias,Maire Mahony,David Alexander Munday,Srikanth Muroor,Narayana Penukonda,Eric Perkins-Argueta,Devin Persaud,Alex Ramirez,Ville-Mikko Rautio,Yolanda Ripley,Amir Salek,Sathish Sekar,Sergey N. Sokolov,Rob Springer,Don Stark,Mercedes Tan,Mark S. Wachsler,Andrew C. Walton,David A. Wickeraad,Alvin Wijaya,Kwan Wu,Clinton Wills Smullen IV,Roy W. Huffman Jr.,Hon Kwan Wu
DOI: https://doi.org/10.1145/3445814.3446723
2021-04-17
Abstract: Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.