Bullion: A Column Store for Machine Learning

Gang Liao, Ye Liu, Jianjun Chen, Daniel J. Abadi
2024-04-13
Abstract: The past two decades have witnessed columnar storage revolutionizing data warehousing and analytics. However, the rapid growth of machine learning poses new challenges to this domain. This paper presents Bullion, a columnar storage system tailored for machine learning workloads. Bullion addresses the complexities of data compliance, optimizes the encoding of long-sequence sparse features, efficiently manages wide-table projections, and introduces feature quantization in storage. By aligning with the evolving requirements of ML applications, Bullion extends columnar storage to various scenarios, from advertising and recommendation systems to the expanding realm of Generative AI.
Databases, Machine Learning
What problem does this paper attempt to address?