XDL: an industrial deep learning framework for high-dimensional sparse data

Biye Jiang, Chao Deng, Huimin Yi, Zelin Hu, Guorui Zhou, Yang Zheng, Sui Huang, Xinyang Guo, Dongyue Wang, Yue Song, Liqin Zhao, Zhi Wang, Peng Sun, Yu Zhang, Di Zhang, Jinhui Li, Jian Xu, Xiaoqiang Zhu, Kun Gai
DOI: https://doi.org/10.1145/3326937.3341255
2019-01-01
Abstract: With the rapid growth of data and computing power, deep learning based approaches have become the main solution for many artificial intelligence problems such as image classification, speech recognition and computer vision. Several excellent deep learning (DL) frameworks, including TensorFlow, MXNet and PyTorch, have been open-sourced, further accelerating the advance of the community. However, existing DL frameworks are not designed for applications involving high-dimensional sparse data, which exists widely in many successful online businesses such as search engines, recommender systems and online advertising. In these industrial scenarios, deep models are typically trained on large-scale datasets with up to billions of sparse features and hundreds of billions of samples, which poses great challenges to DL frameworks. In this paper, we introduce a high-performance, large-scale, distributed DL framework named XDL, which provides an elegant solution to fill the gap between the general design of existing DL frameworks and the industrial requirements arising from high-dimensional sparse data. Since 2016, XDL has been successfully deployed at Alibaba, serving many production services such as online advertising and recommender systems. Running on hundreds of GPU cards in parallel, XDL can train deep models with tens of billions of parameters within only several hours. Besides its excellent performance and flexibility, XDL is also friendly to developers: algorithm scientists at Alibaba can develop and deploy new deep models with only a few lines of simple code. The XDL API and a reference implementation were released as an open-source package under the Apache 2.0 license in December 2018 and are available at https://github.com/alibaba/xdeeplearning.