Abstract: Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pains to scale it to handle larger datasets (for both experimentation and production deployment). These usually entail many manual and error-prone steps for data scientists to fully take advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, distributed computing, etc.). To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments), and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, Inspur, etc.).
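Two of the manual steps named in the abstract, data partitioning and multi-processing, can be illustrated in plain Python. The sketch below is hypothetical and uses only the standard library's `multiprocessing.Pool`, not any BigDL API: it splits a dataset into near-equal shards, summarizes each shard in a separate worker process, and merges the partial results into a global mean and variance. The helper names (`partition`, `shard_stats`, `combine`, `parallel_mean_var`) are illustrative assumptions.

```python
"""Hypothetical sketch: data partitioning plus multi-processing (not a BigDL API).

The dataset is split into near-equal shards, each shard is summarized in a
separate worker process (map), and the partial results are merged (reduce).
"""
from multiprocessing import Pool


def partition(records, num_partitions):
    """Split records into num_partitions contiguous shards of near-equal size."""
    size, extra = divmod(len(records), num_partitions)
    shards, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)  # first `extra` shards get one more item
        shards.append(records[start:end])
        start = end
    return shards


def shard_stats(shard):
    """Map step: mergeable partial aggregates (count, sum, sum of squares)."""
    return len(shard), sum(shard), sum(x * x for x in shard)


def combine(partials):
    """Reduce step: merge partial aggregates into the global mean and variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    return mean, total_sq / n - mean * mean  # population variance


def parallel_mean_var(records, num_workers=4):
    """Partition records, summarize shards in worker processes, then merge."""
    shards = partition(records, num_workers)
    with Pool(processes=num_workers) as pool:
        partials = pool.map(shard_stats, shards)
    return combine(partials)


if __name__ == "__main__":  # guard required on platforms that spawn worker processes
    data = [float(x) for x in range(1, 101)]
    print(parallel_mean_var(data))  # prints (50.5, 833.25)
```

Each worker returns mergeable partial sums rather than a per-shard variance, because the variances of different shards cannot simply be averaged to obtain the global variance. The sum-of-squares formula is chosen here for brevity; a production implementation would likely prefer a numerically stabler merge for large or widely spread values.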

What problem does this paper attempt to address?

The paper addresses the following problem: how to seamlessly scale an AI project from a Python notebook on a single laptop to a high-performance distributed cluster that can handle larger datasets, for both large-scale experimentation and production deployment. Specifically, it aims to eliminate the complex, error-prone manual steps that this transition usually requires, so that data scientists can fully utilize the available hardware resources (SIMD instructions, multiprocessing, quantization, memory allocation optimization, data partitioning, distributed computing, etc.).

### Problem Background

Most AI projects start from a Python notebook running on a single laptop or workstation. When larger datasets must be handled, however, data scientists have to go through a series of manual steps to fully utilize the available hardware, including:

- Optimization using SIMD instructions
- Multiprocessing parallelization
- Model quantization
- Memory allocation optimization
- Data partitioning
- Distributed computing

These manual operations are complex and error-prone, which makes projects harder to develop and maintain.

### Solution

To address these challenges, the authors open-sourced the BigDL 2.0 toolkit, which combines the original BigDL and Analytics Zoo projects. Its main features are:

1. **Transparent Acceleration**: Users build ordinary Python notebooks with standard APIs on their laptops, and BigDL 2.0 automatically accelerates model training and inference on a single node, with up to a 9.6x speedup in the authors' experiments.
2. **Seamless Scaling**: BigDL 2.0 scales the same AI pipeline out to large clusters spanning hundreds of servers, without invasive code modifications.
3. **End-to-End Pipeline Optimization**: BigDL 2.0 optimizes the entire AI pipeline, including data preprocessing, feature transformation, hyperparameter tuning, model training and inference, and model optimization and deployment.
4. **Automated Machine Learning (AutoML) Support**: Built-in AutoML functionality helps users automate hyperparameter search and develop models more efficiently.

### Implementation Method

BigDL 2.0 achieves these goals through two main libraries:

- **BigDL-Nano**: transparently accelerates the AI pipeline on a single node by integrating optimization techniques such as SIMD instructions, multiprocessing, quantization, and memory allocation optimization.
- **BigDL-Orca**: seamlessly scales AI applications out by automatically configuring distributed data processing and AI systems (such as Apache Spark and Ray), and performs data-parallel processing, model training, and inference efficiently in a distributed environment.

### Conclusion

With BigDL 2.0, users can build AI pipelines on their laptops and seamlessly scale them out to large distributed clusters, making the processing of large datasets substantially more efficient. The toolkit has already been adopted in production by real-world users such as Mastercard, Burger King, and Inspur.
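The data-parallel model training attributed to BigDL-Orca can be illustrated with a minimal, single-process sketch of synchronous data-parallel SGD. This is a toy under stated assumptions, not BigDL-Orca's implementation or API: each simulated worker holds one shard of a linear-regression dataset and computes a local mean-squared-error gradient, the gradients are averaged (the role an all-reduce plays in a real cluster), and every replica applies the identical update. All function names and hyperparameters are hypothetical.

```python
"""Toy synchronous data-parallel SGD with gradient averaging (hypothetical sketch).

Workers are simulated sequentially in one process; in a real cluster each
local_gradient call would run on a different node. Not the BigDL-Orca API.
"""


def local_gradient(w, b, shard):
    """MSE gradient of the model y ~ w * x + b over one worker's shard."""
    n = len(shard)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def all_reduce_mean(grads):
    """Average the per-worker gradients, as an all-reduce would."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train_data_parallel(shards, lr=0.1, epochs=2000):
    """Run synchronous data-parallel SGD; assumes equal-sized shards."""
    w, b = 0.0, 0.0  # all replicas start from identical parameters
    for _ in range(epochs):
        grads = [local_gradient(w, b, shard) for shard in shards]
        gw, gb = all_reduce_mean(grads)
        # Every replica applies the same averaged update, so they stay in sync.
        w, b = w - lr * gw, b - lr * gb
    return w, b


if __name__ == "__main__":
    # Synthetic data from y = 3x + 1, split across two simulated workers.
    data = [(i / 10, 3 * (i / 10) + 1) for i in range(10)]
    shards = [data[:5], data[5:]]
    w, b = train_data_parallel(shards)
    print(f"w={w:.4f}, b={b:.4f}")
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the replicas follow the same trajectory that single-node training would; with unequal shards, each local gradient would need to be weighted by its shard size before averaging.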

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster
