Abstract: We conduct an empirical study of the machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: instead of specifying how to run a machine learning task, users only specify what machine learning task to run, and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free: a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question by benchmarking the mainstream machine learning clouds. Since these platforms continue to innovate, our benchmark tries to reflect their evolution. Concretely, this paper consists of two sub-benchmarks, mlbench and automlbench. When we first started this work in 2016, only two cloud platforms provided machine learning services, and they limited themselves to model training and simple hyper-parameter tuning. We therefore focus on binary classification problems and present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of the machine learning clouds from both Azure and Amazon on mlbench. In recent years, more cloud providers have come to support machine learning and to include automatic machine learning (AutoML) techniques in their machine learning clouds. Their AutoML services can ease manual tuning across the whole machine learning pipeline, including but not limited to data preprocessing, feature selection, model selection, hyper-parameter tuning, and model ensembling. To reflect these advancements, we design automlbench to assess the AutoML performance of four machine learning clouds using different kinds of workloads.
Our comparative study reveals the strengths and weaknesses of existing machine learning clouds and points out potential future directions for improvement.

AI Matrix: A Deep Learning Benchmark for Alibaba Data Centers

AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking

AIBench: An Industry Standard AI Benchmark Suite

AIBench: an Industry Standard AI Benchmark Suite from Internet Services.

HPC AI500: A Benchmark Suite for HPC AI Systems

Edge AIBench: Towards Comprehensive End-to-End Edge Computing Benchmarking.

AI Matrix - Synthetic Benchmarks for DNN

KunPeng: Parameter Server Based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial

Scenario-Based AI Benchmark Evaluation of Distributed Cloud/Edge Computing Systems

AIBench Training: Balanced Industry-Standard AI Training Benchmarking

DLBench: An Experimental Evaluation of Deep Learning Frameworks

DLBench: a comprehensive experimental evaluation of deep learning frameworks

Understanding the Energy Consumption of HPC Scale Artificial Intelligence

How good are machine learning clouds? Benchmarking two snapshots over 5 years

HPC AI500: Representative, Repeatable and Simple HPC AI Benchmarking

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training