Abstract: Cloud computing has witnessed tremendous growth, prompting enterprises to migrate to the cloud for reliable, on-demand computing. Within a single Virtual Private Cloud (VPC), the number of instances (such as VMs, bare-metal servers, and containers) has reached the millions, which poses three challenges: supporting millions of instances whose network location is decoupled from the underlying hardware, delivering highly elastic performance, and maintaining high reliability. However, academic studies have primarily focused on specific issues such as high-speed data planes and virtualized routing infrastructure, while existing industrial network technologies fail to adequately address these challenges. In this paper, we report on the design and experience of Achelous, Alibaba Cloud's network virtualization platform. Achelous consists of three key designs to enhance hyperscale VPC: (i) a novel hierarchical programming architecture based on the collaborative design of both data plane and control plane; (ii) an elastic performance strategy and distributed ECMP schemes for seamless scale-up and scale-out, respectively; (iii) a health check scheme and transparent VM live migration mechanisms that ensure stateful flow continuity during failover. The evaluation results demonstrate that Achelous scales to over 1,500,000 VMs with elastic network capacity in a single VPC and reduces programming time by 25×, with 99% of updates completed within 1 second. For failover, it reduces downtime during VM live migration by 22.5× and ensures that 99.99% of applications do not experience stalls. More importantly, the experience from three years of operation proves Achelous's serviceability and its versatility independent of any specific hardware platform.

Achelous: Enabling Programmability, Elasticity, and Reliability in Hyperscale Cloud Networks.
