Abstract: Public clouds provide scalable and cost-efficient computing through resource sharing. However, moving from traditional on-premises service management to clouds introduces new challenges; failure to correctly provision, maintain, or decommission elastic services can lead to functional failure and vulnerability to attack. In this paper, we explore a broad class of attacks on clouds which we refer to as cloud squatting. In a cloud squatting attack, an adversary allocates resources in the cloud (e.g., IP addresses) and thereafter leverages latent configuration to exploit prior tenants. To measure and categorize cloud squatting, we deployed a custom Internet telescope within the Amazon Web Services us-east-1 region. Using this apparatus, we deployed over 3 million servers receiving 1.5 million unique IP addresses (56% of the available pool) over 101 days beginning in March of 2021. We identified 4 classes of cloud services, 7 classes of third-party services, and DNS as sources of exploitable latent configurations. We discovered that exploitable configurations were both common and in many cases extremely dangerous; we received over 5 million cloud messages, many containing sensitive data such as financial transactions, GPS location, and PII. Within the 7 classes of third-party services, we identified dozens of exploitable software systems spanning hundreds of servers (e.g., databases, caches, mobile applications, and web services). Lastly, we identified 5446 exploitable domains spanning 231 eTLDs, including 105 in the top 10,000 and 23 in the top 1,000 most popular domains. Through tenant disclosures, we have identified several root causes, including (a) a lack of organizational controls, (b) poor service hygiene, and (c) failure to follow best practices. We conclude with a discussion of the space of possible mitigations and describe the mitigations to be deployed by Amazon in response to this study.
