Hedge Your Bets: Optimizing Long-term Cloud Costs by Mixing VM Purchasing Options

Pradeep Ambati, Noman Bashir, David Irwin, Mohammad Hajiesmaili, Prashant Shenoy
DOI: https://doi.org/10.48550/arXiv.2004.04302
2020-04-09
Abstract: Cloud platforms offer the same VMs under many purchasing options that specify different costs and time commitments, such as on-demand, reserved, sustained-use, scheduled reserve, transient, and spot block. In general, the stronger the commitment, i.e., longer and less flexible, the lower the price. However, longer and less flexible time commitments can increase cloud costs for users if future workloads cannot utilize the VMs they committed to buying. Large cloud customers often find it challenging to choose the right mix of purchasing options to reduce their long-term costs, while retaining the ability to adjust capacity up and down in response to workload variations. To address the problem, we design policies to optimize long-term cloud costs by selecting a mix of VM purchasing options based on short- and long-term expectations of workload utilization. We consider a batch trace spanning 4 years from a large shared cluster for a major state University system that includes 14k cores and 60 million job submissions, and evaluate how these jobs could be judiciously executed using cloud servers using our approach. Our results show that our policies incur a cost within 41% of an optimistic optimal offline approach, and 50% less than solely using on-demand VMs.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper asks how to optimize long-term cloud costs by choosing among virtual machine (VM) purchasing options, so that expenses are minimized while workload requirements are still met. It focuses on a challenge faced by large cloud customers: selecting a mix of purchasing options that reduces long-term cost while retaining enough flexibility to adjust capacity as the workload changes.

### Problem Background

Cloud platforms offer the same VMs under multiple purchasing options, each with a different price and time commitment. For example:

- **On-demand instances**: Users can request and release VMs at any time, but the price is relatively high.
- **Reserved instances**: Users commit in advance to paying for a VM for 1 or 3 years in exchange for a large discount.
- **Transient (preemptible) instances**: The price is the lowest, but the cloud platform may revoke the VM at any time.

In general, the longer and less flexible the commitment, the lower the price. However, if future workloads cannot fully utilize the committed VMs, a long commitment increases cloud costs instead of reducing them.

### Research Objectives

The goal of the paper is to design policies that optimize long-term cloud costs by selecting a mix of VM purchasing options based on short- and long-term expectations of workload utilization. Specifically, the paper aims to:

1. **Consider short- and long-term workload expectations**: Predict future workloads from historical data and select the most appropriate VM purchasing options accordingly.
2. **Balance cost and flexibility**: Pursue the largest available discounts while retaining the ability to scale capacity up and down as the workload varies.
3. **Evaluate the combined effect of different purchasing options**: Find the mix of options that reduces cost the most without giving up that flexibility.

### Main Contributions

The main contributions of the paper are:

- Policies for selecting VM purchasing options based on short- and long-term workload expectations.
- A large-scale evaluation on a 4-year batch job trace from a shared cluster (14k cores, 60 million job submissions).
- Results showing that the proposed policies incur a cost within 41% of an optimistic optimal offline approach, 50% lower than using only on-demand instances, and 79% lower than using only reserved instances.

Together, these results give cloud customers practical policies for reducing costs when moving workloads to the cloud.
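
The trade-off between reserved-instance discounts and unused commitments can be illustrated with a small calculation. The sketch below is not the paper's policy: it assumes only two options (reserved and on-demand), a known hourly demand trace, and hypothetical prices, and it brute-forces the number of reserved VMs that minimizes total cost. All function names and numbers are illustrative assumptions.

```python
from typing import Sequence


def break_even_utilization(on_demand_price: float, reserved_price: float) -> float:
    """Fraction of hours a reserved VM must be busy to be cheaper than on-demand.

    Prices are per VM-hour; the reserved price is paid every hour, used or not.
    """
    return reserved_price / on_demand_price


def mixed_cost(demand: Sequence[int], reserved_count: int,
               on_demand_price: float, reserved_price: float) -> float:
    """Total cost when `reserved_count` VMs are reserved for the whole trace
    and any demand above that level is served with on-demand VMs."""
    reserved_cost = reserved_count * reserved_price * len(demand)
    overflow_vm_hours = sum(max(d - reserved_count, 0) for d in demand)
    return reserved_cost + overflow_vm_hours * on_demand_price


def best_reserved_count(demand: Sequence[int],
                        on_demand_price: float, reserved_price: float) -> int:
    """Brute-force the reserved VM count that minimizes total cost."""
    candidates = range(max(demand) + 1)
    return min(candidates,
               key=lambda r: mixed_cost(demand, r, on_demand_price, reserved_price))


# Hypothetical trace: a steady base load of 2 VMs with occasional bursts to 10.
demand = [2, 2, 10, 2, 2, 10, 2, 2]
on_demand_price = 1.0   # assumed price per VM-hour
reserved_price = 0.4    # assumed effective price per VM-hour (60% discount)

best = best_reserved_count(demand, on_demand_price, reserved_price)
print("break-even utilization:", break_even_utilization(on_demand_price, reserved_price))
print("best reserved count:", best)
print("all on-demand cost:", mixed_cost(demand, 0, on_demand_price, reserved_price))
print("all reserved cost:", mixed_cost(demand, max(demand), on_demand_price, reserved_price))
print("mixed cost:", mixed_cost(demand, best, on_demand_price, reserved_price))
```

With these assumed numbers, a reserved VM pays off only if it is busy at least 40% of the time. The two base-load VMs are always busy, so reserving them is worthwhile, while the burst VMs are busy only 25% of the time and are cheaper on demand. Reserving for peak demand costs the same as using on-demand alone in this example, and the mix is cheaper than either.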