SpotLake: Diverse Spot Instance Dataset Archive Service
Sungjae Lee, Jaeil Hwang, Kyungyong Lee
DOI: https://doi.org/10.48550/arXiv.2202.02973
2022-10-25
Abstract: Public cloud service vendors offer surplus computing resources at a reduced price as spot instances. Despite the lower price, a spot instance can be forcibly shut down at any moment when surplus resources run short. To enhance spot instance usage, vendors provide diverse spot instance datasets. Among them, the spot price information has been the most widely used so far. However, because spot prices now rarely change, the applicability of the spot price dataset has weakened. Besides the price dataset, the recently introduced spot instance availability and interruption ratio datasets can help users better utilize spot instances, but they are rarely used in practice. Through a thorough analysis, we uncover major hurdles in using the new datasets: the lack of historical information, query constraints, and limited query interfaces. To overcome them, we develop SpotLake, a spot instance data archive web service that provides historical information for various spot instance datasets. We present novel heuristics for collecting the various datasets and a data serving architecture. Through real-world spot instance availability experiments, we demonstrate the applicability of the proposed system. SpotLake is publicly available as a web service to speed up cloud system research aimed at improving spot instance usage and availability while reducing cost.
Distributed, Parallel, and Cluster Computing