ESTELLE: an Efficient and Cost-effective Cloud Log Engine
Yupu Zhang, Guanglin Cong, Jihan Qu, Ran Xu, Yuan Fu, Weiqi Li, Feiran Hu, Jing Liu, Wenliang Zhang, Kai Zheng
DOI: https://doi.org/10.1145/3626246.3653387
2024-01-01
Abstract: With the advancement of cloud computing, more and more enterprises are adopting cloud services to build a variety of applications. Monitoring and observability are integral to complex and fragile cloud-native architectures. As an important data source for both, logs play an indispensable role in code debugging, root cause analysis, troubleshooting, and trend analysis. However, cloud logs are produced at TB scale per user per day and keep growing over time and with the business, which poses core challenges for log engines. Traditional log management systems cannot meet the requirements of cloud environments: high-frequency writing and storage of massive log data, combined with low-frequency retrieval and analysis. Building a low-cost, high-performance cloud-native log engine is therefore extremely challenging. To tackle these challenges, we propose a cost-effective cloud-native log engine, called ESTELLE, equipped with a low-cost pluggable log index framework. The engine separates compute from storage and reads from writes, enabling linear scalability. We design a near-lock-free write path to handle the high-frequency writes of massive logs, and use object storage to significantly reduce storage costs. We also tailor the ESTELLE Log Bloom filter and an approximate inverted index to this cloud-native engine, applying them flexibly to improve query efficiency across different query types. Extensive experiments on real open-source log datasets demonstrate that the ESTELLE log engine achieves very high single-core CPU write speeds and low storage costs. Furthermore, when equipped with the complete index framework, it also maintains low query latency across various log scenarios.
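
The abstract does not describe how the ESTELLE Log Bloom filter is built, so the following is only a minimal sketch of the general idea behind using a Bloom filter in a log engine: keep one small filter per stored log block and skip blocks that cannot contain the queried token. It uses a textbook Bloom filter, not the paper's tailored variant. All names (`BloomFilter`, `build_block_filters`, `search`), the whitespace tokenization, and the default sizes are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, token: str):
        # Derive num_hashes bit positions by prefixing the token with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(token)
        )


def build_block_filters(blocks):
    """Build one filter per log block (a block is a list of log lines)."""
    filters = []
    for lines in blocks:
        bf = BloomFilter()
        for line in lines:
            for token in line.split():  # assumed tokenization: whitespace
                bf.add(token)
        filters.append(bf)
    return filters


def search(blocks, filters, token):
    """Return lines containing token, scanning only blocks the filter admits."""
    hits = []
    for lines, bf in zip(blocks, filters):
        if not bf.might_contain(token):
            continue  # block definitely lacks the token, so it is never read
        hits.extend(line for line in lines if token in line.split())
    return hits
```

In a setting like the one the abstract describes, where blocks live in object storage and queries are infrequent, a filter of this kind lets a query avoid fetching most blocks at the price of a small, fixed amount of metadata per block. A false positive costs only an unnecessary block scan; it never causes a missed result.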