A reference architecture for serverless big data processing
Sebastian Werner,Stefan Tai
DOI: https://doi.org/10.1016/j.future.2024.01.029
IF: 7.307
2024-02-01
Future Generation Computer Systems
Abstract:Despite significant advances in data management systems in recent decades, the processing of big data at scale remains very challenging. While cloud computing has been well-accepted as a solution to address scalability needs, cloud configuration and operation complexity persist and often present themselves as entry barriers, especially for novice data analysts. Serverless computing and Function-as-a-Service (FaaS) platforms have been suggested to reduce such entry barriers by shifting configuration and operational responsibilities from the application developer to the FaaS platform provider. Naturally, "serverless data processing (SDP)", that is, using FaaS for (big) data processing, has received increasing interest in recent years. However, FaaS platforms were never intended to support large data processing tasks primarily. SDP, therefore, manifests itself through workarounds and adaptations on the application level, addressing various quirks and limitations of the FaaS platforms in use for data processing needs. This, in turn, creates tensions between the platforms and the applications using them, again encouraging the constant (re-)design of both. Consequently, we present lessons learned from a series of application and platform re-designs that address these tensions, leading to the development of an SDP reference architecture and a platform instantiation and implementation thereof called CREW . Mitigating the tensions through the process of application platform co-design proves to reduce both entry barriers and costs significantly. In some experiments, CREW outperforms traditional, non-SDP big data processing frameworks by factors.
computer science, theory & methods