TIMBER: On supporting data pipelines in Mobile Cloud Environments

Dimitrios Tomaras, Michail Tsenos, Vana Kalogeraki, Dimitrios Gunopulos
2024-10-09
Abstract: The radical advances in mobile computing, the IoT technological evolution along with cyberphysical components (e.g., sensors, actuators, control centers) have led to the development of smart city applications that generate raw or pre-processed data, enabling workflows involving the city to better sense the urban environment and support citizens' everyday lives. Recently, a new era of Mobile Edge Cloud (MEC) infrastructures has emerged to support smart city applications that aim to address the challenges raised due to the spatio-temporal dynamics of the urban crowd as well as bring scalability and on-demand computing capacity to urban system applications for timely response. In these, resource capabilities are distributed at the edge of the network and in close proximity to end-users, making it possible to perform computation and data processing at the network edge. However, there are important challenges related to real-time execution, not only due to the highly dynamic and transient crowd, the bursty and highly unpredictable amount of requests but also due to the resource constraints imposed by the Mobile Edge Cloud environment. In this paper, we present TIMBER, our framework for efficiently supporting mobile daTa processing pIpelines in MoBile cloud EnviRonments that effectively addresses the aforementioned challenges. Our detailed experimental results illustrate that our approach can reduce the operating costs by 66.245% on average and achieve up to 96.4% similar throughput performance for agnostic workloads.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The main problem this paper addresses is how to efficiently execute mobile data processing pipelines in mobile cloud environments: ensuring real-time responses while reducing operating costs under resource constraints and highly dynamic workloads. Specifically, the paper proposes a framework named TIMBER to address the following key challenges:

1. **Real-time execution**: The urban crowd is highly dynamic and transient, and Mobile Edge Cloud resources are limited, so achieving real-time data processing is difficult. For example, the AMBER alert application must analyze traffic-camera data in real time to identify events of interest.
2. **Cold-start problem**: In serverless computing, the first invocation of a function requires loading it from persistent storage into main memory; this is called a cold start. Cold starts can significantly increase a function's execution time, especially when the request volume surges suddenly. Moreover, resource requirements and invocation frequencies vary widely across functions, which makes function invocations hard to predict.
3. **Resource scheduling and allocation**: Existing schedulers are typically based on the principle of locality, assigning invocations of the same function to randomly selected worker nodes without considering their load. This approach handles highly skewed workloads poorly. In addition, cloud service providers offer limited support for efficient resource allocation and cannot effectively cope with sudden, unpredictable traffic.

To solve these problems, the paper proposes the following methods:

- **TIMBER framework**: A neural-network prediction model predicts the optimal replication degree and configuration of each serverless function, so that real-time deadlines are met while operating costs are minimized.
- **Graph similarity method**: The graph edit distance (GED) measures the similarity between different data processing pipelines, so that resources can be estimated from the historical data of similar pipelines without prior knowledge.
- **Support for zero prior knowledge**: Even for brand-new, previously unseen data processing pipelines, TIMBER finds similar pipelines through similarity matching and uses the trained neural network model to estimate the required resource configuration.

### Mathematical formula representation

1. **Pipeline Completion Time (PCT)**:

   \[
   T_{\text{pct}}(f_k) = T_{\text{init}}(f_k) + \sum_{i=1}^{N} T(f_{k,i}) + q_{u_k}
   \]

   where:
   - \(T_{\text{init}}(f_k)\) is the initialization overhead, i.e., the time required to instantiate all containers of pipeline \(k\).
   - \(T(f_{k,i})\) is the execution time of the \(i\)-th of the \(N\) serverless functions in pipeline \(k\).
   - \(q_{u_k}\) is the queuing time in the platform queue.

2. **Optimization objective**:

   \[
   \max P\left(W = \{\vec{f_k}\}\right) \quad \text{subject to} \quad T_{\text{pct}}(f_k) \leq d_k \quad \forall k
   \]

   where \(d_k\) is the deadline of pipeline \(k\).

3. **Graph edit distance (GED)**:

   \[
   \text{GED}(CG_1, CG_2)
   \]

   denotes the minimum total cost of the edit operations (node and edge insertions, deletions, and substitutions) needed to transform call graph \(CG_1\) into call graph \(CG_2\).

Through these methods, TIMBER efficiently supports the execution of data processing pipelines in mobile cloud environments, reducing operating costs and improving throughput performance.
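To make the Pipeline Completion Time and its deadline constraint concrete, here is a minimal Python sketch that simply evaluates the formula. The `PipelineRequest` class, its field names, and the example timings are hypothetical illustrations, not part of TIMBER, and the sketch does not model how TIMBER estimates any of the terms. It contrasts a cold start (non-zero initialization overhead) with a warm start to show how container instantiation alone can push a pipeline past its deadline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineRequest:
    """Timing terms of one request for pipeline k (hypothetical structure)."""
    init_overhead: float         # T_init(f_k): time to instantiate all containers (s)
    function_times: List[float]  # T(f_{k,i}) for i = 1..N: per-function execution times (s)
    queue_time: float            # q_{u_k}: time spent in the platform queue (s)
    deadline: float              # d_k: deadline of pipeline k (s)


def pipeline_completion_time(req: PipelineRequest) -> float:
    """T_pct(f_k) = T_init(f_k) + sum_{i=1}^{N} T(f_{k,i}) + q_{u_k}."""
    return req.init_overhead + sum(req.function_times) + req.queue_time


def meets_deadline(req: PipelineRequest) -> bool:
    """Check the real-time constraint T_pct(f_k) <= d_k."""
    return pipeline_completion_time(req) <= req.deadline


if __name__ == "__main__":
    stages = [0.25, 0.5, 0.25]  # execution times of the three functions in the pipeline
    # Cold start: containers must be instantiated first (non-zero T_init).
    cold = PipelineRequest(init_overhead=1.5, function_times=stages, queue_time=0.5, deadline=2.5)
    # Warm start: containers are already running, so T_init is zero.
    warm = PipelineRequest(init_overhead=0.0, function_times=stages, queue_time=0.5, deadline=2.5)
    print(pipeline_completion_time(cold), meets_deadline(cold))  # 3.0 False
    print(pipeline_completion_time(warm), meets_deadline(warm))  # 1.5 True
```

With identical function and queuing times, only the initialization overhead differs, which is why reducing cold starts (for example, by keeping enough warm replicas) matters for meeting \(d_k\).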
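The similarity-matching idea behind zero-prior-knowledge support can be illustrated with a small, self-contained Python sketch. It assumes unit edit costs and encodes a call graph as a list of function labels plus directed edges; the paper's actual cost model, GED solver, and call-graph encoding are not given here, so the helper names (`graph_edit_distance`, `most_similar_pipeline`) and the example pipelines are hypothetical. Exact GED is NP-hard, so the brute-force search below is only practical for graphs with a handful of nodes, and the step where TIMBER's trained model turns the match into a resource estimate is not sketched.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical call-graph encoding: function labels plus directed (caller, callee)
# edges given as indices into the label list.
CallGraph = Tuple[List[str], List[Tuple[int, int]]]


def _mapping_cost(pad1: List[Optional[str]], pad2: List[Optional[str]],
                  e1: Set[Tuple[int, int]], e2: Set[Tuple[int, int]],
                  perm: Sequence[int]) -> int:
    """Edit cost of matching node a of graph 1 to node perm[a] of graph 2."""
    n = len(perm)
    cost = 0
    for a in range(n):
        # Covers relabelling (both real), deletion (pad2 side is None)
        # and insertion (pad1 side is None); two dummies compare equal.
        if pad1[a] != pad2[perm[a]]:
            cost += 1
    for a in range(n):
        for b in range(n):
            # An edge present on exactly one side must be inserted or deleted.
            if ((a, b) in e1) != ((perm[a], perm[b]) in e2):
                cost += 1
    return cost


def graph_edit_distance(g1: CallGraph, g2: CallGraph) -> int:
    """Exact GED under unit costs, by brute force over node assignments.

    Both graphs are padded with dummy nodes (None) to the same size; mapping a
    real node to a dummy one means deleting or inserting it. Runtime grows
    factorially with the number of nodes.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    n = max(len(labels1), len(labels2))
    pad1: List[Optional[str]] = list(labels1) + [None] * (n - len(labels1))
    pad2: List[Optional[str]] = list(labels2) + [None] * (n - len(labels2))
    e1, e2 = set(edges1), set(edges2)
    return min(_mapping_cost(pad1, pad2, e1, e2, perm) for perm in permutations(range(n)))


def most_similar_pipeline(new: CallGraph, history: Dict[str, CallGraph]) -> Optional[str]:
    """Name of the historical pipeline with the smallest GED to `new` (None if history is empty)."""
    if not history:
        return None
    return min(history, key=lambda name: graph_edit_distance(new, history[name]))


if __name__ == "__main__":
    history = {
        "amber_alert": (["decode", "detect", "match_plate", "notify"], [(0, 1), (1, 2), (2, 3)]),
        "air_quality": (["ingest", "clean", "aggregate", "store", "publish"],
                        [(0, 1), (1, 2), (2, 3), (3, 4)]),
    }
    new_pipeline = (["decode", "detect", "notify"], [(0, 1), (1, 2)])
    print(graph_edit_distance(new_pipeline, history["amber_alert"]))  # 3
    print(most_similar_pipeline(new_pipeline, history))  # amber_alert
```

Under unit costs, padding only up to the larger graph's size loses nothing: matching two otherwise-unmatched nodes never costs more than deleting one and inserting the other. For realistically sized call graphs, networkx's `graph_edit_distance` (which accepts a timeout) or an approximate solver would be the practical choice.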