Observability in Fog Computing

Aleteia Araujo, Breno Costa, Joao Bachiega Jr, Leonardo R. Carvalho, Rajkumar Buyya
2024-11-26
Abstract: Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analytics, and decision-making at the edge of the network. However, its highly distributed nodes and resource-restricted devices, interconnected by heterogeneous and unreliable networks, make service maintenance and troubleshooting challenging. This increases the time needed to restore an application after failures and puts service level agreements at risk. In such a scenario, increasing the observability of Fog applications and services may speed up troubleshooting and increase their availability. An observability system is a data-intensive service, however, and its additional load could saturate Fog nodes and channels. In this work, we detail the three pillars of observability (metrics, logs, and traces), discuss the challenges, and clarify the approaches for increasing the observability of services in Fog environments. Furthermore, the system architecture that supports observability in Fog, along with related tools and technologies, is presented, providing a comprehensive discussion of this subject. An example solution shows how a real-world application can benefit from increased observability in this environment. Finally, future directions of Fog observability are discussed.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the following problem: in a fog computing environment, highly distributed nodes and resource-constrained devices are interconnected by heterogeneous and unreliable networks. This makes service maintenance and troubleshooting difficult, lengthens application recovery time, and puts the Service-Level Agreement (SLA) at risk. Improving the observability of applications and services in the fog environment can accelerate troubleshooting and improve availability. However, an observability system is itself a data-intensive service, and its additional load may saturate fog nodes and channels.

Specifically, the paper details the three pillars of observability (metrics, logs, and traces), analyzes the challenges and methods of improving observability in the fog environment, and introduces the system architecture, related tools, and technologies that support observability. It also provides an example of a practical application that benefits from improved observability and discusses future research directions.

### Summary of the core issues of the paper

1. **Challenges in service maintenance and troubleshooting**:
   - Nodes in the fog computing environment are highly distributed, and devices have limited resources.
   - The network is heterogeneous and unreliable, which makes service maintenance and troubleshooting difficult.
   - As a result, application recovery takes longer and SLAs are harder to guarantee.
2. **The need to improve observability**:
   - Improved observability accelerates troubleshooting and increases application availability.
   - The observability system must process a large amount of data, which may add load to the fog computing environment.
3. **Specific solutions**:
   - Discusses in detail the three pillars of observability (metrics, logs, and traces).
   - Analyzes the challenges and methods of improving observability in the fog environment.
   - Introduces the system architecture, related tools, and technologies that support observability.
   - Provides an example of a practical application, showing how it benefits from improved observability.
   - Discusses future research directions.

### Formula representation

The observability index formula mentioned in the paper is as follows:

\[ \text{Observability} = \text{Metrics} + \text{Logs} + \text{Traces} + (\text{Metrics} \times \text{Logs} \times \text{Traces}) \]

where:

- Each of \(\text{Metrics}\), \(\text{Logs}\), and \(\text{Traces}\) takes the value
  \[ \begin{cases} 1, & \text{if the corresponding data domain has data for analysis} \\ 0, & \text{otherwise} \end{cases} \]
- \(X\) is an operator used to filter the data of each data domain and return the subset that matches a specific time period.

This formula emphasizes the synergy between the data domains rather than simply adding up the presence of each one.
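As a rough illustration of the formula above, the following Python sketch computes the index for a given time window. It is not taken from the paper. The `Record` type, the function names, and the sample data are hypothetical. It also assumes one particular reading of the definitions: a domain's indicator is 1 when filtering that domain to the time window leaves at least one record, and the product term is ordinary multiplication of the three 0/1 indicators.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical telemetry record; only the timestamp matters here."""
    timestamp: float  # seconds on an arbitrary clock (assumption)
    payload: str


def filter_window(records: Iterable[Record], start: float, end: float) -> List[Record]:
    """Return the records whose timestamp lies in the closed interval [start, end]."""
    return [r for r in records if start <= r.timestamp <= end]


def indicator(records: Iterable[Record], start: float, end: float) -> int:
    """1 if the data domain has any data for analysis in the window, else 0."""
    return 1 if filter_window(records, start, end) else 0


def observability_index(
    metrics: Iterable[Record],
    logs: Iterable[Record],
    traces: Iterable[Record],
    start: float,
    end: float,
) -> int:
    """Sum of the three 0/1 indicators plus their product (the synergy term)."""
    m = indicator(metrics, start, end)
    l = indicator(logs, start, end)
    t = indicator(traces, start, end)
    return m + l + t + (m * l * t)


if __name__ == "__main__":
    # Made-up sample data for a single fog node.
    metrics = [Record(10.0, "cpu=0.70")]
    logs = [Record(12.0, "service restarted")]
    traces = [Record(50.0, "span: sensor-read")]

    for start, end in [(0.0, 20.0), (0.0, 60.0), (100.0, 200.0)]:
        score = observability_index(metrics, logs, traces, start, end)
        print(f"window [{start}, {end}] -> index {score}")
```

Under this reading the index ranges from 0 to 4. The product term contributes only when all three domains have data in the same window, which is one way to express the synergy the formula is said to emphasize.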