Transparently Capturing Execution Path of Service/Job Request Processing

Yong Yang, Long Wang, Jing Gu, Ying Li
DOI: https://doi.org/10.1007/978-3-030-03596-9_63
2018-01-01
Abstract: Distributed platforms are widely deployed to provide services in various industries. With the increasing scale and complexity of these platforms, it is becoming more and more challenging to understand and diagnose how a service request is processed, as even one simple request may traverse numerous heterogeneous components across multiple hosts. Thus, there is a strong need to accurately capture the complete end-to-end execution path of service requests across all involved components. This paper presents REPTrace, a generic methodology for transparently capturing the complete request execution path (REP). We propose principles for identifying causal relationships among events for a comprehensive list of execution scenarios, and stitch all events together to generate complete request execution paths based on library/system call tracing and network labelling. Experiments on different distributed platforms with different workloads show that REPTrace transparently captures accurate request execution paths with reasonable latency and negligible network overhead.
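The abstract describes stitching traced events into a request execution path by identifying causal relationships among them. As a rough illustration only, the Python sketch below assumes a hypothetical `Event` record and just two simplified causal rules: program order within one thread, and a send preceding the receive that carries the same (hypothetical) network label. It is not REPTrace's implementation, whose principles cover a much broader list of execution scenarios.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    eid: str                     # hypothetical unique event id
    host: str                    # host where the event was traced
    thread: int                  # thread id on that host
    kind: str                    # "send", "recv", or "local" (assumed categories)
    ts: float                    # local timestamp, compared only within one thread
    label: Optional[str] = None  # hypothetical label carried by a network message


def causal_edges(events):
    """Build happens-before edges from two simplified (assumed) rules."""
    edges = defaultdict(list)
    # Rule 1 (program order): consecutive events in the same thread.
    by_thread = defaultdict(list)
    for e in events:
        by_thread[(e.host, e.thread)].append(e)
    for seq in by_thread.values():
        seq.sort(key=lambda e: e.ts)
        for a, b in zip(seq, seq[1:]):
            edges[a.eid].append(b.eid)
    # Rule 2 (message order): a send precedes the recv carrying the same label.
    sends = {e.label: e for e in events if e.kind == "send" and e.label}
    for e in events:
        if e.kind == "recv" and e.label in sends:
            edges[sends[e.label].eid].append(e.eid)
    return edges


def execution_path(events, root_id):
    """Return events causally reachable from root_id, in topological order."""
    edges = causal_edges(events)
    # Collect every event reachable from the root.
    reachable, stack = {root_id}, [root_id]
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in reachable:
                reachable.add(v)
                stack.append(v)
    # Kahn's algorithm on the reachable subgraph.
    indeg = {u: 0 for u in reachable}
    for u in reachable:
        for v in edges[u]:
            indeg[v] += 1
    queue = deque(u for u in reachable if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in edges[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


# A client request served by one server thread; the server clock is skewed.
events = [
    Event("c1", "client", 1, "local", 0.0),
    Event("c2", "client", 1, "send", 1.0, "m1"),
    Event("s1", "server", 7, "recv", 0.5, "m1"),
    Event("s2", "server", 7, "local", 0.6),
    Event("s3", "server", 7, "send", 0.7, "m2"),
    Event("c3", "client", 1, "recv", 2.0, "m2"),
]
print(execution_path(events, "c1"))  # ['c1', 'c2', 's1', 's2', 's3', 'c3']
```

Timestamps are compared only within a single thread, so the skewed server clock in the example does not affect the result. The naive program-order rule would, however, wrongly link unrelated requests that share a worker thread; scenarios of this kind are what a fuller set of causal principles has to handle.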