DynATOS+: A Network Telemetry System for Dynamic Traffic and Query Workloads

Chris Misa, Ramakrishnan Durairajan, Reza Rejaie, Walter Willinger
DOI: https://doi.org/10.1109/tnet.2024.3367432
2024-01-01
IEEE/ACM Transactions on Networking
Abstract: Network telemetry systems provide critical visibility into the state of network traffic. By leveraging modern programmable switch hardware, significant progress has been made in scaling these systems to production network traffic workloads. Less attention has been paid to using these hardware targets’ limited resources efficiently in the face of dynamics such as changes in the composition of the traffic workload and in the number and types of queries running at any given point in time. Both of these dynamics, however, have implications for resource requirements and query accuracy. Building on our prior work DynATOS, which argues that this dynamics problem motivates reframing telemetry systems as resource schedulers, we present in this paper the design, implementation, and evaluation of DynATOS+. DynATOS+ relies on the same efficient time-division approximation and scheduling algorithm as DynATOS, which accepts user-defined query accuracy and latency specifications and uses them to make tradeoffs in query execution that reduce hardware resource usage. Unlike DynATOS, however, DynATOS+ significantly reduces the burden on end users to express their queries by allowing them to use simple-to-state accuracy goals. For example, specifying per-query accuracy goals in DynATOS+ no longer requires end users either to know the average range of query results in advance or to submit multiple trial queries to tune their accuracy goal specifications. We perform extensive simulation-based evaluations that (i) show that this new functionality of DynATOS+ works in practice, (ii) illustrate in detail the resulting tradeoffs between query execution and hardware resource usage for a wide range of system parameters, and (iii) allow for an assessment of system performance under changing query workloads on top of changes in the composition of traffic workloads, an assessment that has eluded previous work in this area.
telecommunications; computer science, theory & methods; engineering, electrical & electronic, hardware & architecture