TRACTOR: Traffic Analysis and Classification Tool for Open RAN

Joshua Groen, Mauro Belgiovine, Utku Demir, Brian Kim, Kaushik Chowdhury
2023-12-13
Abstract: 5G and beyond cellular networks promise remarkable advancements in bandwidth, latency, and connectivity. The emergence of Open Radio Access Network (O-RAN) represents a pivotal direction for the evolution of cellular networks, inherently supporting machine learning (ML) for network operation control. Within this framework, RAN Intelligence Controllers (RICs) from one provider can employ ML models developed by third-party vendors through the acquisition of key performance indicators (KPIs) from geographically distant base stations or user equipment (UE). Yet, the development of ML models hinges on the availability of realistic and robust datasets. In this study, we embark on a two-fold journey. First, we collect a comprehensive 5G dataset, harnessing real-world cell phones across diverse applications, locations, and mobility scenarios. Next, we replicate this traffic within a full-stack srsRAN-based O-RAN framework on Colosseum, the world's largest radio frequency (RF) emulator. This process yields a robust and O-RAN-compliant KPI dataset mirroring real-world conditions. We illustrate how such a dataset can fuel the training of ML models and facilitate the deployment of xApps for traffic slice classification by introducing a CNN-based classifier that achieves accuracy $>95\%$ offline and $92\%$ online. To accelerate research in this domain, we provide open-source access to our toolchain and supplementary utilities, empowering the broader research community to expedite the creation of realistic and O-RAN-compliant datasets.
Systems and Control, Networking and Internet Architecture
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses traffic classification and slice allocation in 5G and future cellular networks. Specifically, it focuses on the following key challenges:

1. **Lack of real-world 5G traffic datasets**: Existing research often relies on simulated data or datasets that do not fully comply with O-RAN standards, limiting the effectiveness of machine learning (ML) model training.
2. **Traffic classification in O-RAN systems**: In O-RAN systems, the near-real-time RIC (near-RT RIC) does not have access to complete 5-tuple flow information, Layer 2 frames, or the KPIs used by existing 5G classifiers, making traditional traffic classification methods difficult to apply.
3. **User privacy and system security**: Existing traffic classification methods may leak user information, violating privacy protection principles.

To address these challenges, the paper introduces TRACTOR (Traffic Analysis and Classification Tool for Open RAN), which proceeds in the following steps:

1. **Collecting real 5G traffic data**: Generating a comprehensive 5G dataset using actual mobile phones across different applications, locations, and mobility scenarios.
2. **Replaying traffic within the O-RAN framework**: Using the srsRAN framework on Colosseum (the world's largest RF emulator) to replay traffic and generate a KPI dataset that complies with O-RAN standards.
3. **Developing ML models for traffic classification**: Training convolutional neural network (CNN) models on the generated KPI dataset to achieve high-accuracy traffic classification.
4. **Open-sourcing the toolchain and dataset**: Providing an open-source toolchain and dataset so that other researchers can generate O-RAN-compliant datasets, accelerating research progress in the field.

Through these steps, TRACTOR demonstrates the feasibility of automated traffic classification and slice allocation in O-RAN systems, achieving an offline classification accuracy of over 95% and an online classification accuracy of 92%. This supports optimizing future networks across multiple performance metrics while preserving user privacy and system security.
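To make the CNN training step more concrete, the sketch below shows one common way to turn a KPI time series into fixed-size inputs for a convolutional classifier: cutting the trace into overlapping windows of consecutive KPI reports. It is an illustration only, not the paper's implementation. The function names `make_windows` and `accuracy`, the window length, the stride, the two-KPI trace, and the class labels are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One KPI report: one numeric value per KPI (the KPI set is an assumption).
KpiRow = Sequence[float]
# One training sample: a (window x num_kpis) block plus its traffic label.
Sample = Tuple[List[List[float]], str]


def make_windows(rows: Sequence[KpiRow], label: str,
                 window: int, stride: int = 1) -> List[Sample]:
    """Cut a KPI trace into overlapping windows of `window` consecutive rows.

    Each window is a 2-D block (time x KPI) that a CNN could treat like a
    small single-channel image. Traces shorter than `window` yield nothing.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    samples: List[Sample] = []
    for start in range(0, len(rows) - window + 1, stride):
        block = [list(row) for row in rows[start:start + window]]
        samples.append((block, label))
    return samples


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of windows whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical trace: 6 reports, 2 KPIs each, all from one traffic class.
    trace = [[float(i), float(i) * 2.0] for i in range(6)]
    windows = make_windows(trace, label="class_a", window=4)
    print(len(windows), "windows of shape",
          (len(windows[0][0]), len(windows[0][0][0])))
```

Windowing of this kind lets a classifier work from aggregate KPI reports alone, which matches the constraint noted above that the near-RT RIC lacks 5-tuple flow information. The appropriate window length and stride depend on the KPI reporting interval and would need to be tuned.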