Abstract: Because reinforcement learning (RL) can automatically learn control logic that is more adaptive than hand-crafted heuristics, numerous efforts have been made to apply RL to congestion control (CC) design for real-time video communication (RTC) applications, and these efforts have shown promising benefits over rule-based RTC CCs. Online reinforcement learning is often adopted to train the RL models so that they can adapt directly to real network environments. However, its trial-and-error manner can also cause catastrophic degradation of the quality of experience (QoE) of RTC applications at run time. Thus, safeguard strategies such as falling back to hand-crafted heuristics can be run alongside RL models to keep the actions explored during training sensible, although these safeguard strategies interrupt the learning process and make it more challenging to discover optimal RL policies. The recent emergence of loss-tolerant neural video codecs (NVCs) naturally provides a layer of protection for the online learning of RL-based congestion control because of their resilience to packet losses, but this resilience has not been fully exploited in prior work. In this paper, we present an RL-based congestion control scheme that is aware of, and takes advantage of, the packet loss tolerance of NVCs via the reward used in online RL training. Through extensive evaluation on various videos and network traces in a simulated environment, we demonstrate that our NVC-aware CC running with a loss-tolerant NVC reduces the training time by 41% compared to prior RL-based CCs. It also boosts the mean video quality by 0.3 to 1.6 dB, lowers the tail frame delay by 3 to 200 ms, and reduces video stalls by 20% to 77% in comparison with other baseline RTC CCs.


### What problems does this paper attempt to solve?

This paper aims to solve the congestion control (CC) problem in real-time video communication (RTC), especially how to improve both the learning efficiency of RL-based congestion control algorithms and the quality of experience (QoE). Specifically:

1. **Risks of online reinforcement learning**: Although online reinforcement learning enables the model to adapt directly to the real network environment, its trial-and-error mechanism may degrade system performance, for example through lower video quality, higher frame latency, and video stuttering. Protection strategies (such as reverting to manually designed heuristic algorithms) are therefore usually required to keep the system stable, but they interrupt the learning process and reduce learning efficiency.
2. **Negative impacts of protection strategies**: Existing RL-based CC solutions rely on safeguard policies during training. These policies intervene when the RL model takes risky actions, to prevent the system from entering an unsafe state. However, this limits the RL model's exploration of the action space and the observation space, thereby reducing sampling efficiency and learning efficiency.
3. **Utilizing loss-tolerant neural video codecs (NVCs)**: Recently emerged loss-tolerant NVCs have strong packet-loss tolerance and can still decode high-quality video frames when part of the data is lost. However, this feature has not been fully utilized in the design of RL-based CC.

### Solutions proposed in the paper

To solve the above problems, this paper proposes a new RL-based RTC congestion control algorithm, NVC-CC, which improves on existing methods in the following ways:

1. **Utilizing the loss-tolerance characteristics of NVC**: NVC-CC makes the RL model aware of, and able to exploit, the loss tolerance of the NVC through the design of the reward function. This reduces how often protection strategies are invoked and also improves learning efficiency and the final quality of experience.
2. **Improving learning efficiency**: Experimental results show that when NVC-CC is used in combination with a loss-tolerant NVC, training time is reduced by 41% compared with other RL-based CCs. Compared with other baseline RTC CCs, average video quality increases by 0.3 to 1.6 dB, tail frame latency decreases by 3 to 200 milliseconds, and video stuttering decreases by 20% to 77%.
3. **Optimizing the reward function**: The reward function of NVC-CC is based directly on frame quality and packet latency, rather than depending indirectly on network-level performance indicators. This reflects the actual quality of experience more accurately and avoids the problem that traditional video codecs cannot decode frames when packets are lost.

### Summary

By introducing loss-tolerant neural video codecs, this paper addresses the trade-off between learning efficiency and quality of experience in online reinforcement learning for real-time video communication, and provides a more efficient and stable congestion control solution.
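The reward design described above (frame quality and latency instead of network-level indicators) can be illustrated with a minimal Python sketch. It is not the paper's actual reward: the `FrameStats` fields, the delay budget, the weights, and the stall penalty are all hypothetical values chosen for illustration. The sketch only shows the general shape of a reward computed from decoded frame quality and frame delay.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame measurements (hypothetical names, not from the paper)."""
    quality_db: float   # quality of the decoded frame, e.g. PSNR in dB
    delay_ms: float     # frame delay in milliseconds
    stalled: bool       # True if the frame caused a visible stall


def qoe_reward(
    frame: FrameStats,
    delay_budget_ms: float = 200.0,   # assumed latency budget
    delay_weight: float = 0.05,       # assumed dB penalty per ms over budget
    stall_penalty: float = 10.0,      # assumed fixed penalty for a stall
) -> float:
    """Reward built from application-level signals only.

    Packet loss does not appear explicitly: with a loss-tolerant codec,
    loss is assumed to show up as a lower quality_db of the decoded frame,
    so the reward degrades gradually instead of collapsing.
    """
    excess_delay = max(0.0, frame.delay_ms - delay_budget_ms)
    reward = frame.quality_db - delay_weight * excess_delay
    if frame.stalled:
        reward -= stall_penalty
    return reward


if __name__ == "__main__":
    on_time = FrameStats(quality_db=35.0, delay_ms=100.0, stalled=False)
    late = FrameStats(quality_db=35.0, delay_ms=300.0, stalled=False)
    print(qoe_reward(on_time), qoe_reward(late))
```

Because an aggressive sending rate only lowers the reward through its measured effect on decoded quality and delay, an agent trained with a reward of this shape could in principle explore higher rates without a safeguard policy overriding every lossy action. That is the intuition the summary attributes to NVC-CC, not a claim about its exact formulation.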

Loss-tolerant neural video codec aware congestion control for real time video communication
