Loss-tolerant neural video codec aware congestion control for real time video communication

Zhengxu Xia, Hanchen Li, Junchen Jiang
2024-11-11
Abstract: Because reinforcement learning (RL) can automatically create control logic that is more adaptive than hand-crafted heuristics, numerous efforts have been made to apply RL to congestion control (CC) design for real-time video communication (RTC) applications, and these efforts have shown promising benefits over rule-based RTC CCs. Online reinforcement learning is often adopted to train the RL models so that they can directly adapt to real network environments. However, its trial-and-error manner can also cause catastrophic degradation of the quality of experience (QoE) of RTC applications at run time. Thus, safeguard strategies such as falling back to hand-crafted heuristics can be run alongside RL models to keep the actions explored during training sensible, even though these safeguard strategies interrupt the learning process and make it more challenging to discover optimal RL policies. The recent emergence of loss-tolerant neural video codecs (NVCs) naturally provides a layer of protection for the online learning of RL-based congestion control because of their resilience to packet losses, but this resilience has not been fully exploited in prior work. In this paper, we present an RL-based congestion control that is aware of, and takes advantage of, the packet loss tolerance of NVCs via the reward in online RL. Through extensive evaluation on various videos and network traces in a simulated environment, we demonstrate that our NVC-aware CC running with the loss-tolerant NVC reduces the training time by 41% compared to prior RL-based CCs. It also boosts the mean video quality by 0.3 to 1.6 dB, lowers the tail frame delay by 3 to 200 ms, and reduces video stalls by 20% to 77% in comparison with other baseline RTC CCs.
Networking and Internet Architecture, Multimedia
### What problems does this paper attempt to solve?

This paper addresses congestion control (CC) in real-time video communication (RTC), in particular how to improve both the learning efficiency of reinforcement learning (RL) based congestion control and the resulting quality of experience (QoE). Specifically:

1. **Risks of online reinforcement learning**: Online reinforcement learning lets the model adapt directly to the real network environment, but its trial-and-error mechanism can degrade system performance at run time: lower video quality, higher frame latency, and video stuttering. Protection strategies, such as reverting to hand-crafted heuristic algorithms, are therefore usually required to keep the system stable, but they interrupt the learning process and reduce learning efficiency.
2. **Negative impacts of protection strategies**: Existing RL-based CC solutions rely on safeguard policies during training. These policies intervene when the RL model takes risky actions, to prevent the system from entering an unsafe state. However, this limits the RL model's exploration of the action space and the observation space, which reduces both sampling efficiency and learning efficiency.
3. **Utilizing loss-tolerant neural video codecs (NVCs)**: Recently emerged loss-tolerant NVCs have strong packet-loss tolerance and can still decode high-quality video frames when part of the data is lost. However, this property has not been fully exploited in the design of RL-based CC.

### Solutions proposed in the paper

To solve the above problems, the paper proposes a new RL-based RTC congestion control algorithm, NVC-CC, which improves on existing methods in the following ways:

1. **Utilizing the loss-tolerance characteristics of NVC**: Through the design of its reward function, NVC-CC makes the RL model aware of the loss tolerance of the NVC and lets the model exploit it. This reduces how often protection strategies are triggered and improves both the learning efficiency and the final quality of experience.
2. **Improving learning efficiency**: Experimental results show that, when NVC-CC is used together with a loss-tolerant NVC, the training time is reduced by 41% compared with prior RL-based CCs. Compared with other baseline RTC CCs, the mean video quality increases by 0.3 to 1.6 dB, the tail frame delay drops by 3 to 200 ms, and video stalls are reduced by 20% to 77%.
3. **Optimizing the reward function**: The reward function of NVC-CC is based directly on frame quality and packet latency rather than indirectly on network-level performance indicators. This reflects the actual quality of experience more accurately and sidesteps the limitation that traditional video codecs cannot decode a frame when packets are lost.

### Summary

By exploiting loss-tolerant neural video codecs, the paper eases the trade-off between learning efficiency and quality of experience in online reinforcement learning for real-time video communication, and provides a more efficient and stable congestion control solution.
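
The two ideas summarized here, a reward computed from frame quality and latency and a safeguard that triggers less often when the codec tolerates loss, can be illustrated with a minimal Python sketch. This is not the paper's reward function or safeguard logic, neither of which is specified in this summary. The names `FrameStats`, `frame_level_reward`, and `safeguarded_rate`, the linear delay penalty, and all numeric defaults are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame feedback. Field names are hypothetical, not from the paper."""
    quality_db: float  # decoded frame quality in dB
    delay_ms: float    # latency observed for this frame, in milliseconds


def frame_level_reward(frames, delay_weight=0.01, delay_budget_ms=200.0):
    """Hypothetical application-level reward.

    Averages decoded frame quality and subtracts a linear penalty for
    latency above a budget. The weight and budget are placeholder values.
    """
    if not frames:
        return 0.0
    total = 0.0
    for f in frames:
        excess_ms = max(0.0, f.delay_ms - delay_budget_ms)
        total += f.quality_db - delay_weight * excess_ms
    return total / len(frames)


def safeguarded_rate(rl_rate_kbps, fallback_rate_kbps, loss_rate, loss_threshold):
    """Use the heuristic fallback rate only when loss exceeds the threshold.

    Assumption: a loss-tolerant codec lets the threshold be set higher,
    so the RL action is overridden less often during training.
    """
    if loss_rate > loss_threshold:
        return fallback_rate_kbps
    return rl_rate_kbps


if __name__ == "__main__":
    frames = [FrameStats(30.0, 100.0), FrameStats(28.0, 300.0)]
    print(frame_level_reward(frames))
    # Same 15% loss, two thresholds: strict (conventional codec) vs. lenient.
    print(safeguarded_rate(2000, 800, loss_rate=0.15, loss_threshold=0.05))
    print(safeguarded_rate(2000, 800, loss_rate=0.15, loss_threshold=0.30))
```

In this sketch the same 15% loss overrides the RL action under the strict threshold but not under the lenient one, which is one plausible reading of how loss tolerance could reduce safeguard interventions.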