Abstract: The Intelligent Transportation System (ITS) environment is known to be dynamic and distributed, where participants (vehicle users, operators, etc.) have multiple, changing and possibly conflicting objectives. Although Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS applications such as resource management and offloading, most RL algorithms focus on single objectives. In many situations, converting a multi-objective problem into a single-objective one is impossible, intractable or insufficient, making such RL algorithms inapplicable. We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements, which automatically triggers adaptive few-shot learning in a dynamic, distributed and noisy environment with sparse and delayed reward. We test our algorithm in an ITS environment with edge cloud computing. Empirical results show that the algorithm is quick to adapt to new environments and performs better in all individual and system metrics compared to the state-of-the-art benchmark. Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method. In addition to the cloud simulation, we test our algorithm on a single-board computer and show that it can make inference in 6 milliseconds.
Machine Learning, Artificial Intelligence, Multiagent Systems, Optimization and Control
### What problem does this paper attempt to address?
This paper addresses multi-objective optimization in intelligent transportation systems (ITS), especially in distributed, non-stationary and adversarial environments. Specifically, it targets the following aspects:
1. **Complexity of multi-objective problems**: In intelligent transportation systems, participants (such as vehicle users and operators) usually have multiple, changing and potentially conflicting objectives. Most existing reinforcement learning (RL) algorithms handle only single-objective problems, and reducing a multi-objective problem to a single-objective one is often impossible, intractable or insufficient.
2. **Adaptability in dynamic environments**: The ITS environment is dynamic, distributed and noisy, and its reward signal is sparse and delayed. Existing RL algorithms perform poorly in such environments, especially when the combination of objectives and preferences changes frequently.
3. **Computational efficiency and resource utilization**: Existing multi-objective RL algorithms often incur high computational costs, which makes efficient online retraining difficult in practice. This motivates an efficient multi-objective, multi-agent reinforcement learning (MARL) algorithm with low computational requirements.
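The first point can be made concrete with a small worked example. The sketch below is illustrative only and is not taken from the paper: the three policies, their two objective values, and all function names are hypothetical. It shows one well-known reason why collapsing objectives into a single weighted sum can be insufficient: a Pareto-optimal trade-off that lies in a non-convex part of the front is never selected, whatever the weights.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical objective vectors for three offloading policies
# (two objectives, higher is better for both). Not from the paper.
POLICIES: Dict[str, Tuple[float, float]] = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "C": (0.4, 0.4),
}


def scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse an objective vector into one number with a weighted sum."""
    return sum(v * w for v, w in zip(values, weights))


def best_policy(weights: Sequence[float]) -> str:
    """Return the policy with the highest weighted-sum score."""
    return max(POLICIES, key=lambda name: scalarize(POLICIES[name], weights))


def is_pareto_optimal(name: str) -> bool:
    """A policy is Pareto-optimal if no other policy dominates it."""
    target = POLICIES[name]
    for other, values in POLICIES.items():
        if other == name:
            continue
        at_least_as_good = all(o >= t for o, t in zip(values, target))
        strictly_better = any(o > t for o, t in zip(values, target))
        if at_least_as_good and strictly_better:
            return False
    return True


def winners_over_weight_sweep(steps: int = 101) -> Set[str]:
    """Collect every policy that wins for some weight w in [0, 1]."""
    winners: Set[str] = set()
    for i in range(steps):
        w = i / (steps - 1)
        winners.add(best_policy((w, 1.0 - w)))
    return winners
```

Policy C is not dominated by A or B, yet sweeping the weight from 0 to 1 only ever selects A or B: C scores 0.4 under every weighting, while the better of A and B always scores at least 0.5.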
### Main contributions of the paper
1. **First multi-objective MARL algorithm for distributed, non-stationary environments**: The paper proposes, for the first time, an algorithm that can optimize under frequently changing combinations of objectives and preferences.
2. **Efficient online retraining**: An initially optimal model is trained offline and then deployed to each independent agent (representing a vehicle user). The agents update their offloading strategies through online few-shot learning, without prior knowledge of the reward shape, which reduces retraining cost.
3. **Modular and asynchronous training**: The algorithm can be modularized and trained asynchronously, improving flexibility and scalability. Experiments show that it outperforms the benchmark algorithm in all individual and system metrics. It also improves the underlying resource efficiency in heterogeneous environments, so that other algorithms benefit from the improved offloading rate and fairness.
4. **Real-time inference performance**: Tests on a single-board computer show that the algorithm completes inference within 6 milliseconds, meeting real-time requirements.
5. **Published code and data**: To promote research and application, the authors make their code and data publicly accessible.
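The pretrain-then-adapt pattern in contribution 2 can be sketched as follows. This is a deliberately simplified stand-in, not the authors' algorithm: a linear scorer replaces the agent's model, plain gradient descent on squared error replaces the paper's training procedure, and the sample data, learning rate and function names are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (features, observed return) pairs; a hypothetical data layout.
Sample = Tuple[Sequence[float], float]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score of one feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights: Sequence[float], samples: Sequence[Sample]) -> float:
    """Mean squared error of the scorer on a batch of samples."""
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def few_shot_update(
    pretrained: Sequence[float],
    samples: Sequence[Sample],
    steps: int = 5,
    lr: float = 0.5,
) -> List[float]:
    """Adapt pretrained weights with a few full-batch gradient steps.

    The pretrained weights are copied, so the offline model stays intact.
    """
    weights = list(pretrained)
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in samples:
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Assumed offline result: weights fitted to an old preference (0.8, 0.2).
PRETRAINED = [0.8, 0.2]

# A handful of online samples generated under a new preference (0.2, 0.8).
NEW_SAMPLES: List[Sample] = [
    ((1.0, 0.0), 0.2),
    ((0.0, 1.0), 0.8),
    ((1.0, 1.0), 1.0),
    ((0.5, 0.5), 0.5),
]

ADAPTED = few_shot_update(PRETRAINED, NEW_SAMPLES)
```

In this toy setting the adapted weights fit the new samples better than the pretrained ones after only five updates, and the pretrained weights are left untouched so they could be redeployed to other agents.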
### Summary
This paper tackles multi-objective optimization in intelligent transportation systems, especially in dynamic, distributed and non-stationary environments. By proposing an efficient multi-objective, multi-agent reinforcement learning algorithm, it improves the performance of resource allocation and offloading decisions and offers a solution that is feasible for practical deployment.