Self-Supervised Path Planning in UAV-aided Wireless Networks based on Active Inference

Ali Krayani, Khalid Khan, Lucio Marcenaro, Mario Marchese, Carlo Regazzoni
2024-03-06
Abstract: This paper presents a novel self-supervised path-planning method for UAV-aided networks. First, we employ an optimizer to solve training examples offline and then use the resulting solutions as demonstrations from which the UAV can learn a world model to understand the environment and implicitly discover the optimizer's policy. A UAV equipped with the world model can make real-time autonomous decisions and engage in online planning using active inference. During planning, the UAV can score different policies based on the expected surprise, allowing it to choose among alternative futures. Additionally, the UAV can anticipate the outcomes of its actions using the world model and assess the expected surprise in a self-supervised manner. Our method enables quicker adaptation to new situations and better performance than traditional RL, leading to broader generalizability.
Robotics, Machine Learning, Signal Processing
What problem does this paper attempt to address?
This paper proposes a self-supervised path-planning method for UAV-assisted wireless networks. The method is based on active inference and targets autonomous UAV navigation in unknown environments. The objective is to maximize the total rate while minimizing mission completion time, a formulation similar to the TSP with profits (TSPWP), a variant of the traveling salesman problem (TSP).

The method has two stages. First, an optimizer solves training examples offline, and the resulting solutions serve as demonstrations from which the UAV learns a world model. The world model lets the UAV understand the environment and implicitly recover the optimizer's policy. Second, during real-time decision-making, the UAV uses the learned world model for online planning: it predicts the outcomes of its actions, scores candidate policies by their expected surprise, and selects the best one. Because the UAV evaluates the expected surprise itself, the process is self-supervised.

Traditional methods rely on precise system information, whereas the proposed method adapts to new situations more quickly and generalizes better than traditional reinforcement learning (RL). The main contributions are the introduction of expected surprise as a scoring criterion for planning and the demonstration of online planning for decision-making, both of which improve the adaptability and generality of the UAV. Across the test scenarios, the method provides faster, more stable, and more reliable solutions than a modified Q-learning algorithm, including better paths and shorter task completion times.
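
The two stages, learning from offline demonstrations and then scoring candidate policies by surprise, can be illustrated with a deliberately small sketch. It is not the paper's model: the tabular transition counts, the Laplace smoothing, the hotspot names (`base`, `A`, `B`, `C`), and every function name below are assumptions made for illustration. Surprise is approximated here as the negative log-probability of each predicted transition under the model learned from the demonstrations, summed along a candidate route.

```python
import math
from collections import defaultdict


def learn_world_model(demonstrations):
    """Count hotspot-to-hotspot transitions seen in the optimizer's routes.

    Toy stand-in for a world model: counts[current][next] = occurrences.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for route in demonstrations:
        for current, nxt in zip(route, route[1:]):
            counts[current][nxt] += 1
    return counts


def transition_prob(counts, current, nxt, n_nodes, alpha=1.0):
    """Laplace-smoothed estimate of P(next | current).

    Smoothing (an assumption of this sketch) keeps unseen transitions
    possible, so their surprise is large but finite.
    """
    row = counts.get(current, {})
    total = sum(row.values())
    return (row.get(nxt, 0) + alpha) / (total + alpha * n_nodes)


def expected_surprise(counts, route, n_nodes):
    """Sum of -log P(next | current) over the transitions of a route."""
    return sum(
        -math.log(transition_prob(counts, a, b, n_nodes))
        for a, b in zip(route, route[1:])
    )


def plan(counts, candidate_routes, n_nodes):
    """Pick the candidate route with the lowest total surprise."""
    return min(
        candidate_routes,
        key=lambda route: expected_surprise(counts, route, n_nodes),
    )


# Hypothetical demonstrations produced offline by an optimizer.
demonstrations = [
    ["base", "A", "B", "C", "base"],
    ["base", "A", "B", "C", "base"],
    ["base", "A", "C", "B", "base"],
]

# Hypothetical alternative futures considered during online planning.
candidates = [
    ["base", "C", "A", "B", "base"],
    ["base", "A", "B", "C", "base"],
    ["base", "B", "A", "C", "base"],
]

model = learn_world_model(demonstrations)
best = plan(model, candidates, n_nodes=4)
print(best)
```

In this toy setting the planner prefers the route that most resembles what the optimizer demonstrated, which is the sense in which the demonstrations implicitly encode the optimizer's policy. A fuller treatment would replace the count table with a learned generative model and would account for rate and completion time explicitly.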