Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos
2024-05-16
Abstract: In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles efficiently and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking, and depth estimation tasks, we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset, which is the largest (to the best of our knowledge) air-to-air airborne object dataset.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to detect, track, and estimate the distance of non-cooperative aircraft using visual sensors during unmanned aerial vehicle (UAV) operations, in order to ensure flight safety and, in particular, to avoid mid-air collisions (MAC) and near-mid-air collisions (NMAC). Specifically, it proposes a deep-learning-based framework that uses monocular camera input to estimate the distance of detected aerial objects in real time, enabling perception of non-cooperative aircraft and obstacle avoidance. The problem is particularly important for low-altitude operations, because traditional radar and other sensing technologies are difficult to deploy on small drones due to size, weight, and power (SWaP) limitations. The research therefore aims to develop an efficient, lightweight solution that can run in real time on resource-constrained platforms. The key contributions of the paper include:

1. **Dataset Construction**: A large-scale depth-estimation dataset was constructed from the GPS-derived distance information in the AOT dataset and used to train an encoder-decoder deep neural network for depth estimation.
2. **Loss Function Design**: A hybrid loss function was designed to train the depth-estimation model, combining edge loss, structural similarity index (SSIM), L1 loss, and BerHu loss to improve the robustness and accuracy of the model.
3. **System Integration and Evaluation**: The depth-estimation model was integrated into the detection and tracking pipeline, and its performance was evaluated on the large-scale Airborne Object Tracking (AOT) dataset, showing a significant improvement in precision.

Through these contributions, the paper provides an innovative method for aerial obstacle detection and avoidance during UAV missions, especially in the absence of cooperative signal sources, relying solely on visual sensors to achieve high-precision, real-time obstacle avoidance.
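
To make the loss design in contribution 2 more concrete, the following is a minimal sketch of two of its four terms, the L1 loss and the BerHu (reverse Huber) loss, and of a weighted combination of them. It is not the authors' implementation: the function names, the weights `w_l1` and `w_berhu`, and the common choice of setting the BerHu threshold to 20% of the largest residual are assumptions made for illustration, and the edge and SSIM terms are omitted for brevity. Depth maps are represented as flat lists of values.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference depth values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def berhu_loss(pred: Sequence[float], target: Sequence[float],
               c_ratio: float = 0.2) -> float:
    """BerHu (reverse Huber) loss.

    Residuals up to the threshold c are penalised linearly (like L1);
    larger residuals are penalised quadratically as (r^2 + c^2) / (2c).
    The threshold c = c_ratio * max|residual| is an assumed convention.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [abs(p - t) for p, t in zip(pred, target)]
    c = c_ratio * max(residuals)
    if c == 0.0:
        # All residuals are zero, so the loss is zero.
        return 0.0
    total = 0.0
    for r in residuals:
        total += r if r <= c else (r * r + c * c) / (2.0 * c)
    return total / len(residuals)


def partial_hybrid_loss(pred: Sequence[float], target: Sequence[float],
                        w_l1: float = 1.0, w_berhu: float = 1.0) -> float:
    """Weighted sum of the L1 and BerHu terms (hypothetical weights).

    A full hybrid loss as described in the summary would also add
    edge and SSIM terms, which are not sketched here.
    """
    return w_l1 * l1_loss(pred, target) + w_berhu * berhu_loss(pred, target)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 10.0]
    reference = [1.0, 2.5, 0.0]
    print(partial_hybrid_loss(predicted, reference))
```

The quadratic branch makes large depth errors dominate the gradient while small errors are still treated linearly, which is one commonly cited motivation for using BerHu in depth regression.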