Abstract: This paper presents a data-driven approach to learning vision-based collective behavior from a simple flocking algorithm. We simulate a swarm of quadrotor drones and formulate the controller as a regression problem in which we generate 3D velocity commands directly from raw camera images. The dataset is created by simultaneously acquiring omnidirectional images and computing the corresponding control command from the flocking algorithm. We show that a convolutional neural network trained on the visual inputs of the drone can learn not only robust collision avoidance but also coherence of the flock in a sample-efficient manner. The neural controller effectively learns to localize other agents in the visual input, which we show by visualizing the regions with the most influence on the motion of an agent. This weakly supervised saliency map can be computed efficiently and may be used as a prior for subsequent detection and relative localization of other agents. We remove the dependence on sharing positions among flock members by taking only local visual information into account for control. Our work can therefore be seen as the first step towards a fully decentralized, vision-based flock without the need for communication or visual markers to aid detection of other agents.

What problem does this paper attempt to address?

This paper aims to develop a fully decentralized, vision-based control method for drone swarms that achieves robust collision avoidance and cohesive flight without sharing position information or using visual markers. Specifically, it addresses the problem in three ways:

1. **Decentralized control**: Existing multi-agent robotic systems usually rely on centralized control or on wireless communication to share position information (obtained, for example, from motion capture systems or global navigation satellite systems). These approaches introduce a single point of failure and can become unreliable in dense urban environments because of signal reflections and overloaded wireless frequency bands. The method proposed in this paper relies only on local visual information for control, which improves the autonomy and robustness of the system.
2. **Vision-based control**: Most existing drone control systems depend on accurate position information. This paper instead proposes a learning method based on visual input, enabling each drone to localize other drones in its camera images and to perform the corresponding collision avoidance and flock cohesion behaviors. This reduces the dependence on external positioning systems and makes the system more flexible and adaptable.
3. **End-to-end learning**: The paper formulates the swarm controller as a regression problem that predicts 3D velocity commands directly from raw images. A convolutional neural network (CNN) is trained to avoid collisions and keep the flock cohesive from visual input alone. This avoids the feature engineering and hand-designed controller stages of traditional methods.
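The regression formulation in the end-to-end learning item can be written compactly. The following is only a sketch under assumptions: the summary does not state the training loss, so a mean squared error objective is assumed here, and the symbols \(f_\theta\) (the CNN with weights \(\theta\)), \(I^{(k)}\) (the \(k\)-th omnidirectional image), \(v^{(k)}\) (the velocity command the flocking algorithm computed for that image), and \(M\) (the number of image-command pairs) are introduced purely for illustration.

\[ \theta^{*}=\arg\min_{\theta}\frac{1}{M}\sum_{k=1}^{M}\left\|f_\theta\left(I^{(k)}\right)-v^{(k)}\right\|^2 \]

Under this reading, the flocking algorithm acts as the teacher that labels every image, and the trained network replaces it at run time, so no positions need to be shared.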
### Formula summary

- **Neighbor definition**: \[ N_i=\{\text{agents}\ j:j\neq i\land\|r_{ij}\|<r_{max}\} \] where \(r_{ij}\in\mathbb{R}^3\) represents the relative position of agent \(j\) with respect to agent \(i\), and \(\|\cdot\|\) represents the Euclidean norm.
- **Separation velocity command**: \[ v_{sep,i}=-k_{sep}\sum_{j\in N_i}\frac{r_{ij}}{\|r_{ij}\|^2} \] where \(k_{sep}\) is the separation gain.
- **Cohesion velocity command**: \[ v_{coh,i}=\frac{k_{coh}}{|N_i|}\sum_{j\in N_i}r_{ij} \] where \(k_{coh}\) is the cohesion gain.
- **Migration velocity command**: \[ v_{mig,i}=k_{mig}\frac{r_{mig,i}}{\|r_{mig,i}\|} \] where \(k_{mig}\) is the migration gain, and \(r_{mig,i}\in\mathbb{R}^3\) represents the relative position of the migration point with respect to agent \(i\).
- **Final velocity command**: \[ v_i = \frac{\tilde{v}_i}{\|\tilde{v}_i\|}\min(\|\tilde{v}_i\|,v_{max}) \] where \(\tilde{v}_i=v_{rey,i}+v_{mig,i}\), \(v_{rey,i}=v_{sep,i}+v_{coh,i}\), and \(v_{max}\) is the desired maximum speed.

With these methods, the paper demonstrates that a drone swarm can be controlled using visual input alone.
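The flocking rules in the formula summary can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `flocking_command`, its argument names, and all default gains and limits are hypothetical, and the separation term sums each neighbor's contribution divided by its own squared distance. Neighbors are assumed never to coincide exactly with the agent.

```python
import numpy as np


def flocking_command(r_neighbors, r_mig, k_sep=1.0, k_coh=1.0, k_mig=1.0,
                     r_max=20.0, v_max=4.0):
    """Velocity command v_i for one agent.

    r_neighbors: relative positions r_ij of the other agents, shape (n, 3).
    r_mig: relative position of the migration point, shape (3,).
    All gains and limits are hypothetical defaults chosen for illustration.
    """
    r = np.asarray(r_neighbors, dtype=float).reshape(-1, 3)
    dist = np.linalg.norm(r, axis=1)

    # Neighbor set N_i: agents closer than r_max (nonzero distance assumed).
    mask = dist < r_max
    r, dist = r[mask], dist[mask]

    if len(r) > 0:
        # Separation: push away from each neighbor, weighted by 1 / distance^2.
        v_sep = -k_sep * np.sum(r / dist[:, None] ** 2, axis=0)
        # Cohesion: pull towards the mean relative position of the neighbors.
        v_coh = k_coh * np.mean(r, axis=0)
    else:
        v_sep = np.zeros(3)
        v_coh = np.zeros(3)

    # Migration: fixed-magnitude pull towards the migration point.
    r_mig = np.asarray(r_mig, dtype=float)
    mig_norm = np.linalg.norm(r_mig)
    v_mig = k_mig * r_mig / mig_norm if mig_norm > 0 else np.zeros(3)

    # Sum the Reynolds terms and migration, then cap the speed at v_max.
    v_tilde = v_sep + v_coh + v_mig
    speed = np.linalg.norm(v_tilde)
    if speed == 0:
        return np.zeros(3)
    return v_tilde / speed * min(speed, v_max)


if __name__ == "__main__":
    # One neighbor 2 m ahead, migration point 10 m to the side.
    print(flocking_command([[2.0, 0.0, 0.0]], [0.0, 10.0, 0.0]))
```

In the data-collection setup described in the abstract, a command of this kind would be computed from ground-truth relative positions and stored next to the omnidirectional image captured at the same instant, giving the image-command pairs used for training.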

Learning Vision-based Cohesive Flight in Drone Swarms

Learning Vision-Based Flight in Drone Swarms by Imitation

Vision-based Drone Flocking in Outdoor Environments

Cooperative Flocking And Learning In Multi-Robot Systems For Predator Avoidance

Learning-Based Multi-UAV Flocking Control With Limited Visual Field and Instinctive Repulsion

VGAI: End-to-End Learning of Vision-Based Decentralized Controllers for Robot Swarms

Agile Formation Control of Drone Flocking Enhanced With Active Vision-Based Relative Localization

Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning

Learning to Navigate in Turbulent Flows with Aerial Robot Swarms: A Cooperative Deep Reinforcement Learning Approach

Neural-Swarm2: Planning and Control of Heterogeneous Multirotor Swarms Using Learned Interactions

An Interrelated Imitation Learning Method for Heterogeneous Drone Swarm Coordination

Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions

Distributed Deep Reinforcement Learning for Drone Swarm Control

Nearest-Neighbor-based Collision Avoidance for Quadrotors via Reinforcement Learning

Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning

Vision and Learning for Deliberative Monocular Cluttered Flight

Vision-based Learning for Drones: A Survey

Learning Agile, Vision-based Drone Flight: from Simulation to Reality

Federated Imitation Learning for UAV Swarm Coordination in Urban Traffic Monitoring

Learning to Swarm with Knowledge-Based Neural Ordinary Differential Equations