CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models

Pablo Pueyo, Eduardo Montijano, Ana C. Murillo, Mac Schwager

2024-03-20

Abstract: This paper introduces CLIPSwarm, a new algorithm designed to automate the modeling of swarm drone formations based on natural language. The algorithm begins by enriching a provided word to compose a text prompt that serves as input to an iterative approach to find the formation that best matches the provided word. The algorithm iteratively refines formations of robots to align with the textual description, employing different steps for "exploration" and "exploitation". Our framework is currently evaluated on simple formation targets, limited to contour shapes. A formation is visually represented through alpha-shape contours and the most representative color is automatically found for the input word. To measure the similarity between the description and the visual representation of the formation, we use CLIP [1], encoding text and images into vectors and assessing their similarity. Subsequently, the algorithm rearranges the formation to visually represent the word more effectively, within the given constraints of available drones. Control actions are then assigned to the drones, ensuring robotic behavior and collision-free movement. Experimental results demonstrate the system's efficacy in accurately modeling robot formations from natural language descriptions. The algorithm's versatility is showcased through the execution of drone shows in photorealistic simulation with varying shapes. We refer the reader to the supplementary video for a visual reference of the results.
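The similarity step mentioned in the abstract (encoding text and image into vectors and comparing them) can be illustrated with plain cosine similarity. This is only a minimal sketch: the three-dimensional vectors below are made-up stand-ins, not real CLIP outputs, and the function name `cosine_similarity` is chosen here for illustration. Real embeddings would come from the model's text and image encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings (hypothetical values, not real CLIP outputs).
text_embedding = [1.0, 0.0, 1.0]
image_close = [2.0, 0.0, 2.0]   # same direction as the text vector
image_far = [0.0, 1.0, 0.0]     # orthogonal to the text vector

score_close = cosine_similarity(text_embedding, image_close)
score_far = cosine_similarity(text_embedding, image_far)
```

Under these toy values, the rendering that points in the same direction as the text vector scores higher than the orthogonal one, which is the property a formation search would rely on when ranking candidates.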

Robotics, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper proposes CLIPSwarm, an algorithm that automatically designs drone swarm formations from natural language descriptions. Specifically, it addresses the following issues:

1. **Automated Drone Formation Design**: The user inputs a single word describing the desired pattern, such as "leaf," and the algorithm automatically determines drone positions and a representative color that match the shape described by the word.
2. **Using the CLIP Model for Similarity Assessment**: The authors use the CLIP model to measure the similarity between the text description and the visual representation of the drone formation. CLIP encodes text and images into vectors, and the similarity between these vectors serves as the matching score.
3. **Iterative Optimization of Formation Shapes**: The algorithm iteratively adjusts the positions of the drones to improve the match between the shape and the text description. This process alternates between exploring new shapes and refining shapes that already score well.
4. **Achieving Artistic Drone Performances**: The ultimate goal is to let non-professional users create artistic drone performances without designing complex formation patterns by hand.
5. **Robot Behavior and Collision Avoidance**: Beyond shape matching, the algorithm assigns control actions to the drones so that they avoid collisions while moving and can safely reach their designated positions.

In short, CLIPSwarm aims to generate drone formations that match an input word and to ensure that the drones can execute them safely. According to the abstract, the approach is currently limited to contour shapes and is demonstrated in photorealistic simulation.
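The iterative refinement described above, which alternates between exploring new formations and refining the best one found so far, can be sketched as a generic search loop. This is not the authors' implementation: `score` is a hypothetical stand-in for the CLIP similarity (it simply rewards drones lying on a unit circle), and `random_formation`, `perturb`, `optimize`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def score(formation):
    """Hypothetical stand-in for the CLIP score: higher when drones lie on a unit circle."""
    return -sum(abs(math.hypot(x, y) - 1.0) for x, y in formation) / len(formation)

def random_formation(n_drones, rng):
    """Exploration: sample a fresh formation uniformly in a square."""
    return [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n_drones)]

def perturb(formation, sigma, rng):
    """Exploitation: jitter every drone of an existing formation slightly."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in formation]

def optimize(n_drones=8, iterations=200, pop=10, seed=0):
    rng = random.Random(seed)
    best = random_formation(n_drones, rng)
    best_score = score(best)
    history = [best_score]
    for _ in range(iterations):
        # Half of the candidates explore, the rest refine the current best.
        candidates = [random_formation(n_drones, rng) for _ in range(pop // 2)]
        candidates += [perturb(best, 0.1, rng) for _ in range(pop - pop // 2)]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score > best_score:  # keep a candidate only if it improves the score
            best, best_score = top, top_score
        history.append(best_score)
    return best, history

best_formation, score_history = optimize()
```

Because a candidate replaces the current best only when it scores higher, `score_history` never decreases. In a setting like the one the paper describes, the stand-in `score` would be replaced by rendering the formation and comparing it with the text prompt.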

CLIPSwarm: Converting text into formations of robots

Distributed Swarm Trajectory Optimization for Formation Flight in Dense Environments

Vision-based Drone Flocking in Outdoor Environments

Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Learning Vision-Based Flight in Drone Swarms by Imitation

SwarmLab: a Matlab Drone Swarm Simulator

V-Shaped Formation Control for Robotic Swarms Constrained by Field of View

An Effective and Scalable Approach for Swarm-on-Swarm Air Combat Decision

Learning Vision-based Cohesive Flight in Drone Swarms

Agile Formation Control of Drone Flocking Enhanced With Active Vision-Based Relative Localization

Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multistage Reinforcement Learning Approach

SwarmGPT-Primitive: A Language-Driven Choreographer for Drone Swarms Using Safe Motion Primitive Composition

Swarm-GPT: Combining Large Language Models with Safe Motion Planning for Robot Choreography Design

Drone swarm patrolling with uneven coverage requirements

FACT: Fast and Active Coordinate Initialization for Vision-based Drone Swarms

Compact and ordered swarms of unmanned aerial vehicles in cluttered environments

Swarm coordination of mini-UAVs for target search using imperfect sensors

Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach

Efficient Concurrent Design of the Morphology of Unmanned Aerial Systems and their Collective-Search Behavior

VG-Swarm: A Vision-based Gene Regulation Network for UAVs Swarm Behavior Emergence