CLIPSwarm: Converting text into formations of robots

Pablo Pueyo, Eduardo Montijano, Ana C. Murillo, Mac Schwager

2023-11-18

Abstract: We present CLIPSwarm, an algorithm to generate robot swarm formations from natural language descriptions. CLIPSwarm receives an input text and finds the positions of the robots that form a shape corresponding to the given text. To do so, we implement a variation of the Monte Carlo particle filter to obtain a matching formation iteratively. In every iteration, we generate a set of new formations and evaluate their CLIP similarity with the given text, selecting the best formations according to this metric. This metric is obtained using CLIP [1], an existing foundation model trained to encode images and texts into vectors within a common latent space. The comparison between these vectors determines how likely the given text is to describe the shapes. Our initial proof of concept shows the potential of this solution to generate robot swarm formations from natural language descriptions alone and demonstrates a novel application of foundation models, such as CLIP, in the field of multi-robot systems. In this first approach, we create formations using a convex-hull approach. Next steps include more robust and generic representation and optimization steps in the process of obtaining a suitable swarm formation.
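The iterative loop described in the abstract (generate candidate formations, score them against the text, keep the best) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the particle filter variant, so the code below is a generic sample-score-select loop. All names (`search_formation`, `perturb`, `toy_score`) and all parameter values are hypothetical, and `toy_score` is a stand-in that rewards points near a unit circle; in CLIPSwarm the score would instead be the CLIP similarity between a rendering of the formation and the input text.

```python
import math
import random


def toy_score(formation):
    """Stand-in for CLIP similarity (NOT the paper's metric).

    Returns a value <= 0 that is highest when every robot lies on the unit circle.
    """
    return -sum(abs(math.hypot(x, y) - 1.0) for x, y in formation) / len(formation)


def perturb(formation, sigma, rng):
    """Create a new candidate by adding Gaussian noise to each robot position."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in formation]


def search_formation(score, n_robots=8, n_candidates=30, n_keep=5,
                     n_iters=40, sigma=0.2, seed=0):
    """Generic sample-score-select loop (assumed structure, hypothetical defaults)."""
    rng = random.Random(seed)
    # Start from random formations in a square workspace.
    population = [
        [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n_robots)]
        for _ in range(n_candidates)
    ]
    history = [score(max(population, key=score))]
    for _ in range(n_iters):
        # Keep the best-scoring formations from the current set...
        survivors = sorted(population, key=score, reverse=True)[:n_keep]
        # ...and generate new candidates by perturbing randomly chosen survivors.
        children = [perturb(rng.choice(survivors), sigma, rng)
                    for _ in range(n_candidates - n_keep)]
        population = survivors + children
        history.append(score(max(population, key=score)))
    return max(population, key=score), history


best, history = search_formation(toy_score)
print(f"score went from {history[0]:.3f} to {history[-1]:.3f}")
```

Because the survivors are carried over unchanged into the next iteration, the best score in this sketch can never decrease from one iteration to the next. Swapping `toy_score` for a function that renders the formation and queries a vision-language model would leave the loop itself unchanged.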

Robotics

What problem does this paper attempt to address?

The paper proposes CLIPSwarm, an algorithm that automatically generates formations for robot swarms (especially drone swarms) from natural language descriptions. Given a text input, CLIPSwarm computes a set of robot positions whose overall pattern matches the description. The main contributions of the paper are:

1. **Algorithm Introduction**: CLIPSwarm uses a variant of the Monte Carlo particle filter to iteratively generate candidate robot formations, and uses the CLIP model to score the similarity between each formation and the given text. CLIP is a pre-trained foundation model that encodes images and text into the same vector space, so their similarity can be computed directly.
2. **Experimental Validation**: The authors show the formations that CLIPSwarm generates for different natural language descriptions and reproduce them in simulation with the high-fidelity drone simulator AirSim, supporting the algorithm's effectiveness and its potential for practical use.
3. **Discussion of Limitations**: The paper also notes limitations of the current method. Formations are evaluated through their convex hull contours, which simplifies the evaluation but can discard shape details, and the search relies solely on CLIP similarity, so the result may not match the shape the user expects.

In summary, CLIPSwarm provides a novel approach to creating robot swarm formations automatically from natural language descriptions, opening new directions for research in multi-robot systems, particularly in artistic robotics. Future work includes improving the algorithm to handle more complex inputs and using more diverse metrics to improve the accuracy and expressiveness of the formations.
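The convex-hull limitation mentioned above can be made concrete with a small Python sketch. It uses Andrew's monotone chain algorithm, a standard convex hull method chosen here for illustration; the summary does not say how the authors compute or render the hull. The function names and the L-shaped example formation are hypothetical.

```python
def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical L-shaped formation: six robots, one at the inner (concave) corner.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(l_shape)
print(hull)  # [(0, 0), (2, 0), (2, 1), (1, 2), (0, 2)]
```

The robot at `(1, 1)` defines the notch of the "L" but does not appear in the hull, so a contour built from the hull alone shows a five-sided convex polygon rather than the intended shape. This is the kind of detail loss the paper's limitations discussion refers to.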

CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models

Distributed Swarm Trajectory Optimization for Formation Flight in Dense Environments

Morphological computation and decentralized learning in a swarm of sterically interacting robots

V-Shaped Formation Control for Robotic Swarms Constrained by Field of View

Swarm-Enabling Technology for Multi-Robot Systems

A Swarm of Simple Robots Constructing Planar Shapes

SwarmLab: a Matlab Drone Swarm Simulator

Mean-shift exploration in shape assembly of robot swarms

Continuous Sculpting: Persistent Swarm Shape Formation Adaptable to Local Environmental Changes

Swarm Robot Pattern Formation Using a Morphogenetic Multi-Cellular Based Self-Organizing Algorithm

A Multi-robot Pattern Formation Algorithm Based on Distributed Swarm Intelligence

Hearing the shape of an arena with spectral swarm robotics

Line and V-Shape Formation Based Distributed Processing for Robotic Swarms

Behavior-Based Formation Control of Swarm Robots

A Morphogenetic Approach to Flexible and Robust Shape Formation for Swarm Robotic Systems

Bioinspired cooperation in a heterogeneous robot swarm using ferrofluid artificial pheromones for uncontrolled environments

Efficient Concurrent Design of the Morphology of Unmanned Aerial Systems and their Collective-Search Behavior

Behavioral-based circular formation control for robot swarms

Formations organization in robotic swarm using the thermal motion equivalent method

Swarm Robots Inspired by Friendship Formation Process