Abstract: Language models are thriving, powering conversational agents that assist and empower humans to solve a wide range of tasks. Recently, these models have been extended to support additional modalities, including vision, audio and video, demonstrating impressive capabilities across multiple domains, including healthcare. Still, conversational agents remain limited in biology because they cannot yet fully comprehend biological sequences. On the other hand, high-performance foundation models for biological sequences have been built through self-supervision over sequencing data, but these need to be fine-tuned for each specific application, preventing transfer and generalization between tasks. In addition, these models are not conversational, which limits their use to users with coding skills. In this paper, we propose to bridge the gap between biology foundation models and conversational agents by introducing ChatNT, the first multimodal conversational agent with an advanced understanding of biological sequences. ChatNT achieves new state-of-the-art results on the Nucleotide Transformer benchmark while being able to solve all tasks at once, in English, and to generalize to unseen questions. In addition, we have curated a new set of more biologically relevant instruction tasks from DNA, RNA and proteins, spanning multiple species, tissues and biological processes. ChatNT performs on par with state-of-the-art specialized methods on those tasks. We also present a novel perplexity-based technique to help calibrate the confidence of our model predictions. Our framework for genomics instruction tuning can be easily extended to more tasks and biological data modalities (e.g. structure, imaging), making it a widely applicable tool for biology. ChatNT is the first model of its kind and constitutes an initial step towards building generally capable agents that understand biology from first principles while being accessible to users with no coding background.
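The abstract mentions a perplexity-based technique for calibrating confidence but does not describe it. As a rough illustration only, the sketch below computes the perplexity of a candidate answer from per-token natural-log probabilities, PPL = exp(-(1/N) Σ log p(token_i)), and then turns the perplexities of several candidate answers (for example "Yes" and "No") into normalized scores with a softmax over negative log-perplexities. The function names and the softmax scheme are hypothetical assumptions, not ChatNT's actual calibration method.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a token sequence, given the natural-log probability of each token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def answer_confidences(candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical scheme: softmax over negative log-perplexities of candidate answers.

    Lower perplexity (the model finds the answer less "surprising") yields a higher score.
    """
    # -log(PPL) equals the mean token log-probability of each candidate answer.
    scores = {ans: -math.log(perplexity(lps)) for ans, lps in candidates.items()}
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {ans: math.exp(s - top) for ans, s in scores.items()}
    total = sum(exps.values())
    return {ans: e / total for ans, e in exps.items()}


if __name__ == "__main__":
    # Illustrative (made-up) token log-probabilities for two candidate answers.
    conf = answer_confidences({"Yes": [-0.1, -0.2], "No": [-2.0, -1.5]})
    print(conf)
```

Normalizing per-token (rather than summing log-probabilities) keeps answers of different token lengths comparable, which is one reason perplexity is a natural quantity to build confidence scores on.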

ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks