Abstract: Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI)-enabled design provides a powerful alternative with potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models (LLMs) trained on biological diversity at scale, we demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI. To achieve this goal, we curated a dataset of over one million CRISPR operons through systematic mining of 26 terabases of assembled genomes and metagenomes. We demonstrate the capacity of our models by generating 4.8x the number of protein clusters across CRISPR-Cas families found in nature and tailoring single-guide RNA sequences for Cas9-like effector proteins. Several of the generated gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector, while being 400 mutations away from it in sequence. Finally, we demonstrate that an AI-generated gene editor, denoted OpenCRISPR-1, is compatible with base editing. We release OpenCRISPR-1 publicly to facilitate broad, ethical usage across research and commercial applications.
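
Two figures in the abstract, the "400 mutations away" distance from SpCas9 and the count of protein clusters, both depend on how the difference between two protein sequences is measured. The following is a minimal sketch, not the paper's method: it assumes unit-cost Levenshtein distance (substitutions, insertions and deletions) as a proxy for "number of mutations" and a naive greedy clustering at a fixed identity threshold. The function names, the identity definition, the threshold and the toy sequences are all hypothetical; the source does not say which alignment or clustering tools or cut-offs the authors used.

```python
# Hypothetical sketch: sequence distance and greedy clustering for protein strings.
# Assumption: Levenshtein distance stands in for "number of mutations"; real pipelines
# typically use dedicated alignment/clustering software with tuned parameters.


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-residue substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if residues match)
            ))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity proxy: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b)) or 1  # avoid division by zero for two empty strings
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: list[str], min_identity: float) -> list[list[str]]:
    """Put each sequence in the first cluster whose representative (first member)
    it matches at >= min_identity; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for s in seqs:
        for cluster in clusters:
            if identity(s, cluster[0]) >= min_identity:
                cluster.append(s)
                break
        else:  # no existing cluster was close enough
            clusters.append([s])
    return clusters


# Toy usage with made-up sequences (not real Cas proteins).
print(edit_distance("MKTAYIAK", "MQTAWIAK"))  # 2 (two substitutions)
natural = ["MKTAYIAK", "MKTAYIAR"]
generated = ["MKTAYIAK", "WWTAYQAR", "PPPPYIAK"]
ratio = len(greedy_cluster(generated, 0.8)) / len(greedy_cluster(natural, 0.8))
print(ratio)  # 3.0
```

A cluster-count ratio such as "4.8x" is only meaningful when both sequence sets are clustered with the same method and threshold; changing `min_identity` can change the counts substantially.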

What problem does this paper attempt to address?

The problem this paper attempts to address is how to use artificial intelligence, especially large language models, to design powerful gene editors that overcome the functional deficiencies of existing CRISPR-Cas systems when they are applied in non-native environments such as human cells. Specifically, the goals of the paper are:

1. **Generating diverse CRISPR-Cas proteins**: use large-scale data mining and machine learning to generate a large number of novel CRISPR-Cas proteins that differ substantially in sequence from natural proteins yet remain functional.
2. **Improving the activity and specificity of gene editors**: design gene editors whose activity and specificity in human cells are comparable to or better than those of SpCas9, and that are also compatible with other functions such as base editing.
3. **Validating the functionality of generated gene editors**: experimentally test the editing performance of the generated gene editors in human cells to confirm their efficiency and specificity across different targets.

The paper pursues these goals through the following steps:

1. **Data collection and preprocessing**: mining over 1 million CRISPR-Cas operons from 26 terabases of assembled genomes and metagenomes to construct a CRISPR-Cas map.
2. **Model training and generation**: training large language models (LLMs) on the CRISPR-Cas map and using them to generate 4 million CRISPR-Cas protein sequences.
3. **Sequence classification and screening**: classifying and screening the generated sequences with BLAST and hidden Markov models (HMMs) to confirm that they belong to specific CRISPR-Cas families.
4. **Structure prediction and functional validation**: predicting the structures of the generated proteins with AlphaFold2 and experimentally measuring their editing efficiency and specificity in human cells.

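
Step 3 is essentially a filter: score each generated sequence against reference sequences from known families and keep it only if it can be assigned to a family. The sketch below is hypothetical and deliberately simplified. It replaces BLAST and HMM scoring with the standard-library `difflib.SequenceMatcher.ratio()`, and it adds an optional upper cut-off that rejects near-copies of reference proteins, included only to echo goal 1; the source does not say the paper filtered this way. The family labels, toy sequences and the `min_score`/`max_score` thresholds are illustrative assumptions.

```python
# Hypothetical sketch of a "classify, then screen" filter over generated sequences.
# Assumption: difflib's ratio() stands in for BLAST/HMM scores; it is NOT equivalent.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: 2 * matched characters / combined length."""
    return SequenceMatcher(None, a, b).ratio()


def best_hit(seq: str, references: dict[str, list[str]]) -> tuple[str, float]:
    """Return (family, score) for the most similar reference sequence.
    Assumes at least one reference sequence exists."""
    return max(
        ((family, similarity(seq, ref))
         for family, refs in references.items()
         for ref in refs),
        key=lambda hit: hit[1],
    )


def screen(generated: list[str], references: dict[str, list[str]],
           min_score: float = 0.5, max_score: float = 0.95) -> dict[str, str]:
    """Keep sequences that resemble a known family (score >= min_score) but are not
    near-copies of a reference (score < max_score). Thresholds are illustrative."""
    kept: dict[str, str] = {}
    for seq in generated:
        family, score = best_hit(seq, references)
        if min_score <= score < max_score:
            kept[seq] = family
    return kept


# Toy usage with made-up sequences and family labels.
refs = {"Cas9": ["AAAAAAAAAA"], "Cas12a": ["CCCCCCCCCC"]}
candidates = ["AAAAAAAAAA", "AAAAAAAAGG", "GGGGGGGGGG"]
print(screen(candidates, refs))  # {'AAAAAAAAGG': 'Cas9'}
```

In a real pipeline `similarity` would be replaced by alignment or profile-HMM scores, which are not on the same scale as a `ratio()` value, so the thresholds would have to be chosen for that scoring scheme.
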
Ultimately, the paper shows that the AI-generated gene editor OpenCRISPR-1 exhibits high activity and specificity in human cells and is compatible with base editing; the authors release it publicly for research and commercial use.

Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences
