A generative foundation model for antibody sequence understanding

Justin Barton, Aretas Gaspariunas, David A Yadin, Jorge Dias, Francesca L Nice, Danielle H Minns, Olivia Snudden, Chelsea Povall, Sara Valle Tomas, Harry Dobson, James HR Farmery, Jinwoo Leem, Jacob D Galson
DOI: https://doi.org/10.1101/2024.05.22.594943
2024-07-24
Abstract: Here we introduce FAbCon, a generative antibody-specific language model comprising 2.4 billion parameters. A commonly accepted wisdom in developing large language models is that increasing model scale will translate to higher performance on downstream tasks. Starting from a 144-million parameter setup, we show that progressively larger models achieve greater accuracy in predicting antigen binding and can also be used to design new antibodies with good predicted developability potential. FAbCon is available on huggingface.co/alchemab.
Bioinformatics