Cytometry Masked Autoencoder: An Accurate and Interpretable Automated Immunophenotyper
Jaesik Kim, Matei Ionita, Matthew Lee, Michelle L. McKeague, Ajinkya Pattekar, Mark M. Painter, Joost Wagenaar, Van Truong, Dylan T. Norton, Divij Mathew, Yonghyun Nam, Sokratis A. Apostolidis, Cynthia Clendenin, Patryk Orzechowski, Sang-Hyuk Jung, Jakob Woerner, Caroline A.G. Ittner, Alexandra P. Turner, Mika Esperanza, Thomas G. Dunn, Nilam S. Mangalmurti, John P. Reilly, Nuala J. Meyer, Carolyn S. Calfee, Kathleen D. Liu, Michael A. Matthy, Lamorna Brown Swigart, Ellen L. Burnham, Jeffrey McKeehan, Sheetal Gandotra, Derek W. Russel, Kevin W. Gibbs, Karl W. Thomas, Harsh Barot, Allison R. Greenplate, E. John Wherry, Dokyoon Kim
DOI: https://doi.org/10.1101/2024.02.13.580114
2024-02-27
Abstract: High-throughput single-cell cytometry data are crucial for understanding the involvement of the immune system in disease and in responses to treatment. Traditional methods for annotating cytometry data, specifically manual gating and clustering, struggle with scalability, robustness, and accuracy. In this study, we propose a cytometry masked autoencoder (cyMAE), which automates immunophenotyping tasks, including cell type annotation. The cyMAE model is designed to uphold user-defined cell type definitions, which makes its output easier to interpret and to compare across studies. The cyMAE model follows a pre-train and fine-tune approach. In the pre-training phase, cyMAE learns the relationships between protein markers in immune cells solely from protein expression. It does not rely on prior information such as cell identity or cell type-specific marker proteins. The pre-trained cyMAE is then fine-tuned on multiple specialized tasks via task-specific supervised learning. The pre-trained cyMAE addresses the shortcomings of manual gating and clustering methods by providing accurate and interpretable predictions. Through validation across multiple cohorts, we demonstrate that cyMAE effectively identifies co-occurrence patterns of bound labeled antibodies, delivers accurate and interpretable cellular immunophenotyping, and improves the prediction of subject metadata status. At the cellular level, we evaluated cyMAE on cell type annotation and imputation. At the subject level, we evaluated it on three tasks: SARS-CoV-2 infection prediction, prediction of the secondary immune response against COVID-19, and prediction of the infection stage in COVID-19 progression. The introduction of cyMAE marks a significant step forward in immunology research, particularly in large-scale, high-throughput human immune profiling.
This approach offers new possibilities for predicting and interpreting cellular-level and subject-level phenotypes in both health and disease.
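The abstract does not specify cyMAE's architecture, masking ratio, or loss function. The sketch below illustrates only the generic masked-reconstruction objective that a masked autoencoder pre-trains on, under several assumptions.

- The input is a cell × marker expression matrix.
- A fixed fraction of markers is hidden in each cell.
- Mean squared error is computed on the hidden entries only.

The helper names `mask_markers`, `masked_mse`, and `mean_imputer` are hypothetical. `mean_imputer` is a placeholder for a learned decoder and is not cyMAE's model.

```python
import numpy as np


def mask_markers(x, mask_ratio, rng):
    """Hide a random subset of markers in each cell (row) of x.

    Returns the masked input (hidden entries set to 0.0) and a boolean
    mask that is True where a marker was hidden.
    """
    n_cells, n_markers = x.shape
    n_mask = int(round(mask_ratio * n_markers))
    mask = np.zeros_like(x, dtype=bool)
    for i in range(n_cells):
        # Choose which markers to hide in this cell, without repeats.
        hidden = rng.choice(n_markers, size=n_mask, replace=False)
        mask[i, hidden] = True
    x_masked = np.where(mask, 0.0, x)
    return x_masked, mask


def masked_mse(x_true, x_pred, mask):
    """Mean squared error over the masked (hidden) entries only."""
    diff = (x_true - x_pred)[mask]
    return float(np.mean(diff ** 2))


def mean_imputer(x_masked, mask):
    """Hypothetical stand-in for a learned decoder.

    Fills each hidden entry with that marker's mean over the cells in
    which it is visible; visible entries are passed through unchanged.
    """
    visible = ~mask
    counts = visible.sum(axis=0)
    sums = np.where(visible, x_masked, 0.0).sum(axis=0)
    col_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(mask, col_means[None, :], x_masked)


# Usage on synthetic data: 100 cells x 10 markers, 30% of markers hidden per cell.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10))
x_masked, mask = mask_markers(x, 0.3, rng)
loss = masked_mse(x, mean_imputer(x_masked, mask), mask)
```

Scoring only the hidden entries means the model must predict each masked marker from the markers it can still see. This is one way a network can learn co-expression relationships without cell type labels. Under the same assumptions, the masking step also suggests how unmeasured markers could be imputed at inference time.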
Bioinformatics