Gang Li, Eva K. Nichols, Valentino E. Browning, Nicolas J. Longhi, Conor Camplisson, Brian J. Beliveau, William Stafford Noble
Abstract: The cell cycle governs the proliferation, differentiation, and regeneration of all eukaryotic cells. Profiling cell cycle dynamics is therefore central to basic and biomedical research spanning development, health, aging, and disease. However, current approaches to cell cycle profiling involve complex interventions that may confound experimental interpretation. To facilitate more efficient cell cycle annotation of microscopy data, we developed CellCycleNet, a machine learning (ML) workflow designed to simplify cell cycle staging with minimal experimenter intervention and cost. CellCycleNet accurately predicts cell cycle phase using only a fluorescent nuclear stain (DAPI) in fixed interphase cells. Using the Fucci2a cell cycle reporter system as ground truth, we collected two benchmarking image datasets and trained two ML models, a support vector machine (SVM) and a deep neural network, to classify nuclei as being in either the G1 or S/G2 phases of the cell cycle. Our results suggest that CellCycleNet outperforms state-of-the-art SVM models on each dataset individually. When trained on two image datasets simultaneously, CellCycleNet achieves the highest classification accuracy, with an improvement in AUROC of 0.08-0.09. The model also demonstrates excellent generalization across different microscopes, achieving an AUROC of 0.95. Overall, using features derived from 3D images, rather than 2D projections of those same images, significantly improves classification performance. We have released our image data, trained models, and software as a community resource.
What problem does this paper attempt to address?
This paper addresses how to predict cell cycle phase from 3D images of single nuclei labeled with a nuclear stain. Specifically, the authors developed a machine learning workflow named CellCycleNet that stages fixed interphase cells using only a fluorescent nuclear stain (DAPI), thereby minimizing experimenter intervention and cost.
### Background
The cell cycle governs the proliferation, differentiation, and regeneration of all eukaryotic cells, so profiling cell cycle dynamics is central to basic and biomedical research on development, health, aging, and disease. However, current cell cycle profiling methods usually require complex interventions that may confound the interpretation of experimental results. For example, metabolic labeling (such as BrdU), genetic engineering (such as knocking in fluorescently tagged proteins), DNA staining for flow cytometry, immunofluorescence detection of proliferation antigens, and live-cell imaging are all effective, but they are complex to perform and costly.
### Solution
To simplify cell cycle staging, the authors proposed CellCycleNet, a machine-learning-based workflow. The work proceeded in the following steps:
1. **Data collection**: Two benchmark image datasets were collected, using the Fucci2a cell cycle reporter system as the ground-truth label.
2. **Model training**: Two machine learning models, a support vector machine (SVM) and a deep neural network (DNN), were trained to classify each nucleus as being in either the G1 or the S/G2 phase.
3. **Performance evaluation**: CellCycleNet was evaluated on each of the two datasets individually and compared against state-of-the-art SVM models.
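
The abstract reports classifier performance as AUROC. As a minimal sketch of what that metric measures for a two-class problem such as G1 versus S/G2, the snippet below computes AUROC directly from its pairwise definition. The function name `auroc`, the label encoding (0 = G1, 1 = S/G2), and the toy scores are illustrative assumptions, not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via its pairwise definition.

    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumed encoding: 0 = G1, 1 = S/G2).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]  # made-up predicted P(S/G2)

print(f"AUROC = {auroc(labels, scores):.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity; for large datasets a rank-based implementation from an established library would be the usual choice.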
### Main findings
- **Advantages of 3D images**: Features derived from 3D images, rather than from 2D projections of those same images, significantly improved classification performance.
- **Generalization across microscopes**: CellCycleNet generalized well across different microscopes, achieving an AUROC of 0.95.
- **Advantages of the deep neural network**: CellCycleNet outperformed state-of-the-art SVM models on each dataset individually. When trained on both image datasets simultaneously, it achieved the highest classification accuracy, with an improvement in AUROC of 0.08-0.09.
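
One way to build intuition for the 3D-versus-2D finding is to see what a projection discards. The sketch below uses a maximum-intensity projection, which is a common choice but only an assumption here, since the summary does not say which projection the authors used. The two toy "nuclei" and the helper names `max_projection` and `integrated_intensity` are hypothetical.

```python
def max_projection(stack):
    """Collapse a z-stack (list of 2D slices) into one 2D image by
    taking the maximum over z at every (y, x) pixel."""
    depth, height, width = len(stack), len(stack[0]), len(stack[0][0])
    return [
        [max(stack[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def integrated_intensity(stack):
    """Sum of all voxel intensities in the 3D stack."""
    return sum(v for plane in stack for row in plane for v in row)


# Hypothetical 2-slice, 2x2-pixel stacks with made-up intensities.
thin = [[[5, 5], [5, 5]], [[0, 0], [0, 0]]]   # signal in one z-slice
thick = [[[5, 5], [5, 5]], [[5, 5], [5, 5]]]  # signal in both z-slices

# The 2D projections are identical...
print(max_projection(thin) == max_projection(thick))
# ...but the 3D integrated intensities differ.
print(integrated_intensity(thin), integrated_intensity(thick))
```

In this toy case the two stacks are indistinguishable after projection even though their total signal differs by a factor of two. Because total nuclear stain signal is commonly used as a proxy for DNA content, this is one plausible reason 3D-derived features could carry more cell cycle information, though the summary itself does not state the mechanism.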
### Conclusion
CellCycleNet provides an efficient, low-cost method that accurately predicts cell cycle phase from DAPI-stained images alone, minimizing experimenter intervention and cost. The authors have also released their image data, trained models, and software as a community resource.
### Future prospects
CellCycleNet currently supports only two cell cycle labels, G1 and S/G2, but improvements in Fucci technology may enable finer-grained cell cycle classification in the future. In addition, the benefit of 3D image information for machine learning model performance offers a useful reference point for future bioimage analysis research.