Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction

Fatemeh Nassajian Mojarrad, Lorenzo Bini, Thomas Matthes, Stéphane Marchand-Maillet
2024-07-28
Abstract: In flow cytometry (FC) data from hematologic samples such as peripheral blood or bone marrow, cell-level prediction presents profound challenges. This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of tabular cellular data. By representing the data as graphs and encoding hierarchical relationships between classes, we propose a hierarchical plug-in method, FCHC-GNN, that can be applied to several GNN models and is designed to capture the neighborhood information crucial for the single-cell FC domain. Extensive experiments on our cohort of 19 distinct patients demonstrate that incorporating hierarchical biological constraints boosts performance significantly across multiple metrics compared to baseline GNNs without such priors. The proposed approach highlights the importance of structured inductive biases for improving generalization in complex biological prediction tasks.
Machine Learning, Artificial Intelligence, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses multi-class single-cell prediction in flow cytometry (FC) data. Specifically, the authors aim to improve the classification accuracy of different cell types in complex blood samples (such as peripheral blood or bone marrow) by injecting hierarchical biological prior knowledge into graph neural networks (GNNs).

#### Main problems:

1. **Complex cell-type classification**: Flow cytometry produces complex data. Each cell is characterized by multiple markers, which yields high-dimensional, structured data. Traditional machine-learning methods have difficulty capturing the relationships and dependencies in such data.
2. **Underused hierarchical information**: Existing GNN models usually do not fully exploit the hierarchical relationships between cell types (for example, some cell types are subclasses of others) when processing this kind of data. These relationships matter for understanding the functions and biological characteristics of cells.
3. **Improving prediction performance**: The authors expect that introducing hierarchical biological prior knowledge lets a GNN better capture the hierarchical relationships between cell types, and so improves classification performance.

#### Solutions:

- **Injection of hierarchical prior knowledge**: The authors encode the known hierarchical relationships between cell types and functional categories as a tree and apply it as a constraint on the output space of the GNN. The model is then expected not only to classify cells into the correct leaf nodes (specific cell types) but also to respect higher-level classifications (broader cell lineages or functional categories).
- **Custom hierarchical loss function**: To strengthen the hierarchical constraints, the authors design a loss function that combines the standard cross-entropy loss with a hierarchical similarity loss during training. This pushes the model's predictions to be consistent with the biological hierarchy.
- **Experimental verification**: In experiments on bone marrow samples from 19 patients, GNN models with hierarchical biological prior knowledge significantly outperform baseline models without such priors on multiple evaluation metrics.

In conclusion, this paper combines hierarchical biological prior knowledge with graph neural networks to tackle multi-class single-cell prediction in flow cytometry data, improving both the performance and the generalization ability of the classifier.
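The idea of combining a leaf-level cross-entropy with a hierarchy-aware term can be illustrated with a small example. This is only a minimal sketch under stated assumptions, not the authors' loss: the two-level tree in `PARENT`, the function `hierarchical_loss`, and the weight `lam` are all hypothetical, and a simple parent-level cross-entropy stands in for the hierarchical similarity term, whose exact form is not given here.

```python
import math

# Hypothetical two-level hierarchy: leaf cell type -> parent lineage.
# These names are illustrative only, not the paper's label set.
PARENT = {
    "T cell": "Lymphocyte",
    "B cell": "Lymphocyte",
    "Monocyte": "Myeloid",
    "Neutrophil": "Myeloid",
}
LEAVES = list(PARENT)  # fixed leaf order; logits follow this order


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_loss(logits, true_leaf, lam=0.5):
    """Leaf cross-entropy plus a weighted parent-level cross-entropy.

    The probability of a parent is the sum of its leaves' probabilities,
    so an error inside the correct lineage costs less than an error
    that crosses lineages. `lam` (assumed) weights the parent term.
    """
    probs = dict(zip(LEAVES, softmax(logits)))
    leaf_ce = -math.log(probs[true_leaf])

    true_parent = PARENT[true_leaf]
    parent_prob = sum(p for leaf, p in probs.items() if PARENT[leaf] == true_parent)
    parent_ce = -math.log(parent_prob)

    return leaf_ce + lam * parent_ce


# Two wrong predictions for a true "T cell":
# one favors "B cell" (same lineage), the other "Monocyte" (other lineage).
within_lineage = hierarchical_loss([0.0, 2.0, 0.0, 0.0], "T cell")
across_lineage = hierarchical_loss([0.0, 0.0, 2.0, 0.0], "T cell")
print(within_lineage < across_lineage)
```

Both predictions assign the same probability to the true leaf, so plain cross-entropy cannot tell them apart; the parent term is what makes the cross-lineage mistake more expensive. In a GNN setting the same computation would be applied per node to the model's output logits.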