Abstract: Automated cell segmentation has become increasingly crucial for disease diagnosis and drug discovery, as manual delineation is excessively laborious and subjective. To address this issue with limited manual annotation, researchers have developed semi-supervised and unsupervised segmentation approaches. Among these approaches, the Deep Gaussian mixture model plays a vital role due to its capacity to model complex data distributions. However, these models assume that the data follows symmetric normal distributions, which does not hold for asymmetrically distributed data. These models also suffer from weak generalization capacity and are sensitive to outliers. To address these issues, this paper presents a novel asymmetric mixture model for unsupervised cell segmentation. This asymmetric mixture model is built by aggregating multivariate Gaussian mixture models and is optimized with log-likelihood and self-supervised objective functions. The proposed asymmetric mixture model outperforms the existing state-of-the-art unsupervised models on cell segmentation, including the Segment Anything Model (SAM), with a gain of roughly 2-30% in Dice coefficient (p<0.05).
What problem does this paper attempt to address?
The main problem this paper addresses is asymmetric data distribution in cell segmentation. Specifically, the traditional Deep Gaussian Mixture Model (DGMM) assumes that data follows a symmetric normal distribution, so it performs poorly on the asymmetrically distributed data found in real pathological images. These models are also sensitive to outliers and generalize weakly.
To solve these problems, the authors propose a new Deep Asymmetric Mixture Model (DAMM) for the unsupervised cell segmentation task. The model estimates the asymmetric distribution with a hierarchical Gaussian mixture model and combines it with a self-supervised optimization function to improve performance.
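As a rough illustration of why aggregating Gaussians can capture asymmetry, the sketch below computes the skewness of a one-dimensional Gaussian mixture from its closed-form moments. It is not taken from the paper: the function name `mixture_skewness` and the example weights, means, and standard deviations are hypothetical and chosen only for illustration.

```python
def mixture_skewness(weights, means, stds):
    """Skewness of a 1-D Gaussian mixture, from closed-form moments.

    Illustration only; names and inputs are assumptions, not the paper's code.
    """
    mean = sum(w * m for w, m in zip(weights, means))
    # Variance: sum_k w_k (s_k^2 + m_k^2) - mean^2
    var = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, stds)) - mean ** 2
    # Third central moment: sum_k w_k [ (m_k - mean)^3 + 3 (m_k - mean) s_k^2 ]
    third = sum(
        w * ((m - mean) ** 3 + 3.0 * (m - mean) * s * s)
        for w, m, s in zip(weights, means, stds)
    )
    return third / var ** 1.5


# A single Gaussian is symmetric (zero skewness) ...
single = mixture_skewness([1.0], [0.0], [1.0])
# ... while two unequally weighted Gaussians give a right-skewed density.
skewed = mixture_skewness([0.7, 0.3], [0.0, 3.0], [1.0, 2.0])
```

A single symmetric Gaussian cannot produce the nonzero skewness seen in the second case, which is the intuition behind building each mixture component from several Gaussian sub-components.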
### Main contributions
1. **Completely unsupervised method**: Solves the challenge of asymmetric data distribution.
2. **Rotation-invariant optimization module**: Improves the model performance through gradient descent.
3. **Comprehensive experiments**: Conducted comparative experiments with the latest unsupervised segmentation methods, including the Segment Anything Model (SAM).
### Formula summary
The key formulas involved in the paper are as follows:
- **Probability density function of the asymmetric mixture model**:
\[
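f(x_i|\Theta) = \sum_{j = 1}^{K}\pi_j\,p(x_i|\theta_j)
\]
where \(\pi_j\) is the mixing weight of the \(j\)-th of \(K\) components in the asymmetric mixture model, and \(p(x_i|\theta_j)\) is the probability density function of the \(j\)-th component with parameters \(\theta_j\). Consistent with the log-likelihood function, each component is itself a weighted sum of \(M\) Gaussian sub-components, \(p(x_i|\theta_j)=\sum_{m = 1}^{M}\alpha_{j,m}\phi(x_i|\mu_{j,m},\Sigma_{j,m})\).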
- **Probability density function of the multivariate Gaussian distribution**:
\[
\phi(x_i|\mu_{j,m},\Sigma_{j,m})=\frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma_{j,m}|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i - \mu_{j,m})^T\Sigma_{j,m}^{-1}(x_i - \mu_{j,m})\right)
\]
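The two density formulas above can be sketched in NumPy as follows. This is a minimal sketch rather than the authors' implementation: the function names and the array layout (`pi` of shape `(K,)`, `alpha` of shape `(K, M)`, `mu` of shape `(K, M, D)`, `sigma` of shape `(K, M, D, D)`) are assumptions, and the nested sum over sub-components follows the structure of the log-likelihood given in this summary.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density phi(x | mu, Sigma) for one D-dimensional point."""
    d = x.shape[0]
    diff = x - mu
    # Mahalanobis term, solving a linear system instead of inverting Sigma.
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * maha) / norm)


def asymmetric_mixture_pdf(x, pi, alpha, mu, sigma):
    """f(x | Theta) = sum_j pi_j * sum_m alpha_{j,m} * phi(x | mu_{j,m}, Sigma_{j,m}).

    The array layout is an assumption made for this sketch.
    """
    n_comp, n_sub = alpha.shape
    total = 0.0
    for j in range(n_comp):
        # Density of component j: a weighted sum of its Gaussian sub-components.
        comp = sum(
            alpha[j, m] * gaussian_pdf(x, mu[j, m], sigma[j, m])
            for m in range(n_sub)
        )
        total += pi[j] * comp
    return total
```

With a single component whose sub-components are identical standard normals, the mixture density reduces to the plain Gaussian density, which is a convenient sanity check.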
- **Log-likelihood function**:
\[
L(\Theta)=\sum_{i = 1}^{N}\log\left[\sum_{j = 1}^{K}\pi_j\sum_{m = 1}^{M}\alpha_{j,m}\phi(x_i|\mu_{j,m},\Sigma_{j,m})\right]
\]
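The log-likelihood above can be evaluated as in the following sketch, which works in log space to avoid underflow when densities are tiny. The function names and array shapes are assumptions for illustration, all weights are assumed strictly positive, and the log-sum-exp stabilization is a standard numerical device rather than something stated in the paper.

```python
import numpy as np


def log_gaussian(X, mu, sigma):
    """Row-wise log phi(x_i | mu, Sigma) for X of shape (N, D)."""
    d = X.shape[1]
    diff = X - mu
    # Mahalanobis distance per row, without forming Sigma^{-1} explicitly.
    maha = np.einsum("nd,nd->n", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def log_likelihood(X, pi, alpha, mu, sigma):
    """L(Theta) = sum_i log sum_j pi_j sum_m alpha_{j,m} phi(x_i | mu_{j,m}, Sigma_{j,m}).

    Assumed shapes: pi (K,), alpha (K, M), mu (K, M, D), sigma (K, M, D, D).
    """
    n = X.shape[0]
    n_comp, n_sub = alpha.shape
    log_terms = np.empty((n, n_comp, n_sub))
    for j in range(n_comp):
        for m in range(n_sub):
            log_terms[:, j, m] = (
                np.log(pi[j])
                + np.log(alpha[j, m])
                + log_gaussian(X, mu[j, m], sigma[j, m])
            )
    # Log-sum-exp over all (j, m) pairs for each data point.
    flat = log_terms.reshape(n, n_comp * n_sub)
    peak = flat.max(axis=1, keepdims=True)
    per_point = peak[:, 0] + np.log(np.exp(flat - peak).sum(axis=1))
    return float(per_point.sum())
```

Flattening the `(j, m)` axes works because the double sum inside the logarithm is just a sum over all weighted sub-components.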
- **Parameter update formulas**:
\[
\mu_{j,m}^{(t + 1)}=\frac{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}y_{i,j,m}^{(t + 1)}I_i'}{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}y_{i,j,m}^{(t + 1)}}
\]
\[
\Sigma_{j,m}^{(t + 1)}=\frac{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}y_{i,j,m}^{(t + 1)}(I_i'-\mu_{j,m}^{(t + 1)})(I_i'-\mu_{j,m}^{(t + 1)})^T}{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}y_{i,j,m}^{(t + 1)}}
\]
\[
\pi_j^{(t + 1)}=\frac{1}{N}\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}
\]
\[
\alpha_{j,m}^{(t + 1)}=\frac{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}y_{i,j,m}^{(t + 1)}}{\sum_{i = 1}^{N}z_{i,j}^{(t + 1)}\sum_{m' = 1}^{M}y_{i,j,m'}^{(t + 1)}}
\]
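The parameter update formulas above can be sketched as one vectorized step. This is a sketch under stated assumptions, not the authors' code: `X` stands in for the inputs \(I_i'\), `z` holds the component weights \(z_{i,j}\), `y` holds the sub-component weights \(y_{i,j,m}\), the function name `m_step` is hypothetical, and the denominator of the \(\alpha_{j,m}\) update is assumed to sum \(y_{i,j,m}\) over \(m\). How `z` and `y` are computed is not covered in this summary, so they are taken as inputs.

```python
import numpy as np


def m_step(X, z, y):
    """One parameter update given weights z (N, K) and y (N, K, M).

    X has shape (N, D) and plays the role of I'_i. Returns
    pi (K,), alpha (K, M), mu (K, M, D), sigma (K, M, D, D).
    Assumes every (j, m) pair receives nonzero total weight.
    """
    n = X.shape[0]
    w = z[:, :, None] * y            # w[i, j, m] = z_{i,j} * y_{i,j,m}
    w_sum = w.sum(axis=0)            # shared denominator, shape (K, M)

    # Weighted means of the inputs.
    mu = np.einsum("nkm,nd->kmd", w, X) / w_sum[:, :, None]

    # Weighted covariances around the updated means.
    diff = X[:, None, None, :] - mu[None]
    sigma = np.einsum("nkm,nkmd,nkme->kmde", w, diff, diff) / w_sum[:, :, None, None]

    # Component weights average z over the data points.
    pi = z.sum(axis=0) / n

    # Sub-component weights, normalized over m (assumed denominator).
    alpha = w_sum / w_sum.sum(axis=1, keepdims=True)
    return pi, alpha, mu, sigma
```

With hard (one-hot) assignments in `z` and a single sub-component per component, the step reduces to per-cluster sample means and covariances, which makes the behavior easy to check by hand.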