Estimating the Number of Clusters Via Proportional Chinese Restaurant Process

Yingying Wen, Hangjin Jiang, Jianwei Yin
DOI: https://doi.org/10.1145/3426826.3426840
2020-01-01
Abstract: Dirichlet Process Mixture (DPM) models tend to produce a few major clusters along with many small clusters. These small, confusing clusters overlap heavily with the major clusters. As the sample size increases while the sample distribution stays the same, more and more unnecessary small clusters are introduced into the clustering results. Recently, the powered Chinese Restaurant Process (pCRP) was proposed to eliminate these spurious small clusters. However, it violates the usual and indispensable exchangeability assumption of DPM. In this paper, we propose a new method called the proportional Chinese Restaurant Process (pro-CRP) that preserves exchangeability while reducing the number of unnecessary small clusters. We present experimental results comparing pro-CRP with the CRP and pCRP models and demonstrate that pro-CRP reduces the number of clusters.
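The growth in cluster count described in the abstract can be illustrated with a minimal sketch of the standard CRP seating rule: each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter. The function name `sample_crp`, the parameter names, and the `power` exponent are assumptions made for illustration. Setting `power` above 1 only mimics the general idea of a powered seating rule and is not taken from the pCRP paper. The pro-CRP rule is not specified in the abstract, so it is not sketched here.

```python
import random


def sample_crp(n, alpha=1.0, power=1.0, seed=0):
    """Seat n customers one at a time and return the list of table sizes.

    alpha: concentration parameter (weight of opening a new table).
    power: hypothetical exponent applied to table sizes; 1.0 gives the
           standard CRP. Values above 1 are an illustrative assumption.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] is the number of customers at table k
    for _ in range(n):
        # Existing tables are weighted by size**power; the last weight
        # (alpha) corresponds to opening a new table.
        weights = [s ** power for s in sizes] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new table
        else:
            sizes[k] += 1     # join an existing table
    return sizes


if __name__ == "__main__":
    # Compare the number of tables (clusters) as the sample size grows.
    for n in (100, 1000, 5000):
        standard = sample_crp(n, alpha=1.0, power=1.0)
        powered = sample_crp(n, alpha=1.0, power=1.2)
        print(n, len(standard), len(powered))
```

Under the standard rule the expected number of tables grows roughly like alpha times log n, which is the behaviour the abstract identifies as producing unnecessary small clusters in large samples.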