Probabilistic-Mismatch Anomaly Detection: Do One’s Medications Match with the Diagnoses

Lingxiao Zhang, Xiang Li, Heifeng Liu, Jing Mei, Gang Hu, Junfeng Zhao, Yanzhen Zou, Bing Xie, Guotong Xie
DOI: https://doi.org/10.1109/icdm.2016.0077
2016-01-01
Abstract: Anomaly detection in healthcare data such as patient records is no trivial task. The anomalies in these datasets are often caused by mismatches between different types of features, e.g., medications that do not match the diagnoses. Existing anomaly detection methods do not perform well when detecting "mismatches" between multiple types of features, especially when the feature space is high-dimensional and sparse. This paper introduces a novel anomaly detection paradigm, Probabilistic-Mismatch Anomaly Detection (PMAD), which detects mismatches between features by modeling a normal instance with a common latent probability distribution that governs the generation of all types of features. Under this paradigm, the goal of anomaly detection is to find instances with dissimilar latent distributions. We further propose Topical PMAD, based on an extended Latent Dirichlet Allocation (LDA) model, which is able to capture the latent relationships between features in a high-dimensional space. Experiments on both synthetic data and real-world patient records show that Topical PMAD can effectively detect anomalies with mismatched features and is highly robust to high-dimensional data as well as to inaccurate model selection. The real-world anomalies detected in a patient record dataset suggest promising practical applications.
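The mismatch idea can be illustrated with a deliberately simplified sketch. This is not the paper's Topical PMAD inference. It assumes that per-feature-type topic distributions (topic-to-diagnosis and topic-to-medication) have already been learned. It estimates one topic mixture from a patient's diagnoses and another from the same patient's medications, with the topics held fixed via a plain EM "fold-in". It then scores the patient by the Jensen-Shannon divergence between the two mixtures. The function names (`fold_in_topic_mixture`, `mismatch_score`), the toy two-word vocabularies, the topic values, and the choice of Jensen-Shannon divergence are all hypothetical illustrations. They show one way to operationalize "dissimilar latent distributions", not the authors' method.

```python
import math
from typing import List, Sequence

# Hypothetical illustration of a probabilistic-mismatch score; NOT the
# paper's extended-LDA inference. Topic-feature matrices are assumed given.

def fold_in_topic_mixture(counts: Sequence[int],
                          phi: Sequence[Sequence[float]],
                          iters: int = 200) -> List[float]:
    """EM estimate of a topic mixture theta with topics phi held fixed.

    counts[w] -- occurrences of feature w in this instance
    phi[k][w] -- P(feature w | topic k), assumed already learned
    """
    K = len(phi)
    theta = [1.0 / K] * K  # start from a uniform mixture
    for _ in range(iters):
        expected = [0.0] * K
        for w, c in enumerate(counts):
            if c == 0:
                continue
            # E-step: how much each topic explains feature w
            joint = [theta[k] * phi[k][w] for k in range(K)]
            z = sum(joint)
            if z == 0.0:
                continue
            for k in range(K):
                expected[k] += c * joint[k] / z
        # M-step: renormalise the expected topic counts
        s = sum(expected)
        theta = [x / s for x in expected] if s > 0 else [1.0 / K] * K
    return theta

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats (0 = identical, max ln 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mismatch_score(diag_counts, med_counts, phi_diag, phi_med) -> float:
    """Higher score = diagnoses and medications imply different topics."""
    theta_diag = fold_in_topic_mixture(diag_counts, phi_diag)
    theta_med = fold_in_topic_mixture(med_counts, phi_med)
    return js_divergence(theta_diag, theta_med)

# Toy, hypothetical topics: rows = topics, columns = vocabulary entries.
phi_diag = [[0.9, 0.1], [0.1, 0.9]]  # columns: [diabetes, hypertension]
phi_med = [[0.8, 0.2], [0.2, 0.8]]   # columns: [metformin, amlodipine]

score_a = mismatch_score([1, 0], [1, 0], phi_diag, phi_med)  # diabetes + metformin
score_b = mismatch_score([1, 0], [0, 1], phi_diag, phi_med)  # diabetes + amlodipine
print(round(score_a, 3), round(score_b, 3))  # 0.0 0.693
```

The consistent patient scores near zero, while the patient whose only medication points to a different latent topic scores close to the maximum of ln 2. Jensen-Shannon divergence is used here only because it is symmetric and bounded. Any divergence between the per-type mixtures could stand in for it in this toy setting.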