Novelty and Similarity: Detection Using Data‐Driven Soft Independent Modeling of Class Analogy

O. Y. Rodionova, N. I. Kurysheva, G. A. Sharova, A. L. Pomerantsev
DOI: https://doi.org/10.1002/cem.3587
IF: 2.5
2024-07-20
Journal of Chemometrics
Abstract: Novelty and similarity are complex concepts that have numerous applications in various fields, including biology and medicine. Novelty detection is a technique used to determine whether a dataset is different from another dataset considered as a standard. Similarity detection is a technique used to determine whether two datasets belong to the same population. Novelty and similarity are closely related concepts; however, they are not complementary. Novelty is by far the more popular of the two, and there are many publications about it. Similarity is, in fact, a new concept that has not yet been explored in depth. Classical statistics offers a large number of tools suitable for the detection of similarity, mostly in the univariate case. At the same time, this topic has been insufficiently studied in the field of machine learning. This paper suggests several principles that are important for this research and also presents a method for the detection of both novelty and similarity. The method uses a one‐class classifier known as Data‐Driven Soft Independent Modeling of Class Analogy (DD‐SIMCA). Three examples illustrate our approach. The first uses simulated data and demonstrates the performance of DD‐SIMCA for the detection of novelty. The second uses real‐world data and studies the similarity of two groups of patients who participate in the evaluation of the effectiveness of the treatment of primary angle‐closure glaucoma. The third comes from medical diagnostics and uses a real‐world, publicly available dataset employed for the comparison of various classification algorithms.
chemistry, analytical; instruments & instrumentation; mathematics, interdisciplinary applications; automation & control systems; computer science, artificial intelligence; statistics & probability