Algorithms that Defy the Gravity of Learning Curve

Kai M Ting
2017-01-01
Abstract: Conventional wisdom posits that the learning behavior of all data mining algorithms follows a typical learning curve, where more data is expected to produce better performing models. We call this behavior the gravity of learning curve, with which all algorithms are assumed to comply. This project provides, for the first time, theoretical analysis and empirical evidence that nearest neighbor anomaly detectors defy the gravity of learning curve, i.e., these gravity-defiant algorithms can learn a better performing model from a small training set than from a large one. The knowledge we uncovered enables algorithms to be utilized in a new way to meet the challenges of big data without ever-increasing demands for big data infrastructures. This project has spent a significant amount of time perfecting the theory and conducting a rigorous empirical evaluation. As a result, the insight gained is much better than we anticipated. The outcome is a major publication in the Machine Learning journal, published in early 2017. In addition, during this project period, four papers from two previous AOARD-supported projects have been published. These include a major work on mass-based dissimilarity, which was published at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2016. This work has informed one of the investigations in this project.
Descriptors: algorithms, learning, data mining, training, geometry, gaussian distributions, methodology, detectors, pattern recognition, ALGORITHM THEORY
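To make the setting concrete, the following is a minimal sketch of a nearest neighbor anomaly detector trained on a small random subsample. It is an illustration only, not the project's implementation: the function names, the subsample size `psi`, and the toy data are all assumptions introduced here. A point is scored by its Euclidean distance to the nearest point in the subsample, so a larger score suggests a more anomalous point.

```python
import math
import random


def nn_anomaly_score(x, training_subset):
    """Distance from x to its nearest neighbor in training_subset (hypothetical helper)."""
    return min(math.dist(x, p) for p in training_subset)


def subsample_scores(queries, data, psi, seed=0):
    """Score each query against a random subsample of `data` of size `psi`.

    `psi` (the training set size) is an illustrative parameter name.
    """
    rng = random.Random(seed)
    subset = rng.sample(data, psi)  # small training set drawn without replacement
    return [nn_anomaly_score(q, subset) for q in queries]


# Toy data (assumed): a dense grid of normal points in [0, 0.9] x [0, 0.9].
data = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
queries = [(0.45, 0.45), (10.0, 10.0)]  # one normal-looking point, one far outlier

small = subsample_scores(queries, data, psi=8)
large = subsample_scores(queries, data, psi=100)
```

In this toy setup the outlier receives the higher score under both training set sizes. Comparing detection accuracy across values of `psi` on real data is the kind of experiment the abstract refers to; the sketch does not reproduce any result from the project.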