Bump hunting by topological data analysis

Max Sommerfeld, Giseon Heo, Peter Kim, Stephen T. Rush, J. S. Marron
DOI: https://doi.org/10.1002/sta4.167
2017-01-01
Stat
Abstract: A topological data analysis approach is taken to the challenging problem of finding and validating the statistical significance of local modes in a data set. As with the SIgnificance of the ZERo (SiZer) approach to this problem, statistical inference is performed in a multi‐scale way, that is, across bandwidths. The key contribution is a two‐parameter approach to the persistent homology representation. For each kernel bandwidth, a sub‐level set filtration of the resulting kernel density estimate is computed. Inference based on the resulting persistence diagram indicates statistical significance of modes. It is seen through a simulated example, and by analysis of the famous Hidalgo stamps data, that the new method has more statistical power for finding bumps than SiZer. Copyright © 2017 John Wiley & Sons, Ltd.
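The two-parameter idea in the abstract (one persistence diagram per kernel bandwidth) can be illustrated with a minimal sketch for one-dimensional data. It is not the authors' implementation. The function names, the Gaussian kernel, the evaluation grid, and the choice to run the sub-level set filtration on the negated density (so that each mode appears as a birth and dies at the saddle where it merges into a taller mode) are all assumptions made here for illustration. No significance test is included; the sketch only computes how far each non-dominant mode rises above its merge saddle.

```python
import numpy as np

def gaussian_kde_on_grid(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on a 1-D grid."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    norm = len(data) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def sublevel_persistence_1d(values):
    """0-dimensional (birth, death) pairs of the sub-level set filtration
    of a function sampled on a 1-D grid, via union-find and the elder rule."""
    n = len(values)
    parent = [-1] * n   # -1 marks grid points not yet in the filtration
    birth = {}          # component root -> filtration value at birth
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    order = [int(i) for i in np.argsort(values, kind="stable")]
    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this merge.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if values[i] > birth[young]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((birth[find(order[0])], np.inf))
    return pairs

def mode_persistence_by_bandwidth(data, bandwidths, grid_size=512):
    """For each bandwidth h, list the persistence (peak height minus merge
    saddle height) of every mode of the KDE except the dominant one."""
    pad = 3.0 * max(bandwidths)
    grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
    result = {}
    for h in bandwidths:
        density = gaussian_kde_on_grid(data, grid, h)
        pairs = sublevel_persistence_1d(-density)  # modes become births
        result[h] = sorted(
            (d - b for b, d in pairs if np.isfinite(d)), reverse=True
        )
    return result
```

Calling `mode_persistence_by_bandwidth(data, [0.3, 1.0, 3.0])` on a one-dimensional NumPy array returns a dictionary keyed by bandwidth. Large entries that survive over a range of bandwidths are candidate bumps, and deciding which of them are statistically significant is the inference step that the paper addresses.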
statistics & probability