Vouw: Geometric Pattern Mining using the MDL Principle

Micky Faas, Matthijs van Leeuwen
DOI: https://doi.org/10.48550/arXiv.1911.09587
2019-11-23
Abstract: We introduce geometric pattern mining, the problem of finding recurring local structure in discrete, geometric matrices. It differs from existing pattern mining problems by identifying complex spatial relations between elements, resulting in arbitrarily shaped patterns. After we formalise this new type of pattern mining, we propose an approach to selecting a set of patterns using the Minimum Description Length principle. We demonstrate the potential of our approach by introducing Vouw, a heuristic algorithm for mining exact geometric patterns. We show that Vouw delivers high-quality results with a synthetic benchmark.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is **finding recurring local structure in discrete geometric matrices**, that is, **the geometric pattern mining problem**. Specifically:

1. **Limitations of existing pattern mining problems**: Traditional pattern mining focuses mainly on pattern types such as itemsets, subgraphs and sequences. Relatively little research has been done on raster-based data (as found in satellite imagery and texture recognition), and in particular on mining complex spatial relations that form non-rectangular shapes.
2. **Introduction of a new pattern type**: The paper proposes **geometric patterns of arbitrary shape**. These patterns are geometrically connected, that is, any element can be reached from any other by traversing only elements within the pattern. The elements can take multiple possible values, with the Boolean case included.
3. **Model selection problem**: Because many geometric patterns can be found in a typical matrix, a compact set of patterns that describes the structure of the data well has to be selected. This is formalised as a model selection problem, where a model is defined by a set of patterns. The paper uses the **Minimum Description Length (MDL) principle** to solve it, identifying regularities in the data by compressing the data.
4. **Algorithm implementation**: To this end, the authors propose a heuristic algorithm named **Vouw**, which delivers high-quality results on a synthetic benchmark and accurately recovers implanted patterns.

In summary, the paper aims to fill a gap in existing pattern mining methods for raster-based data by introducing the geometric pattern mining problem together with an approach to solving it, and by using the MDL principle to select an effective set of patterns.
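To make the notions of a connected, arbitrarily shaped pattern and of an exact occurrence more concrete, here is a minimal Python sketch. It is not the Vouw algorithm and involves no MDL-based selection. The representation of a pattern as a mapping from `(row_offset, col_offset)` to a value, the use of 8-neighbour adjacency for connectivity, and the names `is_connected` and `find_occurrences` are all assumptions made for illustration.

```python
from collections import deque

# Assumed representation (not from the paper): a pattern maps
# (row_offset, col_offset) -> required value, relative to an anchor cell.


def is_connected(pattern):
    """Check that every pattern cell is reachable from any other cell
    by stepping only through pattern cells (8-neighbour adjacency is
    an assumption of this sketch)."""
    cells = set(pattern)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen == cells


def find_occurrences(matrix, pattern):
    """Return all anchor positions (row, col) at which every cell of
    the pattern matches the matrix exactly."""
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    hits = []
    for r in range(rows):
        for c in range(cols):
            if all(
                0 <= r + dr < rows
                and 0 <= c + dc < cols
                and matrix[r + dr][c + dc] == value
                for (dr, dc), value in pattern.items()
            ):
                hits.append((r, c))
    return hits


# A small Boolean matrix containing two copies of an L-shaped pattern.
matrix = [
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
l_shape = {(0, 0): 1, (1, 0): 1, (1, 1): 1}

print(is_connected(l_shape))               # True
print(find_occurrences(matrix, l_shape))   # [(0, 0), (0, 3)]
```

In this toy matrix the L-shaped pattern occurs twice. A pattern that recurs in this way is the kind of candidate a compression-based criterion could favour, since describing it once and then listing its positions may take less space than listing every cell separately.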