Abstract: Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is important for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), and retail. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. An advanced approach then uses the Benjamini-Hochberg procedure to control the false discovery rate. This approach reduces the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.
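The Benjamini-Hochberg procedure named in the abstract can be illustrated with a minimal Python sketch. This is the generic textbook step-up procedure, not the paper's own implementation; the function name `benjamini_hochberg`, the `alpha` default, and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Generic Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected
    (i.e., the candidate pattern would be declared significant).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff, even if its
    # own p-value missed its individual threshold (the "step-up" property).
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four candidate co-location patterns.
example = benjamini_hochberg([0.02, 0.03, 0.035, 0.20], alpha=0.05)
print(example)  # [True, True, True, False]
```

In this example the first two p-values miss their own thresholds (0.0125 and 0.025), but the third meets its threshold (0.0375), so all three are rejected together. In practice a vetted library routine is usually preferable to a hand-written one.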

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the problem of finding statistically significant spatial co-location patterns given Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types. Specifically, it focuses on detecting subsets of feature types, or of their parent nodes in the taxonomy, that exhibit statistically significant spatial interaction.

### Background and Challenges

1. **Limitations of Existing Methods**:
   - Existing co-location pattern detection methods mainly identify co-located features based on spatial proximity and co-occurrence frequency, and usually consider only a single spatial scale.
   - They ignore the hierarchical relationships between features, so they may miss higher-level co-location patterns.
   - They lack a rigorous statistical foundation, which may increase the false discovery rate (FDR).
2. **Importance of the Problem**:
   - In ecology, taxonomy-aware co-location detection can be used to discover new symbiotic relationships across the food chain.
   - In spatial pathology, it can support cancer immunotherapy research.
   - In retail, it can be used to analyze co-location relationships among different categories of stores.

### Solution

1. **Baseline Method**:
   - Iteratively check the significance of co-location patterns between leaf nodes or their ancestor nodes in the taxonomy.
   - Use metrics such as the participation index to evaluate the strength of a co-location, and set thresholds to determine whether the pattern is significant.
2. **Improved Method**:
   - Introduce the Benjamini-Hochberg procedure to control the false discovery rate (FDR), thereby reducing the risk of false discoveries while maintaining the ability to detect true co-location patterns.
   - Apply the procedure at each level of the taxonomy, and treat the co-location pattern between a parent node and the target node as significant only if at least some of the parent's child nodes are significantly co-located with the target node.

### Experiments and Results

- **Experimental Design**: Experiments on synthetic and real-world datasets compare the performance of the baseline method and the improved method.
- **Evaluation Metrics**: The main metrics are the Type-I error rate (false positive rate) and the Type-II error rate (false negative rate).
- **Experimental Results**: The improved method (FDR-SSTCM) significantly reduces the Type-II error rate while maintaining a low Type-I error rate, outperforming the baseline.

### Conclusion

The paper proposes a statistically significant, taxonomy-aware co-location pattern mining method (SSTCM) and further improves it by introducing the Benjamini-Hochberg procedure, reducing the risk of false discoveries and improving the ability to detect true co-location patterns. Experimental results indicate that the method is effective and reliable across application scenarios.
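The participation index mentioned under the baseline method can be sketched in Python as follows. This is a simplified illustration restricted to pair patterns {A, B} with a Euclidean distance neighborhood, not the paper's implementation; the function name `participation_index`, the `radius` parameter, the feature names, and the coordinates are all hypothetical. The roll-up of child instances into a parent type is likewise an assumption about how taxonomy nodes might be handled.

```python
from math import dist


def participation_index(points_a, points_b, radius):
    """Participation index of the pair pattern {A, B} (illustrative sketch).

    points_a, points_b: lists of (x, y) tuples, one list per feature type.
    radius: neighborhood distance threshold (hypothetical parameter).
    """
    if not points_a or not points_b:
        return 0.0

    def ratio(src, dst):
        # Participation ratio: fraction of src instances that have at
        # least one dst instance within the neighborhood radius.
        hits = sum(1 for p in src if any(dist(p, q) <= radius for q in dst))
        return hits / len(src)

    # The participation index is the minimum participation ratio
    # over the feature types in the pattern.
    return min(ratio(points_a, points_b), ratio(points_b, points_a))


# Hypothetical taxonomy roll-up: a parent type's instances are assumed
# to be the union of its children's instances.
oak = [(0.0, 0.0), (10.0, 10.0)]
pine = [(1.0, 1.0)]
trees = oak + pine  # parent node covering both leaf types
fungi = [(0.0, 1.0), (50.0, 50.0)]

pi_trees_fungi = participation_index(trees, fungi, radius=1.5)
print(pi_trees_fungi)  # 0.5
```

Here two of the three tree instances have a fungus neighbor (ratio 2/3) and one of the two fungus instances has a tree neighbor (ratio 1/2), so the index is 0.5. The brute-force neighbor search is quadratic; a spatial index would be the usual choice for larger datasets.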

Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection
