Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar
2024-07-04
Abstract: Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem arises in taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), and retail. The problem is computationally challenging because the taxonomy generates an exponential number of candidate co-location patterns. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and they do not always assess the statistical significance of the detected patterns, which can lead to false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. An advanced approach uses the Benjamini-Hochberg procedure to control the false discovery rate, reducing the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.
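To make the point about the exponential candidate space concrete, the sketch below enumerates candidate patterns over a small taxonomy. The taxonomy (`PARENT`) and its feature names are hypothetical. The sketch also assumes that a pattern may mix nodes from any level but should never contain a node together with one of its own ancestors, because such a pattern is redundant. This pruning rule is an illustrative assumption and not necessarily the paper's exact candidate definition.

```python
from itertools import combinations

# Hypothetical toy taxonomy: child -> parent (the root's parent is None).
PARENT = {
    "Animal": None,
    "Predator": "Animal", "Prey": "Animal",
    "Wolf": "Predator", "Fox": "Predator",
    "Deer": "Prey", "Rabbit": "Prey",
}


def ancestors(node):
    """Return the set of proper ancestors of `node` in PARENT."""
    result = set()
    parent = PARENT[node]
    while parent is not None:
        result.add(parent)
        parent = PARENT[parent]
    return result


def candidate_patterns(k, exclude_root=True):
    """Yield size-k tuples of taxonomy nodes in which no node is an ancestor
    of another (e.g. ("Predator", "Wolf") is skipped as redundant)."""
    nodes = [n for n in PARENT if not (exclude_root and PARENT[n] is None)]
    for combo in combinations(sorted(nodes), k):
        if all(a not in ancestors(b) for a in combo for b in combo if a != b):
            yield combo


for k in (2, 3):
    print(k, len(list(candidate_patterns(k))))
```

With four leaf features there are only 6 leaf-only pairs, but admitting the two internal nodes raises the count to 11 candidate pairs. Deeper or wider taxonomies make the candidate set grow much faster.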
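The advanced approach relies on the Benjamini-Hochberg (BH) step-up procedure. The minimal sketch below implements standard BH and assumes that every candidate pattern already has a p-value from some significance test; how those p-values are computed is not described here. With m p-values sorted in ascending order, find the largest rank k such that p_(k) <= (k/m)·α and reject the k smallest.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up procedure.

    Returns a list of booleans, one per input p-value, where True means the
    null hypothesis (no co-location) is rejected at FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its BH threshold (rank / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank

    # Step-up: reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
print(benjamini_hochberg([0.04, 0.02, 0.021]))
```

In the first call, five p-values fall below 0.05 but BH rejects only the two smallest. The second call shows the step-up behavior: the smallest p-value (0.02) exceeds its own threshold 0.05/3, yet it is still rejected because the largest one (0.04) is under 0.05.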
Information Retrieval, Applications
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses the problem of finding statistically significant spatial co-location patterns given a set of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types. Specifically, it asks how to detect subsets of feature types, or of their parent nodes in the taxonomy, whose spatial interaction is significant once the hierarchical relationships between feature types are taken into account.

### Background and Challenges

1. **Limitations of Existing Methods**:
   - Existing co-location pattern detection methods mainly identify co-located features from spatial proximity and co-occurrence frequency, and they usually consider only a single spatial scale.
   - They ignore the hierarchical relationships between features and may therefore miss co-location patterns that only emerge at higher levels of the taxonomy.
   - They lack a rigorous statistical foundation, which can inflate the false discovery rate (FDR).
2. **Importance of the Problem**:
   - In ecology, it can reveal new symbiotic relationships across the food chain.
   - In spatial pathology, it can support research on cancer immunotherapy.
   - In retail, it can be used to analyze co-location relationships among different categories of stores.

### Solution

1. **Baseline Method**:
   - Iteratively check the significance of co-location patterns between leaf nodes or their ancestor nodes in the taxonomy.
   - Use metrics such as the participation index to measure the strength of co-location, and apply thresholds to decide whether a pattern is significant.
2. **Improved Method**:
   - Apply the Benjamini-Hochberg procedure to control the false discovery rate (FDR), reducing the risk of false discoveries while preserving the power to detect true co-location patterns.
   - Apply the procedure at each level of the taxonomy, and treat the co-location between a parent node and the target node as significant only if at least one of its child nodes has a significant co-location with the target node.

### Experiments and Results

- **Experimental Design**: Experiments on synthetic and real-world datasets compare the baseline method with the improved method.
- **Evaluation Metrics**: The main metrics are the Type-I error rate (false positive rate) and the Type-II error rate (false negative rate).
- **Experimental Results**: The improved method (FDR-SSTCM) substantially reduces the Type-II error rate while keeping the Type-I error rate low.

### Conclusion

The paper proposes a statistically significant, taxonomy-aware co-location pattern mining method (SSTCM) and extends it with the Benjamini-Hochberg procedure, which reduces the risk of false discoveries while improving the ability to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.
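The baseline method is summarized as using the participation index (PI) to measure co-location strength. The sketch below follows the conventional definition for a size-2 pattern {A, B}. It assumes Euclidean distance and a neighborhood threshold `d`, and the feature names and coordinates are hypothetical. The participation ratio of A is the fraction of A's instances that have at least one B instance within `d`, and PI is the minimum ratio over the features in the pattern. For a taxonomy node, the sketch assumes its instance set is the union of its descendants' instances.

```python
import math


def participation_ratio(src, other, d):
    """Fraction of `src` instances with at least one `other` instance within distance d."""
    hits = sum(1 for p in src if any(math.dist(p, q) <= d for q in other))
    return hits / len(src)


def participation_index(a, b, d):
    """PI of the size-2 co-location {A, B}: the minimum of the two participation ratios."""
    return min(participation_ratio(a, b, d), participation_ratio(b, a, d))


# Hypothetical instance locations (x, y) for three leaf features.
wolves = [(0.0, 0.0), (10.0, 10.0)]
foxes = [(30.0, 30.0)]
deer = [(1.0, 0.0), (0.0, 1.0), (20.0, 20.0)]

print(participation_index(wolves, deer, 1.5))
# Assumed parent-node instances: the union of its children's instances.
predators = wolves + foxes
print(participation_index(predators, deer, 1.5))
```

This brute-force version checks every pair of instances. It is fine for illustration, but real miners typically use a spatial index or join to find neighbor pairs. In this toy data the parent node's PI (1/3) is lower than the leaf's PI (0.5), so generalizing a feature does not always strengthen a pattern.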
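The improved method applies Benjamini-Hochberg level by level and tests a parent only if some child is significant. The sketch below is one possible reading of that rule, not the authors' implementation. It processes levels from deepest to shallowest, runs BH within each level, and treats an internal node as eligible only if at least one child was already found significant. The p-values, the target feature, and the names `taxonomy_bh` and `depth` are hypothetical.

```python
from collections import defaultdict


def benjamini_hochberg(p_values, alpha):
    """Standard BH step-up procedure; returns one reject flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def depth(node, parent):
    """Number of edges from `node` up to the taxonomy root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def taxonomy_bh(p_value, parent, alpha=0.05):
    """Level-wise BH over the taxonomy, deepest level first.

    p_value: node -> p-value of the co-location (node, target)  [hypothetical input]
    parent:  node -> parent node (None for the root)
    An internal node is tested only if at least one child is already significant.
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    # Group the tested nodes by their depth in the taxonomy.
    levels = defaultdict(list)
    for node in p_value:
        levels[depth(node, parent)].append(node)

    significant = set()
    for level in sorted(levels, reverse=True):
        eligible = [n for n in levels[level]
                    if not children[n] or any(c in significant for c in children[n])]
        flags = benjamini_hochberg([p_value[n] for n in eligible], alpha)
        significant.update(n for n, keep in zip(eligible, flags) if keep)
    return significant


PARENT = {"Animal": None, "Predator": "Animal", "Prey": "Animal",
          "Wolf": "Predator", "Fox": "Predator", "Deer": "Prey", "Rabbit": "Prey"}
P = {"Wolf": 0.001, "Fox": 0.30, "Deer": 0.20, "Rabbit": 0.40,
     "Predator": 0.01, "Prey": 0.02}
print(sorted(taxonomy_bh(P, PARENT)))
```

Here only Wolf survives BH at the leaf level, so Predator becomes eligible and passes. Prey is never tested, even though its p-value of 0.02 would pass an unadjusted 0.05 threshold, because neither Deer nor Rabbit is significant.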
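The evaluation compares the methods by Type-I and Type-II error rates. The sketch below uses the conventional definitions over a set of tested candidates whose ground truth is known, as with synthetic data containing planted patterns. The pattern labels are hypothetical, and the paper's exact normalization may differ. The Type-I rate is the fraction of truly non-co-located candidates that were reported. The Type-II rate is the fraction of truly co-located candidates that were missed.

```python
def error_rates(candidates, true_patterns, detected):
    """Return (Type-I rate, Type-II rate) over a set of tested candidate patterns."""
    true_nulls = candidates - true_patterns   # candidates with no real interaction
    true_alts = candidates & true_patterns    # genuinely co-located candidates
    false_pos = detected & true_nulls         # reported but not real
    false_neg = true_alts - detected          # real but not reported
    type1 = len(false_pos) / len(true_nulls) if true_nulls else 0.0
    type2 = len(false_neg) / len(true_alts) if true_alts else 0.0
    return type1, type2


# Hypothetical tested candidates, planted ground truth, and a method's output.
candidates = {"A-B", "A-C", "B-C", "A-D", "C-D"}
truth = {"A-B", "A-C"}
detected = {"A-B", "B-C"}
print(error_rates(candidates, truth, detected))
```

A method that reports more patterns tends to lower the Type-II rate at the cost of a higher Type-I rate. FDR control with BH is meant to strike a better balance than testing each candidate at an unadjusted level.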