Abstract: In Bayesian network structure learning, the quality of the directed graph learned by constraint-based approaches can be strongly affected by the order in which variable pairs are chosen and the order in which conditioning sets are selected for conditional independence testing. Inspired by the strong connection between the mutual information shared by two variables and their conditional independence, we introduce the M-ordering concept, in which a matrix is precomputed from the observational data with variables ordered increasingly by their mutual information with the target variable of interest. Given the M-ordering matrix, we propose the Weakest Mutual-Information-First (WMIF) strategy, which is integrated into the PC algorithm in two ways: an MI-based edge removal strategy and an MI-based conditioning set generation strategy. The edge removal strategy always selects the variable pair with the weakest mutual information for conditional independence testing; the conditioning set generation strategy constructs conditioning sets so that variables with weaker mutual information with the target variable are considered first. We prove that the weakest-MI-based edge removal strategy is sound and that our PC-MI algorithm, a PC variant equipped with the WMIF strategy, is order-independent. Moreover, in PC algorithms the number of conditional independence tests grows exponentially with the number of random variables; we show that the WMIF strategy can effectively reduce this complexity (bounded by $o\left(|V|\left(2^{|adj(X)|} - \frac{|V|^2}{2}\right)\right)$). We have conducted experiments on both low-dimensional and high-dimensional data sets, and the results indicate that PC-MI outperforms the state-of-the-art approaches. More importantly, the order-agnostic property of PC-MI can be extremely useful when it is hard to prescribe a meaningful variable ordering, as some other PC algorithms require.
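
As a rough illustration of the ordering idea described in the abstract, the following Python sketch builds, for each target variable, a list of the remaining variables sorted by increasing mutual information, and also ranks variable pairs weakest-first. It is only one plausible reading: the abstract does not specify the MI estimator, the tie-breaking rule, or the exact layout of the M-ordering matrix, so the plug-in estimator for discrete data, the name-based tie-breaking, and the function names `mutual_information`, `m_ordering`, and `weakest_pair_first` are all assumptions made for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in (empirical) mutual information, in nats, of two discrete sequences.

    Assumption: the paper's actual estimator is not given in the abstract.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def m_ordering(data):
    """For each target variable, list the other variables by increasing MI with it.

    `data` maps a variable name to a list of discrete observations.
    Ties are broken by variable name (an assumption), so the result does not
    depend on the order in which variables are supplied.
    """
    names = sorted(data)
    ordering = {}
    for target in names:
        others = [v for v in names if v != target]
        ordering[target] = sorted(
            others,
            key=lambda v: (mutual_information(data[target], data[v]), v),
        )
    return ordering


def weakest_pair_first(data):
    """Unordered variable pairs sorted by increasing MI (weakest pair first)."""
    names = sorted(data)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sorted(
        pairs,
        key=lambda p: (mutual_information(data[p[0]], data[p[1]]), p),
    )
```

In a PC-style loop, a ranking like `weakest_pair_first` would decide which pair is tested for conditional independence next, and the per-target lists from `m_ordering` would decide which neighbours enter a conditioning set first. The conditional independence test itself is not part of this sketch.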

Selective AnDE Based on Attributes Ranking by Maximin Conditional Mutual Information (MMCMI)

Feature Selection with Conditional Mutual Information Considering Feature Interaction

Novel algorithm for attribute reduction based on mutual-information gain ratio

Automatic Coefficient Selection in Weighted Maximum Margin Criterion

Selective AnDE for Large Data Learning: a Low-Bias Memory Constrained Approach

Sample-Based Attribute Selective A$n$DE for Large Data

Feature selection for multi-label classification by maximizing full-dimensional conditional mutual information

Feature Selection for Monotonic Classification

Attribute Importance Measurement Method Based on Data Coordination Degree

Feature Selection with Attributes Clustering by Maximal Information Coefficient

On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification

Feature Selection for Monotonic Classification Via Maximizing Monotonic Dependency

Class dependent feature scaling method via restrictive Bayesian network classifier combination

A Double Layer Bayesian Classifier Using Conditional Mutual Information

Alleviating the Attribute Conditional Independence and I.I.D. Assumptions of Averaged One-Dependence Estimator by Double Weighting

Class-specific feature selection via maximal dynamic correlation change and minimal redundancy

Learning Bayesian Network Structures Using Weakest Mutual-Information-first Strategy

Feature Selection Algorithm Based On Conditional Dynamic Mutual Information

Improving Associative Classification by Incorporating Novel Interestingness Measures

Exploring 2-rank strategic weight manipulation in multiple attribute decision making and its applications in project review and university ranking

MiniAnDE: a reduced AnDE ensemble to deal with microarray data