Tai-Yu Ma, Joseph Y. J. Chow, Jia Xu
Abstract: This work develops a new methodology to identify the empirical-data-driven causal structure of a knowledge domain. We propose a two-stage algorithm that first draws relevant prior partial relationships between variables and then uses them as structure constraints in a structure learning task for Bayesian networks (BNs). The latter is based on a model averaging approach to obtain a statistically sound BN. The empirical study focuses on modeling commuters' travel mode choice. We present experimental results on the design of prior restrictions, the effect of resampling size and learning algorithms, and the effect of random draws on the fitted BN structure. The results show that the proposed method can capture more sophisticated relationships between the variables that are missing in both decision tree models and random utility models.
What problem does this paper attempt to address?
This paper addresses how to use structure constraints and model averaging to identify causal structures in travel mode choice. Specifically, the authors propose a new methodology that uses Bayesian Networks (BNs) to identify the causal structure of a knowledge domain from empirical data. The method first extracts partial relationships between relevant variables from domain knowledge and uses them as structure constraints in the structure learning task for the Bayesian Network. A statistically reliable Bayesian Network is then obtained through model averaging. The study focuses on modeling commuters' travel mode choice.
### Main Contributions
1. **Propose the use of Bayesian Networks as an alternative to decision-tree methods**:
- Compared with traditional decision-tree methods, the Bayesian-network-based method can estimate the causal structure between variables more flexibly and reveal causal effects that decision-tree models cannot discover.
- Demonstrates, using empirical data, the value of this method in revealing causal relationships between variable attributes.
2. **Propose a new hybrid structure-learning algorithm**:
- Combines prior expert domain knowledge with model-averaging techniques to jointly estimate the structure and parameters of Bayesian Networks.
- Uses a two-stage process: expert knowledge is first used to select relevant variables and to specify required (causal) or prohibited relationships as structure constraints; a data-driven structure-learning algorithm then searches for good structures in the restricted topological space.
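The two-stage idea above can be illustrated with a minimal sketch. It assumes that a structure-learning algorithm has already been run on several resampled data sets and that each run returned a set of directed arcs; the function names, the variable names (`distance`, `car_ownership`, `mode`), and the 0.5 threshold are illustrative assumptions, not values taken from the paper. Arcs are averaged by counting how often each one appears, required arcs (whitelist) are always kept, and prohibited arcs (blacklist) are always dropped.

```python
from collections import Counter


def average_structures(learned_arc_sets, whitelist=frozenset(),
                       blacklist=frozenset(), threshold=0.5):
    """Average several learned structures into one arc set.

    learned_arc_sets: iterable of sets of (parent, child) arcs, one per resample.
    Returns (averaged_arcs, strength), where strength maps each observed arc
    to the fraction of resamples in which it appeared.
    """
    arc_sets = [set(arcs) for arcs in learned_arc_sets]
    n = len(arc_sets)
    counts = Counter(arc for arcs in arc_sets for arc in arcs)
    strength = {arc: count / n for arc, count in counts.items()}
    # Keep arcs that are frequent enough and not prohibited by prior knowledge.
    averaged = {arc for arc, s in strength.items()
                if s >= threshold and arc not in blacklist}
    # Arcs required by prior knowledge are always included.
    averaged |= set(whitelist)
    return averaged, strength


def is_acyclic(arcs):
    """Return True if the arc set forms a DAG (Kahn's algorithm)."""
    nodes = {node for arc in arcs for node in arc}
    indegree = {node: 0 for node in nodes}
    for _, child in arcs:
        indegree[child] += 1
    queue = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for parent, child in arcs:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    return visited == len(nodes)


# Hypothetical arc sets, as if learned from four resamples of the data.
structures = [
    {("distance", "mode"), ("car_ownership", "mode")},
    {("distance", "mode"), ("distance", "car_ownership")},
    {("distance", "mode"), ("car_ownership", "mode")},
    {("car_ownership", "mode")},
]
averaged, strength = average_structures(
    structures,
    whitelist={("distance", "mode")},   # assumed required relationship
    blacklist={("mode", "distance")},   # assumed prohibited relationship
)
print(sorted(averaged), is_acyclic(averaged))
```

Thresholding arc frequencies does not by itself guarantee an acyclic result, which is why the sketch includes a separate acyclicity check.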
### Research Background
- **Travel Mode Choice Behavior**: Over the past few decades, researchers in psychology, geography, and transportation science have proposed factors that explain travel mode choice behavior and its complex causal relationships.
- **Limitations of Existing Models**: Traditional Logit/Probit models are based on random utility theory, but have limitations when modeling individual discrete-choice decisions. In addition, the complexity of human decision-making means that heterogeneity in travel mode choice may be unobservable.
- **Application of Bayesian Networks**: Bayesian Networks have been successfully applied in fields such as medical diagnosis, decision-support systems, semantic search, and bioinformatics, and can identify causal relationships between related events.
### Methodology
- **Bayesian Network Structure Learning**: A two-stage process first uses expert knowledge to determine partial relationships between variables as structure constraints, and then uses model averaging to obtain a statistically reliable Bayesian Network.
- **Parameter Learning**: Given the structure, the conditional probability distributions are estimated by maximizing the likelihood function.
- **Goodness-of-Fit Evaluation**: Indicators such as the Bayesian Information Criterion (BIC) are used to evaluate the goodness of fit of the model.
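The parameter-learning and goodness-of-fit steps above can be sketched for discrete variables as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data, the variable names (`car`, `mode`), and the helper functions are hypothetical, and BIC is written here in the "higher is better" form, log-likelihood minus (k/2)·log(n), where k is the number of free parameters and n the number of records.

```python
import math
from collections import Counter


def fit_cpts(data, parents):
    """Maximum-likelihood conditional probability tables.

    data: list of dicts mapping variable name -> observed value.
    parents: dict mapping each node -> tuple of its parent names.
    Returns cpts[node][(parent_values, value)] = relative frequency.
    """
    cpts = {}
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        cpts[node] = {(cfg, val): count / marginal[cfg]
                      for (cfg, val), count in joint.items()}
    return cpts


def log_likelihood(data, parents, cpts):
    """Log-likelihood of the data under the fitted network."""
    return sum(
        math.log(cpts[node][(tuple(row[p] for p in pa), row[node])])
        for row in data
        for node, pa in parents.items()
    )


def n_free_params(data, parents):
    """Free parameters: (levels - 1) per parent configuration, per node."""
    levels = {v: len({row[v] for row in data}) for v in parents}
    return sum(
        (levels[node] - 1) * math.prod(levels[p] for p in pa)
        for node, pa in parents.items()
    )


def bic(data, parents):
    """BIC in the 'higher is better' convention (an assumption of this sketch)."""
    cpts = fit_cpts(data, parents)
    penalty = 0.5 * n_free_params(data, parents) * math.log(len(data))
    return log_likelihood(data, parents, cpts) - penalty


# Hypothetical toy data: car availability and chosen travel mode.
data = [
    {"car": "yes", "mode": "car"},
    {"car": "yes", "mode": "car"},
    {"car": "yes", "mode": "bus"},
    {"car": "no", "mode": "bus"},
]
with_arc = {"car": (), "mode": ("car",)}   # car -> mode
without_arc = {"car": (), "mode": ()}      # no arc

cpts = fit_cpts(data, with_arc)
print(cpts["mode"][(("yes",), "car")])     # estimate of P(mode=car | car=yes)
print(bic(data, with_arc), bic(data, without_arc))
```

Comparing the two scores shows how BIC trades the likelihood gain from an extra arc against the additional parameters it introduces.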
### Empirical Research
- **Data Source**: Travel-mode-choice data of cross-border workers in Luxembourg.
- **Prior Knowledge**: Relationships between variables are extracted from the literature and expert knowledge for use as constraints in structure learning.
- **Experimental Results**: The experiments test the design of prior restrictions, the effect of resampling size and learning algorithm, and the effect of random draws on the fitted structure. The results show that the proposed method can capture more sophisticated relationships between variables that are missing in both decision-tree models and random-utility models.
### Conclusions
- The proposed method can identify causal relationships between variables in travel-mode-choice modeling, providing a more flexible tool than traditional decision-tree and random-utility methods.
- Future research can further expand and optimize this method for application in other fields and larger data sets.