Abstract: Model selection aims to determine an appropriate model scale given a small number of samples, and is an important topic in machine learning. As one type of efficient solution, automatic model selection starts from a sufficiently large model scale and has an intrinsic mechanism that pushes redundant structures to become ineffective, so that they are discarded automatically during learning. Priors are usually imposed on parameters to facilitate automatic model selection. Systematic comparisons of automatic model selection approaches with priors are still lacking, and this thesis is motivated by the need for such a study, based on models with local Gaussian structures.

Particularly, we compare the relative strengths and weaknesses of three typical automatic model selection approaches, namely Variational Bayesian (VB), Minimum Message Length (MML) and Bayesian Ying-Yang (BYY) harmony learning, on models with local Gaussian structures. First, we consider the Gaussian Mixture Model (GMM), for which the number of Gaussian components is to be determined. Further assuming that each Gaussian component has a subspace structure, we extend the study to two models, namely Mixture of Factor Analyzers (MFA) and Local Factor Analysis (LFA), for both of which the component number and the local subspace dimensionalities are to be determined.

Two types of priors are imposed on parameters, namely a conjugate form prior and a Jeffreys prior. The conjugate form prior is chosen as a Dirichlet-Normal-Wishart (DNW) prior for GMM, and as a Dirichlet-Normal-Gamma (DNG) prior for both MFA and LFA. The Jeffreys prior and the MML approach are not considered on MFA/LFA due to the difficulty of deriving the corresponding Fisher information matrix.

Through extensive simulations and applications, we compare the automatic model selection algorithms (six for GMM and four for MFA/LFA) and obtain the following main findings:

1. Considering priors on all parameters makes each approach perform better than considering priors merely on the mixing weights.
2. For all three approaches on GMM, the performance with the DNW prior is better than with the Jeffreys prior. Moreover, the Jeffreys prior makes MML slightly better than VB, while the DNW prior makes VB better than MML.
3. As the DNW prior hyper-parameters on GMM are changed from fixed to freely optimized, each under its own learning principle, BYY improves its performance, while the performances of VB and MML deteriorate. This observation remains the same when we compare BYY and VB on either MFA or LFA with the DNG prior. In fact, VB and MML lack a good guide for optimizing prior hyper-parameters.
4. For both GMM and MFA/LFA, BYY considerably outperforms both VB and MML, for any type of prior and regardless of whether the hyper-parameters are optimized. Unlike VB and MML, which rely on appropriate priors, BYY does not depend heavily on the type of prior. It already performs well without priors and improves further when a Jeffreys or a conjugate form prior is imposed.
5. Although MFA and LFA are equivalent under maximum likelihood parameter learning, the choice between them affects the performance of VB and BYY in automatic model selection. Particularly, both BYY and VB perform better on LFA than on MFA, and the superiority of LFA is reliable and robust.

In addition to adopting the existing algorithms either directly or with some modifications, this thesis develops five new algorithms to fill the gaps. Particularly on GMM, the VB algorithm with the Jeffreys prior and the BYY algorithm with the DNW prior are developed; in the latter, a multivariate Student's t-distribution is obtained as the posterior via marginalization. On MFA and LFA, BYY algorithms with DNG priors are developed, where products of multiple Student's t-distributions are obtained in the posteriors via approximate marginalization. Moreover, a VB algorithm on LFA is developed as an alternative to the existing VB algorithm on MFA.
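
The mechanism the abstract calls automatic model selection (start from a deliberately large model and let redundant components become ineffective and drop out during learning) can be illustrated with a small sketch. The code below is not one of the thesis's algorithms. It is a simplified batch EM for a one-dimensional GMM that uses an MML-style penalized update of the mixing weights, in the spirit of the Figueiredo and Jain rule, where a component is discarded once its effective sample count falls below half its number of parameters. The function names, the default of 8 initial components, the variance floor, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_with_pruning(data, k_init=8, n_iter=200, params_per_comp=2, seed=0):
    """Batch EM for a 1-D GMM with an MML-style weight update (illustrative).

    A component is dropped when its effective sample count is no larger
    than params_per_comp / 2 (mean and variance give 2 parameters here).
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Deliberately start with more components than the data should need.
    comps = [{"weight": 1.0 / k_init, "mean": rng.choice(data), "var": grand_var}
             for _ in range(k_init)]

    for _ in range(n_iter):
        # E-step: responsibility of each surviving component for each sample.
        resp = []
        for x in data:
            joint = [c["weight"] * normal_pdf(x, c["mean"], c["var"]) for c in comps]
            total = sum(joint)
            if total == 0.0:  # numerical underflow: fall back to uniform
                resp.append([1.0 / len(comps)] * len(comps))
            else:
                resp.append([j / total for j in joint])

        # Penalized weights: effective count minus half the parameter count,
        # clipped at zero. A zero marks the component as redundant.
        eff = [sum(r[k] for r in resp) for k in range(len(comps))]
        raw = [max(0.0, e - params_per_comp / 2.0) for e in eff]
        total_raw = sum(raw)
        if total_raw == 0.0:  # safeguard: never discard every component
            break

        # M-step for the survivors only; discarded components are removed.
        survivors = []
        for k in range(len(comps)):
            if raw[k] == 0.0:
                continue
            mean = sum(r[k] * x for r, x in zip(resp, data)) / eff[k]
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / eff[k]
            survivors.append({"weight": raw[k] / total_raw,
                              "mean": mean,
                              "var": max(var, 1e-6 * grand_var)})  # variance floor
        comps = survivors
    return comps


# Synthetic two-cluster data (an assumption for this demo only).
rng = random.Random(1)
data = [rng.gauss(-4.0, 1.0) for _ in range(40)] + [rng.gauss(4.0, 1.0) for _ in range(40)]
components = fit_with_pruning(data)
print(len(components), "components kept out of 8")
```

How many components survive depends on the data, the initialization, and the strength of the penalty. With a penalty this mild, some redundant components may well remain, which is consistent with the abstract's point that the choice of prior and learning principle matters for automatic model selection.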


Automatic Model Selection on Local Gaussian Structures with Priors: Comparative Investigations and Applications
