New directions in algebraic statistics: Three challenges from 2023

Yulia Alexandr, Miles Bakenhus, Mark Curiel, Sameer K. Deshpande, Elizabeth Gross, Yuqi Gu, Max Hill, Joseph Johnson, Bryson Kagy, Vishesh Karwa, Jiayi Li, Hanbaek Lyu, Sonja Petrović, Jose Israel Rodriguez
2024-02-22
Abstract: In the last quarter of a century, algebraic statistics has established itself as an expanding field which uses multilinear algebra, commutative algebra, computational algebra, geometry, and combinatorics to tackle problems in mathematical statistics. These developments have found applications in a growing number of areas, including biology, neuroscience, economics, and social sciences. Naturally, new connections continue to be made with other areas of mathematics and statistics. This paper outlines three such connections: to statistical models used in educational testing, to a classification problem for a family of nonparametric regression models, and to phase transition phenomena under uniform sampling of contingency tables. We illustrate the motivating problems, each of which is for algebraic statistics a new direction, and demonstrate an enhancement of related methodologies.
Statistics Theory
### What problem does this paper attempt to solve?

This paper explores three new research directions in algebraic statistics:

1. **Cognitive Diagnostic Models in Educational and Psychological Measurement**
   - **Problem Background**: This section studies a family of discrete statistical models called BLESS (Binary Latent Clique Star Forest) models, which are used for cognitive diagnosis of latent skills in educational and psychological measurement.
   - **Research Content**: The study investigates the likelihood geometry of these models, that is, the geometry underlying maximum likelihood estimation (MLE). The aim is to better understand these models and to lay the foundation for further research into deep generative models.
2. **Equivalence Identification of Nonparametric Regression Trees**
   - **Problem Background**: This section focuses on identifying regression trees that are equivalent, or nearly equivalent, in how they fit the data, with the goal of improving the Bayesian Additive Regression Trees (BART) model.
   - **Research Content**: Using methods from algebraic statistics and combinatorics, the study aims to enhance the performance of BART in nonparametric regression. Specifically, it seeks to accelerate Markov chain Monte Carlo (MCMC) sampling by identifying equivalent or nearly equivalent decision trees.
3. **Phase Transition Phenomena in Contingency Tables**
   - **Problem Background**: The study investigates the phase transition between the uniform distribution and the hypergeometric distribution in ternary contingency tables. When the margins meet certain criteria, the hypergeometric distribution approximates the uniform distribution well; otherwise, the approximation is poor.
   - **Research Content**: Using the language and methods of algebraic statistics, this phenomenon is extended to multidimensional contingency tables. Specifically, the study examines the phase transition in three-dimensional contingency tables.
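To make the comparison in the third direction concrete, the following is a minimal sketch, not taken from the paper, that contrasts the uniform and hypergeometric distributions on 2x2 contingency tables with fixed margins. The 2x2 setting and the function names (`tables_2x2`, `uniform_pmf`, `hypergeometric_pmf`, `total_variation`) are assumptions chosen for illustration; the paper itself concerns larger tables and asymptotic regimes.

```python
from fractions import Fraction
from math import comb


def tables_2x2(row_sums, col_sums):
    """List all nonnegative integer 2x2 tables with the given margins."""
    (r1, r2), (c1, c2) = row_sums, col_sums
    assert r1 + r2 == c1 + c2, "margins must have the same total"
    # The top-left entry a determines the whole table once margins are fixed.
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return [((a, r1 - a), (c1 - a, r2 - c1 + a)) for a in range(lo, hi + 1)]


def uniform_pmf(row_sums, col_sums):
    """Uniform distribution: every table with these margins is equally likely."""
    tables = tables_2x2(row_sums, col_sums)
    return {t: Fraction(1, len(tables)) for t in tables}


def hypergeometric_pmf(row_sums, col_sums):
    """Hypergeometric distribution: P(a) = C(r1, a) * C(r2, c1 - a) / C(n, c1)."""
    (r1, r2), (c1, _) = row_sums, col_sums
    n = r1 + r2
    return {
        t: Fraction(comb(r1, t[0][0]) * comb(r2, t[1][0]), comb(n, c1))
        for t in tables_2x2(row_sums, col_sums)
    }


def total_variation(p, q):
    """Total variation distance between two distributions on the same tables."""
    return sum(abs(p[t] - q[t]) for t in p) / 2


if __name__ == "__main__":
    margins = ((2, 2), (2, 2))  # toy margins, chosen only for illustration
    u = uniform_pmf(*margins)
    h = hypergeometric_pmf(*margins)
    print(total_variation(u, h))  # 1/3
```

Exact rational arithmetic via `Fraction` avoids rounding, so the distance can be compared across different margins without floating-point noise. Varying `margins` gives a small-scale feel for how the quality of the approximation depends on the margins, although a toy enumeration like this says nothing about the asymptotic phase transition itself.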
Together, these three directions show how algebraic statistics intersects with other fields: each problem is reinterpreted in algebraic terms, and the resulting connections are used to address the challenges it poses.