Stochastic Tverberg theorems and their applications in multi-class logistic regression, data separability, and centerpoints of data

Jesús A. De Loera, Thomas A. Hogan
DOI: https://doi.org/10.48550/arXiv.1907.09698
2019-07-23
Abstract: We present new stochastic geometry theorems that give bounds on the probability that $m$ random data classes all contain a point in common in their convex hulls. We apply these stochastic separation theorems to obtain bounds on the probability of existence of maximum likelihood estimators in multinomial logistic regression. We also discuss connections to condition numbers for analysis of steepest descent algorithms in logistic regression and to the computation of centerpoints of data clouds.
Probability, Optimization and Control, Statistics Theory
What problem does this paper attempt to address?
The paper addresses three related problems:

1. **Existence of the Maximum Likelihood Estimator (MLE) in Multiclass Logistic Regression**
   - The paper studies probabilistic conditions under which the MLE exists in multiclass logistic regression. Using stochastic geometry theorems (the stochastic Tverberg theorems of the title), the authors give a lower bound on the probability that the MLE exists. These bounds indicate how likely the MLE is to exist for a given number of data points and dimension.
2. **Probabilistic Analysis of Data Separability**
   - The authors study the probability that the convex hulls of several data classes share a common point. This question is closely tied to data separability, particularly in multiclass classification, where knowing whether the data is separable informs model selection and design.
3. **Calculation of Centerpoints and Their Applications**
   - The paper also discusses how stochastic geometric methods can be used to compute centerpoints of data clouds. A centerpoint plays a role in high-dimensional data similar to that of the median in one-dimensional data. Centerpoints are useful in many applications but hard to compute. The paper proposes some approximation algorithms and analyzes their performance.

### Specific Problem Analysis

#### 1. Existence of MLE in Multiclass Logistic Regression

- **Background**: In binary classification, the MLE is known to exist if and only if the convex hulls of the two classes intersect. In the multiclass setting this condition becomes more complex.
- **Contribution**: Using the stochastic Tverberg theorems, the paper gives sufficient conditions for the MLE to exist in multiclass logistic regression. For example, Theorem 1.1 states that if the number of data points \( f(m) \) satisfies \( f(m)\gg(1 + \epsilon)m\log_2(m)\ln(\ln(m)) \), then the probability that the MLE exists for the data of each pair of labels is close to 1.

#### 2. Probabilistic Analysis of Data Separability

- **Background**: Data separability refers to whether there exists a hyperplane that completely separates data of different classes. In multiclass problems the question is more complex.
- **Contribution**: Using the stochastic Tverberg theorems, the paper gives a lower bound on the probability that the convex hulls of the classes share a common point, so that the classes cannot be separated. For example, Theorem 2.6 states that if the data distribution \( D \) is balanced about a point \( p \), then the probability that the data set \( E_{m,n,D} \) is a Tverberg partition is at least \( \left(1 - 2^{-\lfloor n/2d\rfloor}\sum_{i = 1}^{t}\binom{\lfloor n/2d\rfloor}{i}\right)^m \).

#### 3. Calculation of Centerpoints and Their Applications

- **Background**: Centerpoints describe the central position of high-dimensional data. They are useful in practical applications but hard to compute.
- **Contribution**: The paper proposes several approximation algorithms for computing centerpoints and analyzes their performance. For example, Table 1.2 summarizes the time complexity and performance of the different algorithms; the random assignment algorithm performs well under a balanced distribution.

### Conclusion

By introducing stochastic geometric theorems, especially the stochastic Tverberg theorems, the paper addresses the existence of the MLE in multiclass logistic regression, the probabilistic analysis of data separability, and the calculation of centerpoints.
These results are not only of great theoretical significance but also provide valuable tools and methods in practical applications.
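
The central event in the abstract, that $m$ random data classes all contain a common point in their convex hulls, can be made concrete with a small simulation. The sketch below is only an illustration and is not the paper's method: it assumes dimension 1, where a convex hull is simply the interval from the smallest to the largest point, and it assumes each class consists of `n` points drawn uniformly from [-1, 1]. The function names `hulls_share_point_1d` and `estimate_common_point_probability` are hypothetical.

```python
import random


def hulls_share_point_1d(classes):
    """Return True if the convex hulls (intervals) of all classes share a point.

    In one dimension the convex hull of a finite set is [min, max], so the
    intervals have a common point exactly when the largest left endpoint
    does not exceed the smallest right endpoint.
    """
    largest_min = max(min(points) for points in classes)
    smallest_max = min(max(points) for points in classes)
    return largest_min <= smallest_max


def estimate_common_point_probability(m, n, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that m classes share a point.

    Assumption (illustrative only): each class holds n points drawn
    independently and uniformly from [-1, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        classes = [
            [rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)
        ]
        if hulls_share_point_1d(classes):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Estimate the probability for three classes and growing class sizes.
    for n in (1, 2, 5, 20):
        print(n, estimate_common_point_probability(m=3, n=n))
```

Under these assumptions the estimate should rise toward 1 as the class size `n` grows, which is the qualitative behavior the probability bounds in the summary describe. In higher dimensions the interval test no longer applies, and checking for a common point would need a different method such as a linear program.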