Abstract: Parallel analysis (PA; Horn 1965) and the minimum average partial correlation (MAP; Velicer 1976) have been widely promoted as optimal solutions for identifying the correct number of axes in principal component analysis. Previous results showed, however, that they become inefficient when variables belonging to different components are strongly correlated. Here, simulations are used to assess their power to detect the dimensionality of data sets with oblique structures. Overall, MAP performed best: it was more powerful and accurate than PA when the component structure was modestly oblique. However, both stopping rules performed poorly in the presence of highly oblique factors.

What problem does this paper attempt to address?

This paper evaluates how well the minimum average partial correlation (MAP) and parallel analysis (PA) identify the correct number of components in principal component analysis (PCA) when the data set has an oblique structure (i.e., the factors are correlated). Although these two methods are widely regarded as the best solutions for determining the number of components in PCA, their performance on data sets with an oblique structure has not been fully studied. Using Monte Carlo simulations, the paper systematically evaluates MAP and PA across different sample sizes, component saturations, and degrees of obliqueness, in order to verify how reliable and accurate these methods are in practical applications. Specifically, the paper focuses on the following aspects:

1. **The influence of the oblique structure**: how the performance of MAP and PA changes when the factors in the data set are oblique rather than completely orthogonal.
2. **The influence of the sample size**: how different sample sizes affect the ability of the two methods to identify the correct number of components.
3. **The influence of the component saturation**: how the saturation of components (i.e., the number of variables and the loading values in each component) affects the performance of MAP and PA.
4. **The comparison of methods**: how MAP and PA perform relative to each other under different conditions, especially on data sets with an oblique structure.

Through these studies, the paper aims to give researchers more accurate guidance for selecting an appropriate stopping rule when dealing with data sets with an oblique structure, so that they avoid both missing meaningful components and retaining meaningless ones.
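To make the two stopping rules and the simulation factors concrete, the following is a minimal NumPy sketch, not the paper's actual code. It assumes a simple-structure generator in which every variable loads on exactly one component, all loadings are equal, and all pairs of components share the same correlation `phi`. The function names (`simulate_oblique`, `parallel_analysis`, `velicer_map`) and all default parameter values are hypothetical, and the 95th-percentile threshold in PA is one common choice rather than something the summary specifies.

```python
import numpy as np


def simulate_oblique(n, n_components=3, vars_per_component=6,
                     loading=0.7, phi=0.3, seed=0):
    """Draw n observations from an assumed simple-structure model whose
    components all correlate at `phi` (phi=0 gives an orthogonal structure)."""
    rng = np.random.default_rng(seed)
    Phi = np.full((n_components, n_components), float(phi))
    np.fill_diagonal(Phi, 1.0)
    scores = rng.multivariate_normal(np.zeros(n_components), Phi, size=n)
    # Block pattern: each variable loads on exactly one component.
    L = np.kron(np.eye(n_components), np.full((vars_per_component, 1), loading))
    noise = rng.standard_normal((n, L.shape[0])) * np.sqrt(1.0 - loading ** 2)
    return scores @ L.T + noise


def sorted_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]


def parallel_analysis(X, n_iter=200, quantile=0.95, seed=0):
    """Horn's PA: count the leading observed eigenvalues that exceed the
    chosen quantile of eigenvalues from uncorrelated normal data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = sorted_eigenvalues(X)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        random_eigs[i] = sorted_eigenvalues(rng.standard_normal((n, p)))
    exceeds = observed > np.quantile(random_eigs, quantile, axis=0)
    # Stop at the first eigenvalue that fails the comparison.
    return p if exceeds.all() else int(np.argmin(exceeds))


def velicer_map(X):
    """Velicer's MAP: return the number of components m that minimises the
    average squared partial correlation after removing m components."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    loadings = vecs * np.sqrt(np.clip(vals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]  # m = 0: nothing partialled out
    # Stop at p - 2 components so the residual diagonal stays away from zero.
    for m in range(1, p - 1):
        C = R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(avg_sq))


if __name__ == "__main__":
    # Compare both rules as the assumed inter-component correlation grows.
    for phi in (0.0, 0.3, 0.7):
        X = simulate_oblique(n=300, phi=phi, seed=1)
        print(phi, parallel_analysis(X, n_iter=100), velicer_map(X))
```

Repeating the loop at the bottom over many seeds, sample sizes, loading values, and values of `phi`, and recording how often each rule returns the true number of components, would give a rough analogue of the Monte Carlo design described above.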

Minimum average partial correlation and parallel analysis: The influence of oblique structures

Robust Principal Component Analysis Based on Maximum Correntropy Criterion

Considering Horn’s Parallel Analysis from a Random Matrix Theory Point of View

Deterministic parallel analysis: An improved method for selecting factors and principal components

A New Algorithm for Computing Disjoint Orthogonal Components in the Parallel Factor Analysis Model with Simulations and Applications to Real-World Data

Improving the Use of Parallel Analysis by Accounting for Sampling Variability of the Observed Correlation Matrix

Correlated Components Analysis - Extracting Reliable Dimensions in Multivariate Data

Indexed-Points Parallel Coordinates Visualization of Multivariate Correlations

Impacts of aspect ratio on task accuracy in parallel coordinates

Structured Principal Component Analysis Model With Variable Correlation Constraint

Permutation methods for factor analysis and PCA

Hierarchical disjoint principal component analysis

Maximally Correlated Principal Component Analysis

Application of copulas to improve covariance estimation for partial least squares

On Minimum Trace Factor Analysis -- An Old Song Sung to a New Tune

Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments

Robust oblique Target-rotation for small samples

High Dimensional Factor Analysis with Weak Factors

Principal variables analysis for non-Gaussian data

Canonical Principal Angles Correlation Analysis for Two-View Data

When and why are principal component scores a good tool for visualizing high-dimensional data?