Minimum average partial correlation and parallel analysis: The influence of oblique structures

P.-O. Caron
DOI: https://doi.org/10.1080/03610918.2018.1433843
2018-02-12
Abstract: Parallel analysis (PA; Horn 1965) and the minimum average partial correlation (MAP; Velicer 1976) have been widely promoted as optimal solutions for identifying the correct number of axes in principal component analysis. Previous results showed, however, that they become inefficient when variables belonging to different components correlate strongly. Simulations are used to assess their power to detect the dimensionality of data sets with oblique structures. Overall, MAP performed best: it was more powerful and accurate than PA when the component structure was modestly oblique. However, both stopping rules performed poorly in the presence of highly oblique factors.
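The two stopping rules named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code. `parallel_analysis` compares observed eigenvalues of the correlation matrix with eigenvalues from uncorrelated normal data of the same size. By default it uses the mean random eigenvalue, as in Horn's original proposal; the `percentile` option is an assumed, widely used variant. `velicer_map` partials out the first m principal components and keeps the m that minimizes the average squared partial correlation. The function names, the number of simulations, and the demo design (two orthogonal components, four variables each, loadings of 0.8, n = 500) are all hypothetical choices.

```python
import numpy as np


def parallel_analysis(data, n_sims=100, percentile=None, seed=0):
    """Horn-style parallel analysis on the correlation matrix of `data` (n x p).

    percentile=None compares against the mean random eigenvalue; passing e.g. 95
    uses that percentile instead (an assumed, commonly used variant).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))  # uncorrelated normal data, same n and p
        random_eigs[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    if percentile is None:
        threshold = random_eigs.mean(axis=0)
    else:
        threshold = np.percentile(random_eigs, percentile, axis=0)
    # Retain leading components while the observed eigenvalue beats the random benchmark.
    k = 0
    while k < p and observed[k] > threshold[k]:
        k += 1
    return k


def velicer_map(data):
    """Velicer's MAP: the m that minimizes the average squared partial correlation."""
    R = np.corrcoef(data, rowvar=False)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))  # PCA loadings
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]  # m = 0: zero-order correlations
    # m stops at p - 2: at m = p - 1 the residual is rank one and all partials are +/-1.
    for m in range(1, p - 1):
        a = loadings[:, :m]
        resid = R - a @ a.T  # covariance left after partialling out m components
        d = np.sqrt(np.diag(resid))
        partial = resid / np.outer(d, d)
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(avg_sq))


# Hypothetical demo: two orthogonal components, 4 variables each, loadings of 0.8.
rng = np.random.default_rng(42)
n, loading = 500, 0.8
lam = np.zeros((8, 2))
lam[:4, 0] = loading
lam[4:, 1] = loading
scores = rng.standard_normal((n, 2))
data = scores @ lam.T + rng.standard_normal((n, 8)) * np.sqrt(1 - loading**2)
n_pa = parallel_analysis(data, seed=1)
n_map = velicer_map(data)
print("PA retains", n_pa, "components; MAP retains", n_map)
```

With uncorrelated components and strong loadings, both rules should recover the two components. The paper's question is what happens to these comparisons once the components correlate.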
What problem does this paper attempt to address?
This paper evaluates how well the minimum average partial correlation (MAP) and parallel analysis (PA) identify the correct number of components in principal component analysis (PCA) when the data have an oblique structure, i.e., when the underlying components are correlated. Although both methods are widely regarded as the best stopping rules for PCA, their performance on data with an oblique structure had not been studied thoroughly. Using Monte Carlo simulations, the paper systematically evaluates MAP and PA across different sample sizes, levels of component saturation, and degrees of obliqueness, in order to check how reliable and accurate they are in practice. Specifically, the paper focuses on the following aspects:

1. **The influence of the oblique structure**: how the performance of MAP and PA changes when the components are correlated (oblique) rather than orthogonal.
2. **The influence of the sample size**: how sample size affects each method's ability to identify the correct number of components.
3. **The influence of the component saturation**: how component saturation (i.e., the number of variables per component and the size of their loadings) affects the performance of MAP and PA.
4. **The comparison of methods**: how MAP and PA compare across conditions, especially on data sets with an oblique structure.

Through these analyses, the paper aims to give researchers more accurate guidance for choosing a stopping rule when working with oblique data, so that they neither miss meaningful components nor retain meaningless ones.
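Why obliqueness is hard for eigenvalue-based stopping rules can be illustrated at the population level. The sketch below builds a hypothetical simple-structure correlation matrix R = ΛΦΛ′ with a unit diagonal, where every component loads `loading` on its own block of variables and every pair of components correlates `phi`. It then prints the leading eigenvalues as `phi` grows. The design (two components, four variables each, loadings of 0.8) and the function name are illustrative assumptions, not the paper's simulation conditions.

```python
import numpy as np


def oblique_population_corr(n_components, vars_per_component, loading, phi):
    """Population correlation matrix for a hypothetical simple-structure design.

    Each component loads `loading` on its own block of variables; all pairs of
    components correlate `phi`. Unique variances fill the diagonal up to 1.
    """
    p = n_components * vars_per_component
    lam = np.zeros((p, n_components))
    for j in range(n_components):
        lam[j * vars_per_component:(j + 1) * vars_per_component, j] = loading
    Phi = np.full((n_components, n_components), phi)
    np.fill_diagonal(Phi, 1.0)
    R = lam @ Phi @ lam.T
    np.fill_diagonal(R, 1.0)  # communality + uniqueness = 1 for every variable
    return R


for phi in (0.0, 0.3, 0.5, 0.7, 0.8):
    eig = np.sort(np.linalg.eigvalsh(oblique_population_corr(2, 4, 0.8, phi)))[::-1]
    print(f"phi = {phi:.1f}: leading eigenvalues {np.round(eig[:3], 3)}")
```

For this design the second eigenvalue works out to 1 − a² + q·a²(1 − φ), with a the loading and q the number of variables per component. That gives 2.92 at φ = 0, 1.64 at φ = 0.5, and 0.872 at φ = 0.8, while the first eigenvalue absorbs the shared variance. As the second eigenvalue sinks toward the values produced by pure noise, a comparison like PA's has less and less room to separate the second component from chance. This is one plausible reason both rules struggle with highly oblique factors.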