SPSS Syntax for Combining Results of Principal Component Analysis of Multiply Imputed Data Sets using Generalized Procrustes Analysis

Bart van Wingerde, Joost van Ginkel
DOI: https://doi.org/10.1177/0146621621990757
IF: 1.522
2021-02-04
Applied Psychological Measurement
Abstract: Multiple imputation (Rubin, 1987) is a well-known method for handling missing data. Applying the procedure to an incomplete data set yields several plausible completed versions of that data set, each of which is then analyzed with the same statistical analysis. To obtain one overall analysis for interpretation, the results from these completed data sets are combined using specific combination procedures. For principal component analysis (PCA), Van Ginkel and Kroonenberg (2014) proposed generalized Procrustes analysis (GPA; Gower, 1975; Ten Berge, 1977) to combine the results. To date, GPA seems to have been little used for combining PCA results from multiply imputed data sets, as suggested by the relatively few citations of Van Ginkel and Kroonenberg (2014) in applied research papers. One reason could be that only a few software packages implement GPA, among them the "shapes" package (Dryden & Mardia, 2016) in R (R Core Team, 2018) and the stand-alone program 3WayPack (Kroonenberg & De Roo, 2010). In addition, these packages may not be well known to applied researchers, and it may not be obvious to them that the packages can also be used for combining the results of PCA in multiply imputed data. For these researchers, the authors developed a user-friendly SPSS subroutine that is specifically aimed at combining the results of PCA, as described by Van Ginkel and Kroonenberg (2014), and that can be applied entirely within SPSS. To run the subroutine, one must first carry out a PCA on each of the imputed data sets in SPSS and save the results to a data file. The subroutine can then combine the saved results using GPA. Within the subroutine, a number of required arguments and some optional arguments are specified. Among the most important optional arguments are the display of the Varimax rotated centroid solution in the output, and the display of loading plots of both the unrotated and Varimax rotated centroid solutions, along with their convex hulls (Van Ginkel & Kroonenberg, 2014).
psychology, mathematical; social sciences, mathematical methods
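
The workflow described in the abstract (one PCA per imputed data set, followed by GPA on the saved loadings) can be sketched outside SPSS as follows. This is a minimal illustration in Python with NumPy, not the authors' SPSS subroutine. The function names `pca_loadings`, `procrustes_rotation`, and `gpa_centroid` are hypothetical, and the sketch assumes a rotation-only variant of GPA (orthogonal rotations and reflections, no translation or scaling) that starts from the first loading matrix as the initial target. The published procedure may differ in these details.

```python
import numpy as np


def pca_loadings(data, n_components):
    """Unrotated PCA loadings from the correlation matrix of one completed data set."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Loadings = eigenvectors scaled by the square root of their eigenvalues.
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def procrustes_rotation(source, target):
    """Orthogonal matrix T that minimizes ||source @ T - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def gpa_centroid(loadings, max_iter=100, tol=1e-10):
    """Rotation-only GPA (an assumption of this sketch).

    Alternates between rotating every loading matrix toward the current
    centroid and recomputing the centroid as the mean of the rotated matrices.
    Returns the centroid and the list of rotated loading matrices.
    """
    centroid = loadings[0].copy()  # assumed starting target
    rotated = [m.copy() for m in loadings]
    prev_loss = np.inf
    for _ in range(max_iter):
        rotated = [m @ procrustes_rotation(m, centroid) for m in loadings]
        centroid = np.mean(rotated, axis=0)
        loss = sum(np.sum((r - centroid) ** 2) for r in rotated)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return centroid, rotated
```

In use, one would call `pca_loadings` once per imputed data set and pass the resulting list of loading matrices to `gpa_centroid`. The returned centroid plays the role of the combined solution, and the spread of the rotated matrices around it is the kind of information the convex hulls in the loading plots summarize.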