Abstract: Spectroscopy rapidly captures large amounts of data that are not directly interpretable. Principal component analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning PCA is not well understood by many of the applied analytical scientists and spectroscopists who use it, and the meaning of the features that PCA identifies is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind PCA, illustrating each step with simulated spectra. PCA relies solely on the information within the spectra; consequently, the mathematical model depends on the nature of the data itself. These direct links between model and spectra allow concrete spectroscopic explanations of PCA, such as the scores representing “concentrations” or “weights”. The principal components (loadings) are, by definition, hidden, repeated, and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between the extreme differences within the dataset. Each principal component (PC) is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a PCA model shows how to interpret the application-specific chemical meaning of the PCA loadings and how to analyze the scores. A critical benefit of PCA is the simplicity and succinctness of its description of a dataset, which makes it powerful and flexible.
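
The ideas summarized above (scores as per-spectrum weights, loadings as spectral shapes that combine linearly, and each additional PC refining the reconstruction) can be illustrated with a minimal sketch. It assumes NumPy, simulates spectra as noisy linear mixtures of two hypothetical Gaussian bands, and computes PCA through the singular value decomposition of the mean-centred data. This is one common way to compute PCA, not necessarily the route the manuscript takes, and all function names, band positions, concentrations, and noise levels here are illustrative assumptions. Note that the loadings are orthonormal shapes and need not match the pure-band spectra.

```python
import numpy as np

def gaussian_band(x, centre, width):
    """Unit-height Gaussian peak used as a hypothetical pure-component spectrum."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def simulate_spectra(n_samples=50, n_points=200, noise=0.01, seed=0):
    """Simulate spectra as linear mixtures of two hypothetical bands plus noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 100.0, n_points)
    pure = np.vstack([gaussian_band(x, 35.0, 5.0), gaussian_band(x, 65.0, 8.0)])
    conc = rng.uniform(0.0, 1.0, size=(n_samples, 2))  # hypothetical concentrations
    return conc @ pure + noise * rng.standard_normal((n_samples, n_points))

def pca(spectra):
    """PCA via SVD of the mean-centred data; returns mean, scores, loadings, variance fractions."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                      # one row of scores (weights) per spectrum
    loadings = vt                       # one row per PC, same length as a spectrum
    explained = s**2 / np.sum(s**2)     # fraction of total variance per PC
    return mean, scores, loadings, explained

def reconstruct(mean, scores, loadings, k):
    """Rebuild spectra from the first k PCs: mean + scores[:, :k] @ loadings[:k]."""
    return mean + scores[:, :k] @ loadings[:k]

spectra = simulate_spectra()
mean, scores, loadings, explained = pca(spectra)

# Each extra PC should lower (never raise) the residual between rebuilt and original spectra.
residuals = []
for k in range(4):
    rss = float(np.sum((spectra - reconstruct(mean, scores, loadings, k)) ** 2))
    residuals.append(rss)
    print(f"{k} PCs: residual sum of squares = {rss:.4f}")
```

With two underlying bands, the residual should drop sharply over the first two PCs and then level off near the noise floor, mirroring the idea of each PC as a successive refinement of the estimated spectra.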

Exploration of Principal Component Analysis: Deriving Principal Component Analysis Visually Using Spectra