Parallel Analysis (sometimes called “Horn’s Parallel Analysis,” after its creator, John L. Horn) is a method for selecting principal components that accounts for variance in the data due to random error or noise. The procedure can be summarized as follows (a code sketch follows the list):
1. Perform PCA on the dataset and determine the eigenvalues for each of the PCs
2. Simulate a dataset of uncorrelated random (noise) data with the same number of variables (p) and observations (n) as the original data
3. Perform PCA on the simulated dataset and determine the simulated eigenvalues
4. Repeat the simulation/PCA process a large number of times (by default, 1,000), calculating eigenvalues for each simulation
5. Calculate the average and 95th percentile of the eigenvalues for each PC across all simulations
6. Compare the actual eigenvalues to the 95th percentile of the simulated eigenvalues
7. Retain (select) the components whose actual eigenvalues are greater than the 95th percentile of the simulated eigenvalues
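Here is a minimal sketch of the steps above in Python, using NumPy and scikit-learn. The function name `parallel_analysis` and its parameters are illustrative, not from any particular package, and the sketch assumes more observations than variables and that the data should be standardized before PCA:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def parallel_analysis(X, n_simulations=1000, percentile=95, random_state=0):
    """Return observed eigenvalues, simulated thresholds, and the number of PCs to retain."""
    rng = np.random.default_rng(random_state)
    n, p = X.shape

    # Step 1: PCA on the standardized data; explained_variance_ holds the eigenvalues
    X_std = StandardScaler().fit_transform(X)
    observed = PCA().fit(X_std).explained_variance_

    # Steps 2-4: repeatedly simulate pure-noise data of the same shape (n x p)
    # and record the eigenvalues from each simulation
    simulated = np.empty((n_simulations, p))
    for i in range(n_simulations):
        noise = rng.standard_normal((n, p))
        simulated[i] = PCA().fit(noise).explained_variance_

    # Step 5: average and 95th-percentile eigenvalue for each component across simulations
    mean_sim = simulated.mean(axis=0)
    threshold = np.percentile(simulated, percentile, axis=0)

    # Steps 6-7: retain components whose observed eigenvalue exceeds the simulated threshold
    n_retain = int(np.sum(observed > threshold))
    return observed, mean_sim, threshold, n_retain
```

Using `explained_variance_` on standardized data keeps the observed eigenvalues on the same scale as those from the standard-normal noise simulations, so the comparison in steps 6–7 is apples to apples.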
The idea here is that, simply due to random error (sampling variability) in the data, PCA will generate some components with eigenvalues greater than 1. In general, the leading eigenvalues produced by pure “noise” data increase as the number of variables increases and decrease as the number of observations increases. By retaining only those PCs whose eigenvalues exceed the 95th percentile of the simulated eigenvalues, you can be reasonably confident that the variance they explain reflects “real” structure in the data rather than noise.
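To see this concretely, here is a small illustrative check (not from the original text) showing that pure-noise data already produces a largest eigenvalue well above 1, and that it grows with more variables and shrinks with more observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Largest eigenvalue of the correlation matrix of pure-noise data:
# always above 1, larger when p grows, smaller when n grows.
for n, p in [(100, 10), (100, 50), (1000, 50)]:
    noise = rng.standard_normal((n, p))
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
    print(f"n={n:4d}, p={p:2d}: largest eigenvalue = {eigenvalues[-1]:.2f}")
```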