Principal component analysis (PCA) is a way to reduce the dimensionality of a data set consisting of numeric vectors. Once reduced to three or fewer dimensions, the data set can be visualized. This is analogous to lowering the rank of a matrix: we decompose the matrix into a lower-rank one such that no feature is a linear combination of the other features. The algorithm works as follows: center the data, compute the covariance matrix, find its eigenvectors e_1, e_2, … sorted by decreasing eigenvalue, and obtain the new coordinates of every data point P’ by projecting it onto those eigenvectors.
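As a toy illustration, the whole pipeline (centering, covariance, eigen-decomposition, projection) can be sketched in plain Python for a two-dimensional data set; the data values below are made up:

```python
import math

# Hypothetical 2-D data set: points scattered roughly along the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

n = len(points)
mean_x = sum(x for x, y in points) / n
mean_y = sum(y for x, y in points) / n

# Center the data, then form the 2x2 sample covariance matrix.
centered = [(x - mean_x, y - mean_y) for x, y in points]
cxx = sum(x * x for x, y in centered) / (n - 1)
cyy = sum(y * y for x, y in centered) / (n - 1)
cxy = sum(x * y for x, y in centered) / (n - 1)

# Eigenvalues of [[cxx, cxy], [cxy, cyy]] from the characteristic polynomial.
tr, det = cxx + cyy, cxx * cyy - cxy * cxy
disc = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 + disc, tr / 2 - disc   # lam1 >= lam2

# Unit eigenvector e1 for the larger eigenvalue:
# (cxx - lam1) * vx + cxy * vy = 0 is solved by vx = cxy, vy = lam1 - cxx.
vx, vy = cxy, lam1 - cxx
norm = math.hypot(vx, vy)
e1 = (vx / norm, vy / norm)

# New 1-D coordinate of each point P': the dot product of the centered point with e1.
scores = [x * e1[0] + y * e1[1] for x, y in centered]
```

Because the toy points lie almost on a line, the first eigenvalue carries nearly all of the variance, so a single coordinate per point preserves most of the information.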
So P’ · e_{1} is the new x-coordinate for every point P’, P’ · e_{2} the new y-coordinate, and so on. In the illustration the eigenvectors are normed to length 1 and shown in red, blue and green. In this context the eigenvectors are called “principal components”: e_{1} is the first principal component, e_{2} the second and e_{3} the third. The ordering by the associated eigenvalues matters, because the larger the eigenvalue, the more important the component. In this case it might seem odd to call all three components “principal” – but when PCA is applied to a data set with, say, 1000 original dimensions and accordingly many components, then only the first N components with comparatively large eigenvalues are of principal importance. Let’s look at a simple example in R:

library(caret)
library(plyr)
#### principal component analysis
# Correlation screening: list pairs of features whose absolute correlation
# exceeds 0.3, excluding the columns source, pilot and Risk_involved.
correlation <- abs(cor(subset(trips, select = -c(source, pilot, Risk_involved))))
diag(correlation) <- 0
which(correlation > 0.3, arr.ind = TRUE)

# Keep only trips with at least one event; the remaining columns are assumed numeric.
start <- as.matrix(trips[trips$evt_cnt > 0, ])
# Center each column to zero mean and scale to unit variance.
start <- scale(start, center = TRUE, scale = TRUE)

pca_events <- prcomp(start)
summary(pca_events)                                # proportion of variance per component
write.csv(pca_events$rotation, "pca_events.csv")   # loadings (eigenvectors)
Dimensionality Reduction, Lower Rank Matrix, PCA