Biology Direct


Open Access Highly Accessed Research

Component retention in principal component analysis with application to cDNA microarray data

Richard Cangelosi1 and Alain Goriely3,2*

Author Affiliations

1 Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA

2 Program in Applied Mathematics, University of Arizona, Tucson, AZ 85721, USA

3 BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA


Biology Direct 2007, 2:2 doi:10.1186/1745-6150-2-2

Published: 17 January 2007

Abstract

Shannon entropy is used to estimate the number of interpretable components in a principal component analysis. In addition, several ad hoc stopping rules for dimension determination are reviewed, and a modification of the broken stick model is presented. The modification incorporates a test for the presence of an "effective degeneracy" among the subspaces spanned by the eigenvectors of the correlation matrix of the data set, and then allocates the total variance among those subspaces. A summary of the performance of the methods, applied to both published microarray data sets and simulated data, is given.
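The abstract does not state formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the eigenvalues of the correlation matrix are already available, takes the exponential of the Shannon entropy of the normalized eigenvalues as one possible effective component count, and applies the classical (unmodified) broken stick rule. It does not implement the effective-degeneracy test or the modification described above, and the function names are hypothetical.

```python
import math


def normalized(eigenvalues):
    """Return each eigenvalue's share of the total variance."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def entropy_dimension(eigenvalues):
    """Exponential of the Shannon entropy of the variance shares.

    Assumption: this is one common way to turn entropy into an
    effective number of components; it may differ from the paper's estimator.
    """
    h = -sum(p * math.log(p) for p in normalized(eigenvalues) if p > 0)
    return math.exp(h)


def broken_stick_count(eigenvalues):
    """Classical broken stick rule (not the modified version).

    Component k is retained while its variance share exceeds the expected
    length of the k-th longest piece of a unit stick broken at random
    into n pieces: (1/n) * sum_{i=k}^{n} 1/i.
    """
    n = len(eigenvalues)
    shares = sorted(normalized(eigenvalues), reverse=True)
    keep = 0
    for k, share in enumerate(shares, start=1):
        expected = sum(1.0 / i for i in range(k, n + 1)) / n
        if share <= expected:
            break
        keep += 1
    return keep
```

For a hypothetical spectrum such as `[3.6, 0.2, 0.1, 0.1]`, the broken stick rule in this sketch retains only the first component, while a flat spectrum gives an entropy-based count equal to the number of variables.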

This article was reviewed by Orly Alter, John Spouge (nominated by Eugene Koonin), David Horn and Roy Varshavsky (both nominated by O. Alter).