231053 Statistical inference in factor analysis for high-dimensional, low-sample size data

Monday, November 8, 2010 : 1:03 PM - 1:14 PM

Miguel Marino, Department of Biostatistics, Harvard School of Public Health, Boston, MA
Yi Li, Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, MA
Cancer researchers are keen to track trends in cancer mortality rates and to study the cross-relationships among these trends, both for the scientific purpose of understanding cancers as a complex dynamical system and for practical purposes such as prevention, planning and resource allocation. Factor analysis, which studies such cross-correlation matrices, is an effective means of data reduction, but its inference typically requires the number of random variables, p, to be relatively small and fixed and the sample size, n, to approach infinity. However, contemporary surveillance techniques have yielded matrices that are large in both dimensions, which limits the use of existing factor analysis techniques because the covariance/correlation matrix is poorly estimated. We develop methods, in the framework of random matrix theory, to study the cross-correlation of annual changes in cancer mortality rates in the setting where p > n. We propose a methodology to test complete independence across cancer sites. We develop an approach based on group sequential theory to determine the number of significant factors in a factor model. Sparse principal components analysis is then applied to the principal components deemed significantly different from the random matrix theory prediction, to aid in the interpretation of the underlying factors. The methods are applied to SEER cancer mortality rates from 1969 through 2005.
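
As a rough illustration of the random matrix theory comparison described above, the following minimal Python sketch contrasts the eigenvalues of a sample correlation matrix with the upper edge of the Marchenko-Pastur bulk, (1 + sqrt(p/n))^2, when p > n. It is not the authors' procedure: the function names, the simulated one-factor data, and the dimensions n = 36 and p = 100 are all assumptions made for illustration, and a naive edge comparison stands in for the group sequential approach mentioned in the abstract.

```python
import numpy as np


def mp_upper_edge(n, p):
    """Upper edge of the Marchenko-Pastur bulk for unit-variance noise with ratio p/n."""
    return (1.0 + np.sqrt(p / n)) ** 2


def correlation_eigenvalues(X):
    """Eigenvalues, in descending order, of the p x p sample correlation matrix
    of an n x p data array (rows are observations, columns are variables)."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)[::-1]


def count_candidate_factors(X):
    """Count eigenvalues above the Marchenko-Pastur upper edge.

    This is a naive screen only: it ignores finite-sample fluctuations at the
    edge, so it should not be read as a formal test.
    """
    n, p = X.shape
    eig = correlation_eigenvalues(X)
    return int(np.sum(eig > mp_upper_edge(n, p)))


# Simulated data (hypothetical, not SEER rates): one common factor plus noise,
# with more variables than observations.
rng = np.random.default_rng(0)
n, p = 36, 100
factor = rng.standard_normal(n)
loadings = np.ones(p)  # assumed equal loadings on the single factor
X = np.outer(factor, loadings) + rng.standard_normal((n, p))

print("MP upper edge:", mp_upper_edge(n, p))
print("Largest eigenvalue:", correlation_eigenvalues(X)[0])
print("Eigenvalues above edge:", count_candidate_factors(X))
```

In this simulated setting the planted factor produces a leading eigenvalue well above the bulk edge, while eigenvalues attributable to noise stay near or below it. Eigenvalues close to the edge would need a formal test rather than a direct comparison.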

Learning Areas:
Biostatistics, economics

Learning Objectives:
Demonstrate methods, in the framework of random matrix theory, that have been developed to study the cross-correlation of cancer mortality annual rate changes in the setting where the number of variables is greater than the sample size.

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a fourth-year PhD candidate in Biostatistics working on statistical issues in cancer data, and I have worked closely with my advisor, Yi Li, on this topic.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.