EE Seminar: On Principal Component Regression in High Dimension
(The talk will be given in English)
Speaker: Dr. Elad Romanov
Department of Statistics, Stanford
Hall 011, Electrical Engineering-Kitot Building
Sunday, January 19th, 2025
14:00 - 15:00
Principal component regression (PCR) is a classical two-step approach to linear regression, where one first reduces the data dimension by projecting onto its leading principal components, and then performs ordinary least squares regression. We study PCR in an asymptotic high-dimensional regression setting, where the number of data points is proportional to the dimension. Our main deliverables are asymptotically exact limiting formulas for the estimation and prediction risks, which depend in a nuanced way on the eigenvalues of the population covariance, the alignment between the population principal components and the true signal, and the number of selected components.
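The two-step procedure described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. It assumes mean-zero data (so the design is not centered) and takes the leading principal components from the eigendecomposition of the sample covariance X^T X / n. The helper names `pcr_fit` and `prediction_risk`, and the simulated covariance and signal in the demo, are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression with k components (minimal sketch).

    Assumes mean-zero data, so X is not centered. Step 1 projects the rows
    of X onto the k leading eigenvectors of the sample covariance X^T X / n;
    step 2 runs ordinary least squares in that k-dimensional space. The fitted
    coefficients are mapped back to the original p coordinates.
    """
    n = X.shape[0]
    S = X.T @ X / n                            # p x p sample covariance
    _, eigvecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    V_k = eigvecs[:, ::-1][:, :k]              # p x k leading sample PCs
    Z = X @ V_k                                # n x k reduced design (PC scores)
    coef_k, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS on the PC scores
    return V_k @ coef_k                        # coefficients in R^p


def prediction_risk(beta_hat, beta, Sigma):
    """Excess prediction risk E[(x^T beta_hat - x^T beta)^2] for fresh x ~ N(0, Sigma)."""
    d = beta_hat - beta
    return float(d @ Sigma @ d)


if __name__ == "__main__":
    # Hypothetical simulation setup, chosen only for illustration.
    rng = np.random.default_rng(0)
    n, p = 400, 200
    variances = np.linspace(2.0, 0.5, p)       # population eigenvalues
    Sigma = np.diag(variances)
    beta = rng.standard_normal(p) / np.sqrt(p) # true signal
    X = rng.standard_normal((n, p)) * np.sqrt(variances)  # rows ~ N(0, Sigma)
    y = X @ beta + 0.5 * rng.standard_normal(n)
    for k in (10, 50, 100, 200):
        risk = prediction_risk(pcr_fit(X, y, k), beta, Sigma)
        print(f"k = {k:3d}: excess prediction risk = {risk:.3f}")
```

With k = p (and n > p) the projection is just a change of basis, so PCR coincides with ordinary least squares; smaller k trades estimation variance for the bias incurred by discarding components, which is the trade-off the talk's asymptotic formulas describe in the proportional regime.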
A key challenge in the high-dimensional regime is that the sample covariance matrix is an inconsistent estimate of its population counterpart, and thus sample principal components may fail to capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance matrix.
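To make the inconsistency concrete in the spiked case, the sketch below compares, by simulation, the squared overlap between the top sample principal component and a single population spike against the classical limiting formula for the spiked covariance model Sigma = I + theta * v v^T with aspect ratio gamma = p / n (a well-known result of Baik, Ben Arous and Péché, and of Paul). It is an illustration under stated assumptions (Gaussian data, the spike placed along the first coordinate, hypothetical helper names) and is not taken from the talk.

```python
import numpy as np


def spiked_overlap_limit(theta, gamma):
    """Limiting squared cosine <u, v>^2 between the top sample PC u and the spike v.

    Spiked model Sigma = I + theta * v v^T with gamma = p / n fixed as n, p grow.
    At or below the threshold theta <= sqrt(gamma) the top sample PC carries
    asymptotically no information about v.
    """
    if theta <= np.sqrt(gamma):
        return 0.0
    return (1 - gamma / theta**2) / (1 + gamma / theta)


def spiked_overlap_empirical(theta, n, p, rng):
    """One Monte Carlo draw of <u, v>^2, with the spike along the first axis."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1 + theta)              # rows ~ N(0, I + theta e_1 e_1^T)
    S = X.T @ X / n                            # sample covariance
    _, eigvecs = np.linalg.eigh(S)             # ascending eigenvalue order
    u = eigvecs[:, -1]                         # leading sample eigenvector
    return float(u[0] ** 2)                    # squared overlap with e_1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 1000                          # gamma = 0.5, threshold ~ 0.71
    for theta in (0.3, 1.0, 3.0):
        emp = spiked_overlap_empirical(theta, n, p, rng)
        lim = spiked_overlap_limit(theta, p / n)
        print(f"theta = {theta}: empirical {emp:.3f}, limit {lim:.3f}")
```

For gamma = 0.5 the detection threshold is sqrt(0.5), about 0.71. A spike of strength theta = 3 gives a limiting squared overlap of about 0.81 rather than 1, while a spike of strength 0.3 is essentially invisible to the top sample PC; in the latter case, regressing on the leading sample PCs can miss the latent direction entirely.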
The analysis of (random design) linear regression in high dimension typically builds on powerful results from random matrix theory, such as the Marchenko–Pastur law and deterministic equivalents for the resolvent of a sample covariance matrix. However, these standard tools alone are not sufficient for analyzing the prediction risk of PCR. To that end, we leverage and develop somewhat less standard techniques, which, to our knowledge, have not seen wide use in the statistics literature to date: multi-resolvent traces and their associated eigenvector overlap measures.
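For readers less familiar with the standard random-matrix tools mentioned above, the following sketch checks two of them numerically in the simplest setting: white Gaussian data (Sigma = I) with gamma = p / n at most 1. It covers the Marchenko–Pastur support edges (1 ± sqrt(gamma))^2, and the fact that the normalized resolvent trace (1/p) tr (S - zI)^{-1} concentrates around the Marchenko–Pastur Stieltjes transform m(z). The helper names are hypothetical, z is restricted to negative reals to avoid branch-cut bookkeeping, and the sketch does not illustrate the multi-resolvent techniques developed in the talk.

```python
import numpy as np


def mp_edges(gamma):
    """Support edges of the Marchenko-Pastur law (Sigma = I, gamma = p / n <= 1)."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2


def mp_density(x, gamma):
    """Marchenko-Pastur density for unit-variance white data, gamma <= 1."""
    lo, hi = mp_edges(gamma)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > lo) & (x < hi)
    xi = x[inside]
    dens[inside] = np.sqrt((hi - xi) * (xi - lo)) / (2 * np.pi * gamma * xi)
    return dens


def mp_stieltjes_negative(z, gamma):
    """Stieltjes transform m(z) = E[1 / (lambda - z)] of the MP law, for real z < 0.

    m solves gamma*z*m^2 - (1 - gamma - z)*m + 1 = 0; we take the root that
    behaves like -1/z as z -> -infinity.
    """
    if z >= 0:
        raise ValueError("this sketch only handles real z < 0")
    disc = (1 + gamma - z) ** 2 - 4 * gamma
    return (1 - gamma - z - np.sqrt(disc)) / (2 * gamma * z)


def empirical_resolvent_trace(S, z):
    """Normalized resolvent trace (1/p) tr (S - z I)^{-1}, via the eigenvalues of S."""
    lam = np.linalg.eigvalsh(S)
    return float(np.mean(1.0 / (lam - z)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 4000, 1000                          # gamma = 0.25
    X = rng.standard_normal((n, p))
    S = X.T @ X / n
    lam = np.linalg.eigvalsh(S)
    print("empirical edges:", lam.min(), lam.max(), "MP edges:", mp_edges(p / n))
    print("resolvent trace:", empirical_resolvent_trace(S, -1.0),
          "MP m(-1):", mp_stieltjes_negative(-1.0, p / n))
```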
Based on joint work with Alden Green (Stanford).
Short Bio
Elad Romanov is a postdoctoral researcher in the Department of Statistics, Stanford, where he is hosted by Prof. David Donoho. Prior to that, he completed his PhD at the School of Computer Science of the Hebrew University of Jerusalem, where he was advised by Profs. Or Ordentlich and Matan Gavish. His research interests broadly span high-dimensional statistics, information theory and signal processing, and the mathematics of data science.
Attendance at the seminar earns attendance credit for M.Sc. and Ph.D. students, subject to registering your full name and ID number on the attendance form circulated in the hall during the seminar.