High-dimensional statistics
Kueh, Nickl, Samworth, Shah
Recent technological advances have dramatically increased the volume of data that scientists collect. One prototypical problem arising from high-dimensional data is variable selection, faced, for example, by practitioners conducting a microarray experiment who wish to flag important proteins for further, potentially costly, investigation.
We have
- improved and extended the approach of Stability Selection (Meinshausen and Bühlmann, 2010 JRSS-B read paper), significantly increasing its accuracy and giving it wide applicability.
- substantially broadened the applicability of Sure Independence Screening (Fan and Lv, 2008), a technique popular for its computational speed.
- proposed locally adaptive wavelet estimators for data on compact homogeneous manifolds, such as d-dimensional unit spheres, useful in statistical analysis of data sets in astrophysics, such as ultra-high energy cosmic rays.
- determined the optimal choice of k for the ubiquitous k-nearest neighbour classifier.
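To make the screening idea concrete, here is a minimal numpy sketch of the original Sure Independence Screening procedure of Fan and Lv (2008), before the generalisation described above: rank features by absolute marginal correlation with the response and keep the top d. The function name, the choice d = 10 and the toy data are illustrative assumptions, not part of the work above.

```python
import numpy as np

def sis(X, y, d):
    """Sure Independence Screening (Fan and Lv, 2008): minimal sketch.

    Ranks features by the absolute marginal correlation of each column
    of X with the response y, and keeps the d highest-ranked features.
    """
    # Standardise so the marginal correlations are comparable.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    # Absolute marginal correlation of each feature with the response.
    corr = np.abs(Xc.T @ yc) / len(y)
    # Indices of the d features with the largest |correlation|.
    return np.argsort(corr)[::-1][:d]

# Toy example: 200 observations, 1000 features, signal in the first 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.5 * rng.standard_normal(200)
selected = sis(X, y, d=10)
```

The appeal of the method is that it costs only one pass over the design matrix, so it scales to p in the tens of thousands where joint methods become expensive.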
*Figure: Stability selection involves selecting variables that are 'stable' under data resamplings, giving practitioners better guidance over where to place the all-important dividing line between signal (red) and noise (blue).*
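The resampling idea behind stability selection can be sketched in a few lines: run a base variable selector on many random half-samples of the data and record how often each feature is selected. In this sketch the base selector is simple marginal-correlation screening, an illustrative stand-in for any selection procedure; the subsample count, d = 5 threshold of 0.9 and toy data are likewise assumptions for the example, not the authors' tuning.

```python
import numpy as np

def stability_selection(X, y, d, B=100, seed=0):
    """Stability selection (Meinshausen and Buhlmann, 2010): minimal sketch.

    Applies a base selector (here: keep the d features with the largest
    absolute marginal correlation) to B random subsamples of size n/2,
    and returns the selection frequency of each feature.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs, ys = X[idx], y[idx]
        # Base selector on this subsample: top-d marginal correlations.
        corr = np.abs((Xs - Xs.mean(axis=0)).T @ (ys - ys.mean()))
        counts[np.argsort(corr)[::-1][:d]] += 1
    return counts / B  # selection frequency per feature

# Toy example: 100 observations, 50 features, signal in features 0 and 1.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.standard_normal(100)
freq = stability_selection(X, y, d=5)
stable = np.where(freq >= 0.9)[0]  # kept in at least 90% of subsamples
```

Signal variables are selected in nearly every subsample while noise variables come and go, which is exactly the separation between red and blue that the figure illustrates.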
Selected publications:
- Ultra high dimensional feature selection: beyond the linear model. *Journal of Machine Learning Research*, 10, 2013-2038.
- Choice of neighbor order in nearest-neighbor classification. *Annals of Statistics*, 36, 2135-2152.
- Concentration inequalities and confidence bands for needlet density estimators on compact homogeneous manifolds. *Probability Theory and Related Fields*, 2011, to appear.
© 2011 the Statistical Laboratory, University of Cambridge

