Research
My main research interests are in nonparametric and high-dimensional statistics. Particular topics include shape-constrained density estimation and other nonparametric function estimation problems; nonparametric classification, clustering and regression; Independent Component Analysis; the bootstrap; and high-dimensional variable selection and dimension reduction problems. Some general articles and a video about my research can be found here, here and here.
Publications and Preprints
- Feng, O., Guntuboyina, A., Kim, A. K. H. and Samworth, R. J. (2018) Adaptation in multivariate log-concave density estimation. Preprint. (.pdf 803K).
- Gataric, M., Wang, T. and Samworth, R. J. (2018) Sparse principal component analysis via random projections. Preprint. (.pdf 1.1M). The accompanying R package SPCAvRP is available from CRAN.
- Berrett, T. B., Wang, Y., Barber, R. F. and Samworth, R. J. (2018) The conditional permutation test. Preprint. (.pdf 404K).
- Cannings, T. I., Fan, Y. and Samworth, R. J. (2018) Classification with imperfect training labels. Preprint. (.pdf 576K).
- Yu, Y., Bradic, J. and Samworth, R. J. (2018) Confidence intervals for high-dimensional Cox models. Preprint. (.pdf 932K).
- Barber, R. F., Candès, E. J. and Samworth, R. J. (2018) Robust inference via knockoffs. Preprint. (.pdf 386K).
- Xu, M. and Samworth, R. J. (2017) High-dimensional nonparametric density estimation via symmetry and shape constraints. Working paper. (.pdf 2.3M)
- Cannings, T. I., Berrett, T. B. and Samworth, R. J. (2017) Local nearest neighbour classification with applications to semi-supervised learning. Preprint. (.pdf 412K).
- Berrett, T. B. and Samworth, R. J. (2019) Nonparametric independence testing via mutual information. Biometrika, to appear. (.pdf 668K). The accompanying R package IndepTest is available from CRAN.
- Han, Q., Wang, T., Chatterjee, S. and Samworth, R. J. (2019) Isotonic regression in general dimensions. Ann. Statist., to appear. (.pdf 466K).
- Berrett, T. B., Samworth, R. J. and Yuan, M. (2019) Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Statist., 47, 288-318. (.pdf 285K). The online supplementary material is available here: (.pdf, 462K).
- Mitchell, P. D., Brown, R., Wang, T., Shah, R. D., Samworth, R. J., Deakin, S., Edge, P., Hudson, I., Hutchinson, R., Stohr, K., Latimer, M., Natarajan, R., Qasim, S., Rehm, A., Sanghrajka, A., Tissingh, E. and Wright, G. (2019) Multi-centre study of non-accidental injury and limb fractures in young children in the East Anglia region, UK. Archives of Disease in Childhood, 0, 1-6 (.pdf, 519K).
- Kim, A. K. H., Guntuboyina, A. and Samworth, R. J. (2018) Adaptation in log-concave density estimation. Ann. Statist., 46, 2279-2306. (.pdf 256K). The online supplementary material is available here: (.pdf, 327K).
- Wang, T. and Samworth, R. J. (2018) High dimensional change point estimation via sparse projection. J. Roy. Statist. Soc., Ser. B, 80, 57-83. (.pdf 1.5M). The accompanying R package InspectChangepoint is available from CRAN.
- Samworth, R. J. (2018) Recent progress in log-concave density estimation. Statist. Sci., 33, 493-509. (.pdf 832K).
- Sen, B. and Samworth, R. J. (2018) Editorial: Special issue on "Nonparametric inference under shape constraints". Statist. Sci., 33, 469-472. (.pdf 72K).
- Banerjee, M. and Samworth, R. J. (2018) A conversation with Jon Wellner. Statist. Sci., 33, 633-651. (.pdf 5.5M)
- Bødker, J. S., Brøndum, R. F., Schmitz, A., Schönherz, A. A., Jespersen, D. S., Sønderkær, M., Vesteghem, C., Due, H., Nøgaard, C. H., Perez-Andres, M., Samur, M. K., Davies, F., Walker, B., Pawlyn, C., Kaiser, M., Johnson, D., Bertsch, U., Broyl, A., van Duin, M., Shah, R., Johansen, P., Nøgaard, M. A., Samworth, R. J., Sonneveld, P., Goldschmidt, H., Morgan, G. J., Orfao, A., Munshi, N., El-Galaly, T., Dybkær, K. and Bøgsted, M. (2018) A multiple myeloma classification system that associates normal B-cell subset phenotypes with prognosis. Blood Advances, 2, 2400-2411. (.pdf, 2.4M).
- Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification. J. Roy. Statist. Soc., Ser. B (with discussion), 79, 959-1035. (.pdf 1.8M). The accompanying R package RPEnsemble is available from CRAN.
- Lockhart, R. A. and Samworth, R. J. (2017) Comments on 'High-dimensional simultaneous inference with the bootstrap' by R. Dezeure, P. Bühlmann and C.-H. Zhang. TEST, 26, 734-739. (.pdf 228K).
- Kim, A. K. H. and Samworth, R. J. (2016) Global rates of convergence in log-concave density estimation. Ann. Statist., 44, 2756-2779. (.pdf 214K). The online supplementary material is available here: (.pdf, 1.1M).
- Wang, T., Berthet, Q. and Samworth, R. J. (2016) Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist., 44, 1896-1930. (.pdf, 500K). The online supplementary material is available here: (.pdf, 368K).
- Chen, Y. and Samworth, R. J. (2016) Generalised additive and index models with shape constraints. J. Roy. Statist. Soc., Ser. B, 78, 729-754. (.pdf, 404K). The accompanying R package scar, short for shape constrained additive regression, is available from CRAN.
- Samworth, R. J. (2016) Peter Hall's work on high-dimensional data and classification. Ann. Statist., 44, 1888-1895. (.pdf 312K)
- Yu, Y., Wang, T. and Samworth, R. J. (2015) A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102, 315-323. (.pdf, 188K)
- Dybkær, K., Bøgsted, M., Falgreen, S., Bødker, J. S., Kjeldsen, M. K., Schmitz, A., Bilgrau, A. E., Xu-Monette, Z. Y., Li, L., Bergkvist, K. S., Laursen, M. B., Rodrigo-Domingo, M., Marques, S. C., Rasmussen, S. B., Nyegaard, M., Gaihede, M., Møller, M. B., Samworth, R. J., Shah, R. D., Johansen, P., El-Galaly, T. C., Young, K. H. and Johnsen, H. E. (2015) A diffuse large B-cell lymphoma classification system that associates normal B-cell subset phenotypes with prognosis. J. Clinical Oncology, 33, 1379-1388.
- Shah, R. D. and Samworth, R. J. (2015) Invited discussion of An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian. J. Amer. Statist. Assoc., 110, 1439-1442 (.pdf, 604K).
- Samworth, R. J. (2014) Big Data: a new era for Statistics. The Eagle, 43-46. (.pdf, 836K)
- Chen, Y., Shah, R. D. and Samworth, R. J. (2014) Invited discussion of Multiscale change point inference by K. Frick, A. Munk and H. Sieling. J. Roy. Statist. Soc., Ser. B, 76, 544-546. (.pdf 68K).
- Shah, R. D. and Samworth, R. J. (2013) Variable selection with error control: Another look at Stability Selection. J. Roy. Statist. Soc., Ser. B, 75, 55-80. DOI: 10.1111/j.1467-9868.2011.01034.x (.pdf, 1.1M). Some associated R code can be found here.
- Chen, Y. and Samworth, R. J. (2013) Smoothed log-concave maximum likelihood estimation with applications. Statist. Sinica, 23, 1373-1398. (.pdf, 500K)
- Shah, R. D. and Samworth, R. J. (2013) Invited discussion of Correlated variables in regression: clustering and sparse estimation by P. Bühlmann, P. Rütimann, S. van de Geer and C.-H. Zhang. J. Statist. Plann. Inf., 143, 1866-1868. (.pdf 378K)
- Dümbgen, L., Samworth, R. J. and Schuhmacher, D. (2013) Stochastic search for semiparametric linear regression models. In From Probability to Statistics and Back: High-Dimensional Models and Processes -- A Festschrift in Honor of Jon A. Wellner. Eds M. Banerjee, F. Bunea, J. Huang, V. Koltchinskii, M. H. Maathuis, pp. 78-90. (.pdf 224K).
- Yu, Y. and Samworth, R. J. (2013) Invited discussion of Large Covariance Estimation by Thresholding Principal Orthogonal Complements by J. Fan, Y. Liao and M. Mincheva. J. Roy. Statist. Soc., Ser. B., 75, 656-658. (.pdf 364K)
- Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation. Ann. Statist., 40, 2973-3002. (.pdf, 556K)
- Samworth, R. J. (2012) Optimal weighted nearest neighbour classifiers. Ann. Statist., 40, 2733-2763. DOI: 10.1214/12-AOS1049 (.pdf, 336K). Online supplement (.pdf, 308K). The optimal weighting scheme is implemented in the R packages 'FNN' (written by Shengqiao Li) and 'kknn' (written by Klaus Schliep), both available on CRAN.
- Samworth, R. J. (2012) Stein's Paradox. Eureka, 62, 38-41. (.pdf 608K)
- Samworth, R. J. (2011) Invited discussion of Adaptive confidence intervals for the test error in classification by Laber and Murphy. J. Amer. Statist. Assoc., 106, 914-915 (.pdf, 88K).
- Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions with applications to regression. Ann. Statist., 39, 702-730 (.pdf, 232K). A longer version of the paper is available here: (.pdf, 1.0MB)
- Cule, M., Samworth, R. and Stewart, M. (2010) Maximum likelihood estimation of a multi-dimensional log-concave density. J. Roy. Statist. Soc., Ser. B. (with discussion), 72, 545-600. (.pdf, 3M). A longer version of the paper is also available here: (.pdf, 1.5M)
- Cule, M., Samworth, R. and Stewart, M. (2010) Rejoinder to Maximum likelihood estimation of a multi-dimensional log-concave density. J. Roy. Statist. Soc., Ser. B., 72, 600-607. (.pdf, 116K)
- Cule, M. and Samworth, R. (2010) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat., 4, 254-270. (.pdf, 200K)
- Shah, R. D. and Samworth, R. J. (2010) Invited discussion of Stability selection by Meinshausen and Bühlmann. J. Roy. Statist. Soc., Ser. B, 72, 455-456. (.pdf, 48K)
- Samworth, R. J. and Wand, M. P. (2010) Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist., 38, 1767-1792. (.pdf, 1.4M)
- Gramacy, R., Samworth, R. and King, R. (2010) Importance tempering. Statistics and Computing, 20, 1-7. (.pdf, 306K)
- Fan, J., Samworth, R. and Wu, Y. (2009) Ultrahigh dimensional feature selection: beyond the linear model. J. Machine Learning Research, 10, 2013-2038. (.pdf, 256K).
- Fan, J., Feng, Y., Samworth, R. and Wu, Y. (2009) SIS, An R package for (Iterative) Sure Independence Screening for generalized linear models and Cox's proportional hazards models, available from CRAN.
- Cule, M., Gramacy, R. B. and Samworth, R. (2009) LogConcDEAD: an R package for maximum likelihood estimation of a multivariate log-concave density. J. Statist. Software, 29, Issue 2.
- Cule, M. and Samworth, R. (2009) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. In Challenges in Statistical Theory: Complex Data Structures and Algorithmic Optimization. Mathematisches Forschungsinstitut Oberwolfach, Report No. 39/2009, 438-440. Eds.: Beran RJ, Klüppelberg C and Polonik W. (.pdf, 64K)
- Hall, P., Park, B. U. and Samworth, R. J. (2008) Choice of neighbor order in nearest-neighbor classification. Ann. Statist., 36, 2135-2152. (.pdf, 196K). A longer version of the paper is also available here: (.pdf, 236K)
- Samworth, R. (2008) Invited discussion of Sure independence screening for ultra-high dimensional feature space by Fan and Lv. J. Roy. Statist. Soc., Ser. B, 70, 888-889. (.pdf, 84K).
- Cule, M., Gramacy, R., Samworth, R. and Chen, Y. (2007) LogConcDEAD, An R package for log-concave density estimation in arbitrary dimensions, version 1.4.2 available from CRAN.
- Samworth, R. and Gowland, R. (2007) Estimation of adult skeletal age-at-death: statistical assumptions and applications. International Journal of Osteoarchaeology, 17, 174-188. (.pdf, 200K)
- Poore, H. R., Samworth, R., White, N. J., Jones, S. M. and McCave, I. N. (2006) Neogene overflow of northern component water at the Greenland-Scotland ridge. Geochem. Geophys. Geosyst., 7, Q06010, doi:10.1029/2005GC001085. (.pdf, 2.5M)
- Samworth, R. and Poore, H. (2005) Understanding past ocean circulations: a nonparametric regression case study. Statistical Modelling, 5, 289-307. (.pdf, 1.7M)
- Johnson, O. and Samworth, R. (2005) Central Limit Theorem and convergence to stable laws in Mallows distance. Bernoulli, 11, 829-845. (.pdf, 176K)
- Samworth, R. (2005) Small confidence sets for the mean of a spherically symmetric distribution. J. Roy. Statist. Soc., Ser. B, 67, 343-361. (.pdf, 512K)
- Hall, P. and Samworth, R. J. (2005) Properties of bagged nearest-neighbour classifiers. J. Roy. Statist. Soc., Ser. B, 67, 363-379. (.pdf, 300K).
- Samworth, R. J. (2004) Some mathematical and theoretical aspects of the bootstrap. Ph.D. thesis, University of Cambridge. (.pdf, 1.5M)
- Samworth, R. (2003) A note on methods of restoring consistency to the bootstrap. Biometrika, 90, 985-990. (.pdf, 164K)
- Samworth, R. J. (2014) New challenges in high-dimensional statistical inference. Poster. (.pdf, 684K)
- Samworth, R. and Johnson, O. (2005) The empirical process in Mallows distance, with application to goodness-of-fit tests. Preprint. (.pdf, 296K)
- Samworth, R. J. (2004) Some asymptotic results for the bootstrap distribution of the sample mean. Preprint. (.pdf, 228K)
- Samworth, R. J. (2003) Bootstrap diagnostics and inconsistency. Preprint. (.pdf, 280K)
- Samworth, R. (2000) Shrinkage Estimators, Part III Essay, University of Cambridge. (.pdf, 288K)
Selected recent talks
- Log-concavity: New theory and methodology (Berlin, January 2013) (.pdf, 392K)
- Log-concave density estimation with applications (Lund, September 2012) (.pdf, 244K)
- High-dimensional variable selection in Statistics (Cambridge, September 2012) (.pdf, 268K)
- Independent component analysis via nonparametric maximum likelihood estimation (Istanbul, July 2012) (.pdf, 256K)
- Variable selection with error control: Another look at Stability Selection (Tsukuba, July 2012) (.pdf, 256K)
- Optimal weighted nearest neighbour classifiers (Essex, May 2012) (.pdf, 256K)