Research
My main research interests are in nonparametric and high-dimensional statistics.
Particular topics include shape-constrained estimation problems; data perturbation methods (e.g. subsampling, bootstrap sampling, random projections, knockoffs); nonparametric classification; unconditional and conditional independence testing; estimation of entropy and other functionals; changepoint detection and estimation; missing data; variable selection; and applications, including public health, genetics, archaeology and oceanography. Some general articles and a video about my research can be found here, here and here.
Publications and preprints
- Ma, T., Verchand, K., Berrett, T. B., Wang, T. and Samworth, R. J. (2024) Estimation beyond Missing (Completely) at Random. Preprint. (.pdf 1.4M).
- Ma, T., Verchand, K. and Samworth, R. J. (2024) High-probability minimax lower bounds. Preprint. (.pdf 678K).
- Feng, O. Y., Kao, Y.-C., Xu, M. and Samworth, R. J. (2024) Optimal convex M-estimation via score matching. Preprint. (.pdf 4.7M). The accompanying R package asm is available from CRAN.
- Lundborg, A. R., Kim, I., Shah, R. D. and Samworth, R. J. (2024+) The Projected Covariance Measure for assumption-lean variable significance testing. Ann. Statist., to appear. (.pdf 1.0M).
- Müller, M. M., Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2024+) Isotonic subgroup selection. J. Roy. Statist. Soc., Ser. B, to appear. (.pdf 3.5M). The accompanying R package ISS is available from CRAN.
- Wang, T., Dobriban, E., Gataric, M. and Samworth, R. J. (2024+) Sharp-SSL: Selective high-dimensional axis-aligned random projections for semi-supervised learning. J. Amer. Statist. Assoc., to appear. (.pdf 779K).
- Chen, Y., Wang, T. and Samworth, R. J. (2024) Inference in high-dimensional online changepoint detection. J. Amer. Statist. Assoc., 119, 1461-1472. (.pdf 1.4M). The online supplementary material is available here and the algorithm is available here.
- Chen, W., Mazumder, R. and Samworth, R. J. (2024) A new computational framework for log-concave density estimation. Math. Prog. Comp., https://doi.org/10.1007/s12532-024-00252-0. (.pdf 4.2M). The algorithm is available here.
- Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2023) Optimal subgroup selection. Ann. Statist., 51, 2342-2365. (.pdf 519K). The online supplementary material is available here: (.pdf 486K).
- Berrett, T. B. and Samworth, R. J. (2023) Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility. Ann. Statist., 51, 2170-2193. (.pdf 420K). The online supplementary material is available here: (.pdf 380K) and the accompanying R package MCARtest is available from CRAN.
- Berrett, T. B. and Samworth, R. J. (2023) Efficient functional estimation and the super-oracle phenomenon. Ann. Statist., 51, 668-690. (.pdf 304K). The online supplementary material is available here: (.pdf 544K).
- Ferreira, T., Collins, A. M., Feng, O., Samworth, R. J. and Horvath, R. (2023) Career intentions of medical students in the UK: a national, cross-sectional study (AIMS study). BMJ Open, 13:e075598. (.pdf 1.5M). Press coverage: The Guardian, The Sun, Sky News, The Daily Mail.
- Zhu, Z., Wang, T. and Samworth, R. J. (2022) High-dimensional principal component analysis with heterogeneous missingness. J. Roy. Statist. Soc., Ser. B, 84, 2000-2031. (.pdf 1.8M). The accompanying R package primePCA is available from CRAN, and supplementary material is available at the bottom of this page.
- Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. J. Roy. Statist. Soc., Ser. B, 84, 1023-1055. (.pdf 1.2M). The MissInspect algorithm is available here.
- Feng, O. Y., Venkataramanan, R., Rush, C. and Samworth, R. J. (2022) A unifying tutorial on Approximate Message Passing. Foundations and Trends in Machine Learning, 15, 335-536. (.pdf 1.9M).
- Feng, O. Y., Chen, Y., Han, Q., Carroll, R. J. and Samworth, R. J. (2022) Nonparametric, tuning-free estimation of S-shaped functions. J. Roy. Statist. Soc., Ser. B, 84, 1324-1352. (.pdf 1.9M). The accompanying R package Sshaped is available from CRAN, and supplementary material is available at the bottom of this page.
- Pananjady, A. and Samworth, R. J. (2022) Isotonic regression with unknown permutations: Statistics, computation, and adaptation. Ann. Statist., 50, 324-350. (.pdf 397K). The online supplementary material is available here: (.pdf 551K).
- Chen, Y., Wang, T. and Samworth, R. J. (2022) High-dimensional, multiscale online changepoint detection. J. Roy. Statist. Soc., Ser. B, 84, 234-266. (.pdf 3.7M). The accompanying R package ocd is available from CRAN.
- Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2021) Adaptive transfer learning. Ann. Statist., 49, 3618-3649. (.pdf 444K). The online supplementary material is available here: (.pdf 316K).
- Berrett, T. B., Kontoyiannis, I. and Samworth, R. J. (2021) Optimal rates for independence testing via U-statistic permutation tests. Ann. Statist., 49, 2457-2490. (.pdf 458K). The accompanying R package USP is available from CRAN, and the online supplementary material is available here: (.pdf 352K).
- Samworth, R. J. and Yuan, M. (2021) Preface: Section of memorial articles for Willem van Zwet. Ann. Statist., 49, 2431. (.pdf 35K).
- Berrett, T. B. and Samworth, R. J. (2021) USP: an independence test that improves on Pearson's chi-squared and the G-test. Proc. Roy. Soc. A, 477, 20210549. (.pdf 1.2M). The accompanying R package USP is available from CRAN.
- Samworth, R. J. and Yuan, M. (2021) Editorial: Memorial issue for Charles Stein. Ann. Statist., 49, 1811-1814. (.pdf 69K).
- Liu, H., Gao, C. and Samworth, R. J. (2021) Minimax rates in sparse, high-dimensional changepoint detection. Ann. Statist., 49, 1081-1112. (.pdf 521K). The online supplementary material is available here: (.pdf 258K)
- Xu, M. and Samworth, R. J. (2021) High-dimensional nonparametric density estimation via symmetry and shape constraints. Ann. Statist., 49, 650-672. (.pdf 2.4M). The online supplementary material is available here: (.pdf 542K)
- Feng, O. Y., Guntuboyina, A., Kim, A. K. H. and Samworth, R. J. (2021) Adaptation in multivariate log-concave density estimation. Ann. Statist., 49, 129-153. (.pdf 357K). The online supplementary material is available here: (.pdf 1.2M)
- Barber, R. F. and Samworth, R. J. (2021) Local continuity of log-concave projection, with applications to estimation under model misspecification. Bernoulli, 27, 2437-2472. (.pdf 434K)
- Dümbgen, L., Samworth, R. J. and Wellner, J. A. (2021) Bounding distributional errors via density ratios. Bernoulli, 27, 818-852. (.pdf 386K)
- Yu, Y., Bradic, J. and Samworth, R. J. (2021) Confidence intervals for high-dimensional Cox models. Statist. Sinica, 31, 243-267. (.pdf 932K).
- Jones, N. K., Rivett, L., Seaman, S., Samworth, R. J., Warne, B., Workman, C., Ferris, M., Wright, J., Quinnell, N., Shaw, A., Cambridge COVID-19 Collaboration, Goodfellow, I. G., Lehner, P. J., Howes, R., Wright, G., Matheson, N. J., Weekes, M. J. (2021) Single-dose BNT162b2 vaccine protects against asymptomatic SARS-CoV-2 infection. eLife, 10:e68808.
- Janková, J., Shah, R. D., Bühlmann, P. and Samworth, R. J. (2020) Goodness-of-fit testing in high-dimensional generalized linear models. J. Roy. Statist. Soc., Ser. B, 82, 773-795. (.pdf 976K). The accompanying R package GRPtests is available from CRAN.
- Gataric, M., Wang, T. and Samworth, R. J. (2020) Sparse principal component analysis via axis-aligned random projections. J. Roy. Statist. Soc., Ser. B, 82, 329-359. (.pdf 3.5M). The accompanying R package SPCAvRP is available from CRAN.
- Berrett, T. B., Wang, Y., Barber, R. F. and Samworth, R. J. (2020) The conditional permutation test for independence while controlling for confounders. J. Roy. Statist. Soc., Ser. B, 82, 175-197. (.pdf 1.0M).
- Cannings, T. I., Fan, Y. and Samworth, R. J. (2020) Classification with imperfect training labels. Biometrika, 107, 311-330. (.pdf 531K).
- Cannings, T. I., Berrett, T. B. and Samworth, R. J. (2020) Local nearest neighbour classification with applications to semi-supervised learning. Ann. Statist., 48, 1789-1814. (.pdf 313K). The online supplementary material is available here: (.pdf 409K).
- Barber, R. F., Candès, E. J. and Samworth, R. J. (2020) Robust inference with knockoffs. Ann. Statist., 48, 1409-1431. (.pdf 398K). The online supplementary material is available here: (.pdf 217K).
- Rivett, L., Sridhar, S., Sparkes, D., Routledge, M., Jones, N. K., Forrest, S., Young, J., Pereira-Dias, J., Hamilton, W. L., Ferris, M., Torok, M. E., Meredith, L., The CITIID-NIHR COVID-19 BioResource Collaboration, Curran, M., Fuller, S., Chaudhry, A., Shaw, A., Samworth, R. J., Bradley, J. R., Dougan, G., Smith, K. G. C., Lehner, P. J., Matheson, N. J., Wright, G., Goodfellow, I., Baker, S., Weekes, M. P. (2020) Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission. eLife, 9:e58728.
- Han, Q., Wang, T., Chatterjee, S. and Samworth, R. J. (2019) Isotonic regression in general dimensions. Ann. Statist., 47, 2440-2471. (.pdf 362K). The online supplementary material is available here: (.pdf, 358K).
- Berrett, T. B., Samworth, R. J. and Yuan, M. (2019) Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Statist., 47, 288-318. (.pdf 306K). The online supplementary material is available here: (.pdf, 443K).
- Berrett, T. B. and Samworth, R. J. (2019) Nonparametric independence testing via mutual information. Biometrika, 106, 547-566. (.pdf 540K). The accompanying R package IndepTest is available from CRAN.
- Mitchell, P. D., Brown, R., Wang, T., Shah, R. D., Samworth, R. J., Deakin, S., Edge, P., Hudson, I., Hutchinson, R., Stohr, K., Latimer, M., Natarajan, R., Qasim, S., Rehm, A., Sanghrajka, A., Tissingh, E. and Wright, G. (2019) Multi-centre study of non-accidental injury and limb fractures in young children in the East Anglia region, UK. Archives of Disease in Childhood, 0, 1-6 (.pdf, 519K).
- Kim, A. K. H., Guntuboyina, A. and Samworth, R. J. (2018) Adaptation in log-concave density estimation. Ann. Statist., 46, 2279-2306. (.pdf 290K). The online supplementary material is available here: (.pdf, 305K).
- Wang, T. and Samworth, R. J. (2018) High dimensional change point estimation via sparse projection. J. Roy. Statist. Soc., Ser. B, 80, 57-83. (.pdf 1.5M). The accompanying R package InspectChangepoint is available from CRAN.
- Samworth, R. J. (2018) Recent progress in log-concave density estimation. Statist. Sci., 33, 493-509. (.pdf 832K).
- Samworth, R. J. and Sen, B. (2018) Editorial: Special issue on "Nonparametric inference under shape constraints". Statist. Sci., 33, 469-472. (.pdf 72K).
- Banerjee, M. and Samworth, R. J. (2018) A conversation with Jon Wellner. Statist. Sci., 33, 633-651. (.pdf 5.5M)
- Bødker, J. S., Brøndum, R. F., Schmitz, A., Schönherz, A. A., Jespersen, D. S., Sønderkær, M., Vesteghem, C., Due, H., Nøgaard, C. H., Perez-Andres, M., Samur, M. K., Davies, F., Walker, B., Pawlyn, C., Kaiser, M., Johnson, D., Bertsch, U., Broyl, A., van Duin, M., Shah, R., Johansen, P., Nøgaard, M. A., Samworth, R. J., Sonneveld, P., Goldschmidt, H., Morgan, G. J., Orfao, A., Munshi, N., El-Galaly, T., Dybkær, K. and Bøgsted, M. (2018) A multiple myeloma classification system that associates normal B-cell subset phenotypes with prognosis. Blood Advances, 2, 2400-2411. (.pdf, 2.4M).
- Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification. J. Roy. Statist. Soc., Ser. B (with discussion), 79, 959-1035. (.pdf 1.8M). The accompanying R package RPEnsemble is available from CRAN.
- Lockhart, R. A. and Samworth, R. J. (2017) Comments on 'High-dimensional simultaneous inference with the bootstrap' by R. Dezeure, P. Bühlmann and C.-H. Zhang. TEST, 26, 734-739. (.pdf 228K).
- Kim, A. K. H. and Samworth, R. J. (2016) Global rates of convergence in log-concave density estimation. Ann. Statist., 44, 2756-2779. (.pdf 244K). The online supplementary material is available here: (.pdf, 1.1M).
- Wang, T., Berthet, Q. and Samworth, R. J. (2016) Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist., 44, 1896-1930. (.pdf, 497K). The online supplementary material is available here: (.pdf, 364K).
- Chen, Y. and Samworth, R. J. (2016) Generalized additive and index models with shape constraints. J. Roy. Statist. Soc., Ser. B, 78, 729-754. (.pdf, 404K). The accompanying R package scar, short for shape constrained additive regression, is available from CRAN.
- Samworth, R. J. (2016) Peter Hall's work on high-dimensional data and classification. Ann. Statist., 44, 1888-1895. (.pdf 310K)
- Yu, Y., Wang, T. and Samworth, R. J. (2015) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102, 315-323. (.pdf, 188K)
- Dybkær, K., Bøgsted, M., Falgreen, S., Bødker, J. S., Kjeldsen, M. K., Schmitz, A., Bilgrau, A. E., Xu-Monette, Z. Y., Li, L., Bergkvist, K. S., Laursen, M. B., Rodrigo-Domingo, M., Marques, S. C., Rasmussen, S. B., Nyegaard, M., Gaihede, M., Møller, M. B., Samworth, R. J., Shah, R. D., Johansen, P., El-Galaly, T. C., Young, K. H. and Johnsen, H. E. (2015) A diffuse large B-cell lymphoma classification system that associates normal B-cell subset phenotypes with prognosis. J. Clinical Oncology, 33, 1379-1388.
- Shah, R. D. and Samworth, R. J. (2015) Invited discussion of An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian. J. Amer. Statist. Assoc., 110, 1439-1442 (.pdf, 604K).
- Samworth, R. J. (2014) Big Data: a new era for Statistics. The Eagle, 43-46. (.pdf, 836K)
- Chen, Y., Shah, R. D. and Samworth, R. J. (2014) Invited discussion of Multiscale change point inference by K. Frick, A. Munk and H. Sieling. J. Roy. Statist. Soc., Ser. B, 76, 544-546. (.pdf 68K).
- Shah, R. D. and Samworth, R. J. (2013) Variable selection with error control: Another look at Stability Selection. J. Roy. Statist. Soc., Ser. B, 75, 55-80. (.pdf, 1.1M). Some associated R code can be found here.
- Chen, Y. and Samworth, R. J. (2013) Smoothed log-concave maximum likelihood estimation with applications. Statist. Sinica, 23, 1373-1398. (.pdf, 500K)
- Shah, R. D. and Samworth, R. J. (2013) Invited discussion of Correlated variables in regression: clustering and sparse estimation by P. Bühlmann, P. Rütimann, S. van de Geer and C.-H. Zhang. J. Statist. Plann. Inf., 143, 1866-1868. (.pdf 378K)
- Dümbgen, L., Samworth, R. J. and Schuhmacher, D. (2013) Stochastic search for semiparametric linear regression models. In From Probability to Statistics and Back: High-Dimensional Models and Processes -- A Festschrift in Honor of Jon A. Wellner. Eds M. Banerjee, F. Bunea, J. Huang, V. Koltchinskii, M. H. Maathuis, pp. 78-90. (.pdf 224K).
- Yu, Y. and Samworth, R. J. (2013) Invited discussion of Large Covariance Estimation by Thresholding Principal Orthogonal Complements by J. Fan, Y. Liao and M. Mincheva. J. Roy. Statist. Soc., Ser. B, 75, 656-658. (.pdf 364K)
- Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation. Ann. Statist., 40, 2973-3002. (.pdf, 504K)
- Samworth, R. J. (2012) Optimal weighted nearest neighbour classifiers. Ann. Statist., 40, 2733-2763. (.pdf, 334K). The online supplementary material is available here: (.pdf, 297K). The optimal weighting scheme is implemented in the R packages 'FNN' (written by Shengqiao Li) and 'kknn' (written by Klaus Schliep), both available on CRAN.
- Samworth, R. J. (2012) Stein's Paradox. Eureka, 62, 38-41. (.pdf 608K)
- Samworth, R. J. (2011) Invited discussion of Adaptive confidence intervals for the test error in classification by Laber and Murphy. J. Amer. Statist. Assoc., 106, 914-915 (.pdf, 88K).
- Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions with applications to regression. Ann. Statist., 39, 702-730 (.pdf, 615K). A longer version of the paper is available here: (.pdf, 1.0MB)
- Cule, M., Samworth, R. and Stewart, M. (2010) Maximum likelihood estimation of a multi-dimensional log-concave density. J. Roy. Statist. Soc., Ser. B (with discussion), 72, 545-607. (.pdf, 3M). A longer version of the paper is also available here: (.pdf, 1.5M)
- Cule, M. and Samworth, R. (2010) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat., 4, 254-270. (.pdf, 200K)
- Shah, R. D. and Samworth, R. J. (2010) Invited discussion of Stability selection by Meinshausen and Bühlmann. J. Roy. Statist. Soc., Ser. B, 72, 455-456. (.pdf, 48K)
- Samworth, R. J. and Wand, M. P. (2010) Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist., 38, 1767-1792. (.pdf, 1.4M)
- Gramacy, R., Samworth, R. and King, R. (2010) Importance tempering. Statistics and Computing, 20, 1-7. (.pdf, 306K)
- Fan, J., Samworth, R. and Wu, Y. (2009) Ultrahigh dimensional feature selection: beyond the linear model. J. Machine Learning Research, 10, 2013-2038. (.pdf, 256K).
- Fan, J., Feng, Y., Samworth, R. and Wu, Y. (2009) SIS, An R package for (Iterative) Sure Independence Screening for generalized linear models and Cox's proportional hazards models, available from CRAN.
- Cule, M., Gramacy, R. B. and Samworth, R. (2009) LogConcDEAD: an R package for maximum likelihood estimation of a multivariate log-concave density. J. Statist. Software, 29, Issue 2.
- Hall, P., Park, B. U. and Samworth, R. J. (2008) Choice of neighbor order in nearest-neighbor classification. Ann. Statist., 36, 2135-2152. (.pdf, 191K). A longer version of the paper is also available here: (.pdf, 236K)
- Samworth, R. (2008) Invited discussion of Sure independence screening for ultra-high dimensional feature space by Fan and Lv. J. Roy. Statist. Soc., Ser. B, 70, 888-889. (.pdf, 84K).
- Cule, M., Gramacy, R., Samworth, R. and Chen, Y. (2007) LogConcDEAD, An R package for log-concave density estimation in arbitrary dimensions, version 1.4.2 available from CRAN.
- Samworth, R. and Gowland, R. (2007) Estimation of adult skeletal age-at-death: statistical assumptions and applications. International Journal of Osteoarchaeology, 17, 174-188. (.pdf, 200K)
- Poore, H. R., Samworth, R., White, N. J., Jones, S. M. and McCave, I. N. (2006) Neogene overflow of northern component water at the Greenland-Scotland ridge. Geochem. Geophys. Geosyst., 7, Q06010, doi:10.1029/2005GC001085. (.pdf, 2.5M)
- Samworth, R. and Poore, H. (2005) Understanding past ocean circulations: a nonparametric regression case study. Statistical Modelling, 5, 289-307. (.pdf, 1.7M)
- Johnson, O. and Samworth, R. (2005) Central Limit Theorem and convergence to stable laws in Mallows distance. Bernoulli, 11, 829-845. (.pdf, 176K)
- Samworth, R. (2005) Small confidence sets for the mean of a spherically symmetric distribution. J. Roy. Statist. Soc., Ser. B, 67, 343-361. (.pdf, 512K)
- Hall, P. and Samworth, R. J. (2005) Properties of bagged nearest-neighbour classifiers. J. Roy. Statist. Soc., Ser. B, 67, 363-379. (.pdf, 300K).
- Samworth, R. J. (2004) Some mathematical and theoretical aspects of the bootstrap. Ph.D. thesis, University of Cambridge. (.pdf, 1.5M)
- Samworth, R. (2003) A note on methods of restoring consistency to the bootstrap. Biometrika, 90, 985-990. (.pdf, 164K)
- Samworth, R. J. (2014) New challenges in high-dimensional statistical inference. Poster. (.pdf, 684K)
- Samworth, R. and Johnson, O. (2005) The empirical process in Mallows distance, with application to goodness-of-fit tests. Preprint. (.pdf, 296K)
- Samworth, R. J. (2004) Some asymptotic results for the bootstrap distribution of the sample mean. Preprint. (.pdf, 228K)
- Samworth, R. J. (2003) Bootstrap diagnostics and inconsistency. Preprint. (.pdf, 280K)
- Samworth, R. (2000) Shrinkage Estimators, Part III Essay, University of Cambridge. (.pdf, 288K)
Selected recent talks
- Log-concavity: New theory and methodology (Berlin, January 2013) (.pdf, 392K)
- Log-concave density estimation with applications (Lund, September 2012) (.pdf, 244K)
- High-dimensional variable selection in Statistics (Cambridge, September 2012) (.pdf, 268K)
- Independent component analysis via nonparametric maximum likelihood estimation (Istanbul, July 2012) (.pdf, 256K)
- Variable selection with error control: Another look at Stability Selection (Tsukuba, July 2012) (.pdf, 256K)
- Optimal weighted nearest neighbour classifiers (Essex, May 2012) (.pdf, 256K)