Research

My main research interests are in nonparametric and high-dimensional statistics. Particular topics include shape-constrained estimation problems; data perturbation methods (e.g. subsampling, bootstrap sampling, random projections, knockoffs); nonparametric classification; unconditional and conditional independence testing; estimation of entropy and other functionals; changepoint detection and estimation; missing data; variable selection; and applications, including genetics, archaeology and oceanography. Some general articles and a video about my research can be found here, here and here.

Publications and Preprints

- Berrett, T. B., Kontoyiannis, I. and Samworth, R. J. (2020) Optimal rates for independence testing via
*U*-statistic permutation tests. *Preprint*. (.pdf 569K)

- Janková, J., Shah, R. D., Bühlmann, P. and Samworth, R. J. (2019) Goodness-of-fit testing in high-dimensional generalized linear models.
*Preprint*. (.pdf 964K).

- Liu, H., Gao, C. and Samworth, R. J. (2019) Minimax rates in sparse, high-dimensional changepoint detection.
*Preprint*. (.pdf 700K).

- Zhu, Z., Wang, T. and Samworth, R. J. (2019) High-dimensional principal component analysis with heterogeneous missingness.
*Preprint*. (.pdf 619K). The accompanying **R** package **primePCA** is available from CRAN.

- Dümbgen, L., Samworth, R. J. and Wellner, J. A. (2019) Bounding distributional errors via density ratios.
*Preprint*. (.pdf 386K)

- Berrett, T. B. and Samworth, R. J. (2019) Efficient two-sample functional estimation and the super-oracle phenomenon.
*Preprint*. (.pdf 619K)

- Xu, M. and Samworth, R. J. (2019) High-dimensional nonparametric density estimation via symmetry and shape constraints.
*Preprint*. (.pdf 1.7M)

- Feng, O., Guntuboyina, A., Kim, A. K. H. and Samworth, R. J. (2020) Adaptation in multivariate log-concave density estimation.
*Ann. Statist.*, to appear. (.pdf 803K).

- Gataric, M., Wang, T. and Samworth, R. J. (2020) Sparse principal component analysis via axis-aligned random projections.
*J. Roy. Statist. Soc., Ser. B*, to appear. (.pdf 1.1M). The accompanying **R** package **SPCAvRP** is available from CRAN.

- Berrett, T. B., Wang, Y., Barber, R. F. and Samworth, R. J. (2020) The conditional permutation test for independence while controlling for confounders.
*J. Roy. Statist. Soc., Ser. B*, to appear. (.pdf 404K).

- Cannings, T. I., Fan, Y. and Samworth, R. J. (2020) Classification with imperfect training labels.
*Biometrika*, to appear. (.pdf 576K).

- Cannings, T. I., Berrett, T. B. and Samworth, R. J. (2020) Local nearest neighbour classification with applications to semi-supervised learning.
*Ann. Statist.*, to appear. (.pdf 412K).

- Barber, R. F., Candès, E. J. and Samworth, R. J. (2020) Robust inference with knockoffs.
*Ann. Statist.*, to appear. (.pdf 386K).

- Yu, Y., Bradic, J. and Samworth, R. J. (2020) Confidence intervals for high-dimensional Cox models.
*Statist. Sinica*, to appear. (.pdf 932K).

- Han, Q., Wang, T., Chatterjee, S. and Samworth, R. J. (2019) Isotonic regression in general dimensions.
*Ann. Statist.*, **47**, 2440-2471. (.pdf 306K). The online supplementary material is available here: (.pdf, 377K).

- Berrett, T. B., Samworth, R. J. and Yuan, M. (2019) Efficient multivariate entropy estimation via
*k*-nearest neighbour distances. *Ann. Statist.*, **47**, 288-318. (.pdf 285K). The online supplementary material is available here: (.pdf, 462K).

- Berrett, T. B. and Samworth, R. J. (2019) Nonparametric independence testing via mutual information.
*Biometrika*, **106**, 547-566. (.pdf 540K). The accompanying **R** package **IndepTest** is available from CRAN.

- Mitchell, P. D., Brown, R., Wang, T., Shah, R. D., Samworth, R. J., Deakin, S., Edge, P., Hudson, I., Hutchinson, R., Stohr, K., Latimer, M., Natarajan, R., Qasim, S., Rehm, A., Sanghrajka, A., Tissingh, E. and Wright, G. (2019) Multi-centre study of non-accidental injury and limb fractures in young children in the East Anglia region, UK.
*Archives of Disease in Childhood*, **0**, 1-6 (.pdf, 519K).

- Kim, A. K. H., Guntuboyina, A. and Samworth, R. J. (2018) Adaptation in log-concave density estimation.
*Ann. Statist.*, **46**, 2279-2306. (.pdf 256K). The online supplementary material is available here: (.pdf, 327K).

- Wang, T. and Samworth, R. J. (2018) High dimensional change point estimation via sparse projection.
*J. Roy. Statist. Soc., Ser. B*, **80**, 57-83. (.pdf 1.5M). The accompanying **R** package **InspectChangepoint** is available from CRAN.

- Samworth, R. J. (2018) Recent progress in log-concave density estimation.
*Statist. Sci.*, **33**, 493-509. (.pdf 832K).

- Sen, B. and Samworth, R. J. (2018) Editorial: Special issue on "Nonparametric inference under shape constraints".
*Statist. Sci.*, **33**, 469-472. (.pdf 72K).

- Banerjee, M. and Samworth, R. J. (2018) A conversation with Jon Wellner.
*Statist. Sci.*, **33**, 633-651. (.pdf 5.5M)

- Bødker, J. S., Brøndum, R. F., Schmitz, A., Schönherz, A. A., Jespersen, D. S., Sønderkær, M., Vesteghem, C., Due, H., Nøgaard, C. H., Perez-Andres, M., Samur, M. K., Davies, F., Walker, B., Pawlyn, C., Kaiser, M., Johnson, D., Bertsch, U., Broyl, A., van Duin, M., Shah, R., Johansen, P., Nøgaard, M. A., Samworth, R. J., Sonneveld, P., Goldschmidt, H., Morgan, G. J., Orfao, A., Munshi, N., El-Galaly, T., Dybkær, K. and Bøgsted, M. (2018) A multiple myeloma classification system that associates normal B-cell subset phenotypes with prognosis.
*Blood Advances*, **2**, 2400-2411. (.pdf, 2.4M).

- Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification.
*J. Roy. Statist. Soc., Ser. B (with discussion)*, **79**, 959-1035. (.pdf 1.8M). The accompanying **R** package **RPEnsemble** is available from CRAN.

- Lockhart, R. A. and Samworth, R. J. (2017) Comments on 'High-dimensional simultaneous inference with the bootstrap' by R. Dezeure, P. Bühlmann and C.-H. Zhang.
*TEST*, **26**, 734-739. (.pdf 228K).

- Kim, A. K. H. and Samworth, R. J. (2016) Global rates of convergence in log-concave density estimation.
*Ann. Statist.*, **44**, 2756-2779. (.pdf 214K). The online supplementary material is available here: (.pdf, 1.1M).

- Wang, T., Berthet, Q. and Samworth, R. J. (2016) Statistical and computational trade-offs in estimation of sparse principal components.
*Ann. Statist.*, **44**, 1896-1930. (.pdf, 500K). The online supplementary material is available here: (.pdf, 368K).

- Chen, Y. and Samworth, R. J. (2016) Generalised additive and index models with shape constraints.
*J. Roy. Statist. Soc., Ser. B*, **78**, 729-754. (.pdf, 404K). The accompanying **R** package **scar**, short for **s**hape **c**onstrained **a**dditive **r**egression, is available from CRAN.

- Samworth, R. J. (2016) Peter Hall's work on high-dimensional data and classification.
*Ann. Statist.*, **44**, 1888-1895. (.pdf 312K)

- Yu, Y., Wang, T. and Samworth, R. J. (2015) A useful variant of the Davis–Kahan theorem for statisticians.
*Biometrika*, **102**, 315-323. (.pdf, 188K)

- Dybkær, K., Bøgsted, M., Falgreen, S., Bødker, J. S., Kjeldsen, M. K., Schmitz, A., Bilgrau, A. E., Xu-Monette, Z. Y., Li, L., Bergkvist, K. S., Laursen, M. B., Rodrigo-Domingo, M., Marques, S. C., Rasmussen, S. B., Nyegaard, M., Gaihede, M., Møller, M. B., Samworth, R. J., Shah, R. D., Johansen, P., El-Galaly, T. C., Young, K. H. and Johnsen, H. E. (2015) A diffuse large B-cell lymphoma classification system that associates normal B-cell subset phenotypes with prognosis.
*J. Clinical Oncology*, **33**, 1379-1388.

- Shah, R. D. and Samworth, R. J. (2015) Invited discussion of
*An adaptive resampling test for detecting the presence of significant predictors* by I. W. McKeague and M. Qian. *J. Amer. Statist. Assoc.*, **110**, 1439-1442 (.pdf, 604K).

- Samworth, R. J. (2014) Big Data: a new era for Statistics.
*The Eagle*, 43-46. (.pdf, 836K)

- Chen, Y., Shah, R. D. and Samworth, R. J. (2014) Invited discussion of
*Multiscale change point inference* by K. Frick, A. Munk and H. Sieling. *J. Roy. Statist. Soc., Ser. B*, **76**, 544-546. (.pdf 68K).

- Shah, R. D. and Samworth, R. J. (2013) Variable selection with error control: Another look at Stability Selection.
*J. Roy. Statist. Soc., Ser. B*, **75**, 55-80. DOI: 10.1111/j.1467-9868.2011.01034.x (.pdf, 1.1M). Some associated **R** code can be found here.

- Chen, Y. and Samworth, R. J. (2013) Smoothed log-concave maximum likelihood estimation with applications.
*Statist. Sinica*, **23**, 1373-1398. (.pdf, 500K)

- Shah, R. D. and Samworth, R. J. (2013) Invited discussion of
*Correlated variables in regression: clustering and sparse estimation* by P. Bühlmann, P. Rütimann, S. van de Geer and C.-H. Zhang. *J. Statist. Plann. Inf.*, **143**, 1866-1868. (.pdf 378K)

- Dümbgen, L., Samworth, R. J. and Schuhmacher, D. (2013) Stochastic search for semiparametric linear regression models. In
*From Probability to Statistics and Back: High-Dimensional Models and Processes -- A Festschrift in Honor of Jon A. Wellner. Eds M. Banerjee, F. Bunea, J. Huang, V. Koltchinskii, M. H. Maathuis*, pp. 78-90. (.pdf 224K).

- Yu, Y. and Samworth, R. J. (2013) Invited discussion of
*Large Covariance Estimation by Thresholding Principal Orthogonal Complements* by J. Fan, Y. Liao and M. Mincheva. *J. Roy. Statist. Soc., Ser. B.*, **75**, 656-658. (.pdf 364K)

- Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation.
*Ann. Statist.*, **40**, 2973-3002. (.pdf, 556K)

- Samworth, R. J. (2012) Optimal weighted nearest neighbour classifiers.
*Ann. Statist.*, **40**, 2733-2763. DOI: 10.1214/12-AOS1049 (.pdf, 336K). Online supplement (.pdf, 308K). The optimal weighting scheme is implemented in the **R** packages 'FNN' (written by Shengqiao Li) and 'kknn' (written by Klaus Schliep), both available on CRAN.

- Samworth, R. J. (2012) Stein's Paradox.
*Eureka*, **62**, 38-41. (.pdf 608K)

- Samworth, R. J. (2011) Invited discussion of
*Adaptive confidence intervals for the test error in classification* by Laber and Murphy. *J. Amer. Statist. Assoc.*, **106**, 914-915 (.pdf, 88K).

- Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions with applications to regression.
*Ann. Statist.*, **39**, 702-730 (.pdf, 232K). A longer version of the paper is available here: (.pdf, 1.0MB)

- Cule, M., Samworth, R. and Stewart, M. (2010) Maximum likelihood estimation of a multi-dimensional log-concave density.
*J. Roy. Statist. Soc., Ser. B. (with discussion)*, **72**, 545-600. (.pdf, 3M). A longer version of the paper is also available here: (.pdf, 1.5M)

- Cule, M., Samworth, R. and Stewart, M. (2010) Rejoinder to
*Maximum likelihood estimation of a multi-dimensional log-concave density*. *J. Roy. Statist. Soc., Ser. B.*, **72**, 600-607. (.pdf, 116K)

- Cule, M. and Samworth, R. (2010) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density.
*Electron. J. Stat.*, **4**, 254-270. (.pdf, 200K)

- Shah, R. D. and Samworth, R. J. (2010) Invited discussion of
*Stability selection* by Meinshausen and Bühlmann. *J. Roy. Statist. Soc., Ser. B*, **72**, 455-456. (.pdf, 48K)

- Samworth, R. J. and Wand, M. P. (2010) Asymptotics and optimal bandwidth selection for highest density region estimation.
*Ann. Statist.*, **38**, 1767-1792. (.pdf, 1.4M)

- Gramacy, R., Samworth, R. and King, R. (2010) Importance tempering.
*Statistics and Computing*, **20**, 1-7. (.pdf, 306K)

- Fan, J., Samworth, R. and Wu, Y. (2009) Ultrahigh dimensional feature selection: beyond the linear model.
*J. Machine Learning Research*, **10**, 2013-2038. (.pdf, 256K).

- Fan, J., Feng, Y., Samworth, R. and Wu, Y. (2009) SIS, An
**R** package for (Iterative) Sure Independence Screening for generalized linear models and Cox's proportional hazards models, available from CRAN.

- Cule, M., Gramacy, R. B. and Samworth, R. (2009) LogConcDEAD: an
**R** package for maximum likelihood estimation of a multivariate log-concave density. *J. Statist. Software*, **29**, Issue 2.

- Hall, P., Park, B. U. and Samworth, R. J. (2008) Choice of neighbor order in nearest-neighbor classification.
*Ann. Statist.*, **36**, 2135-2152. (.pdf, 196K). A longer version of the paper is also available here: (.pdf, 236K)

- Samworth, R. (2008) Invited discussion of
*Sure independence screening for ultra-high dimensional feature space* by Fan and Lv. *J. Roy. Statist. Soc., Ser. B*, **70**, 888-889. (.pdf, 84K).

- Cule, M., Gramacy, R., Samworth, R. and Chen, Y. (2007)
*LogConcDEAD*, An **R** package for log-concave density estimation in arbitrary dimensions, version 1.4.2 available from CRAN.

- Samworth, R. and Gowland, R. (2007) Estimation of adult skeletal age-at-death: statistical assumptions and applications.
*International Journal of Osteoarchaeology*, **17**, 174-188. (.pdf, 200K)

- Poore, H. R., Samworth, R., White, N. J., Jones, S. M. and McCave, I. N. (2006) Neogene overflow of northern component water at the Greenland-Scotland ridge.
*Geochem. Geophys. Geosyst.*, **7**, Q06010, doi:10.1029/2005GC001085. (.pdf, 2.5M)

- Samworth, R. and Poore, H. (2005) Understanding past ocean circulations: a nonparametric regression case study.
*Statistical Modelling*, **5**, 289-307. (.pdf, 1.7M)

- Johnson, O. and Samworth, R. (2005) Central Limit Theorem and convergence to stable laws in Mallows distance.
*Bernoulli*, **11**, 829-845. (.pdf, 176K)

- Samworth, R. (2005) Small confidence sets for the mean of a spherically symmetric distribution.
*J. Roy. Statist. Soc., Ser. B*, **67**, 343-361. (.pdf, 512K)

- Hall, P. and Samworth, R. J. (2005) Properties of bagged nearest-neighbour classifiers.
*J. Roy. Statist. Soc., Ser. B*, **67**, 363-379. (.pdf, 300K).

- Samworth, R. J. (2004)
*Some mathematical and theoretical aspects of the bootstrap*. Ph.D. thesis, University of Cambridge. (.pdf, 1.5M)

- Samworth, R. (2003) A note on methods of restoring consistency to the bootstrap.
*Biometrika*, **90**, 985-990. (.pdf, 164K)

- Samworth, R. J. (2014) New challenges in high-dimensional statistical inference.
*Poster*. (.pdf, 684K)

- Samworth, R. and Johnson, O. (2005) The empirical process in Mallows distance, with application to goodness-of-fit tests.
*Preprint*. (.pdf, 296K)

- Samworth, R. J. (2004) Some asymptotic results for the bootstrap distribution of the sample mean.
*Preprint*. (.pdf, 228K)

- Samworth, R. J. (2003) Bootstrap diagnostics and inconsistency.
*Preprint*. (.pdf, 280K)

- Samworth, R. (2000)
*Shrinkage Estimators*, Part III Essay, University of Cambridge. (.pdf, 288K)

Selected recent talks

*Log-concavity: New theory and methodology* (Berlin, January 2013) (.pdf, 392K)

*Log-concave density estimation with applications* (Lund, September 2012) (.pdf, 244K)

*High-dimensional variable selection in Statistics* (Cambridge, September 2012) (.pdf, 268K)

*Independent component analysis via nonparametric maximum likelihood estimation* (Istanbul, July 2012) (.pdf, 256K)

*Variable selection with error control: Another look at Stability Selection* (Tsukuba, July 2012) (.pdf, 256K)

*Optimal weighted nearest neighbour classifiers* (Essex, May 2012) (.pdf, 256K)