Teaching - Modern Statistical Methods

The field of statistics has undergone profound changes in recent decades. First, the types of datasets that statisticians are asked to analyse have transformed dramatically. In the past, we typically dealt with datasets containing many observations and a modest number of carefully chosen variables. Today, by contrast, it is common to encounter datasets with thousands of variables, sometimes far more variables than observations. For instance, in genomics, we might measure the expression levels of several thousand genes but only across a few hundred tissue samples. Classical statistical methods are often simply not applicable in these "high-dimensional" settings.

As the scale of data collection has expanded, so too has the scope of the questions we seek to answer. Whereas statistics was once primarily concerned with uncovering associations between variables, we are now increasingly interested in understanding the causal structure of data. And rather than focusing solely on prediction, we often aim to predict the effects of interventions.

At the same time, the rapid rise of machine learning has provided us with powerful new tools. In this course, we will explore how these advances can be harnessed to tackle some of the modern statistical challenges outlined above. The selection of material is heavily biased towards my own interests, but I hope it will nevertheless give you a flavour of some of the most important recent methodological developments in statistics.

Resources

Course notes.

Old course notes.

Even older course notes.

Slides on basic asymptotic statistics.

The Elements of Statistical Learning (T. Hastie, R. Tibshirani and J. Friedman) has excellent background material for large parts of this course, presented in a less mathematical style.

Statistics for High-Dimensional Data (P. Bühlmann and S. van de Geer) covers much of our course and in many places goes into much greater depth than we do.

High-Dimensional Statistics (M. J. Wainwright) covers most of our course in greater depth, and is a great reference if you are continuing your studies in this area.

Notes on the theory of RKHS (D. Sejdinovic and A. Gretton) gives an excellent detailed treatment of the theory of reproducing kernel Hilbert spaces (RKHSs).

Advanced Data Analysis from an Elementary Point of View (C. Shalizi): Chapter 20 and the whole of Part IV provide some nice background reading for the part of the course on graphical models and causal inference.

The Elements of Causal Inference (J. Peters, D. Janzing and B. Schölkopf) is highly recommended if you want to learn more about causal inference.

Some preliminary material prepared for another course may be a helpful source of basic background on linear algebra.

Review of conditional expectations (Section 1.1)

Part III statistics preparation resources.

Code for Demonstrations

The code for the demonstrations is written in R; RStudio is a useful editor for R. Here are some introductory worksheets on R: Sheet 1 (solutions); Sheet 2 (solutions). The demonstration code itself is given below.
