Teaching - Modern Statistical Methods


The remarkable development of computing power and other technology now allows scientists and businesses to routinely collect datasets of immense size and complexity. Most classical statistical methods were designed for situations with many observations and a few, carefully chosen variables. However, we now often gather data where we have huge numbers of variables, in an attempt to capture as much information as we can about anything which might conceivably have an influence on the phenomenon of interest. This dramatic increase in the number of variables makes modern datasets strikingly different, as well-established traditional methods either perform very poorly or do not work at all.

Developing methods that are able to extract meaningful information from these large and challenging datasets has recently been an area of intense research in statistics, machine learning and computer science. In this course, we will study some of the methods that have been developed to analyse such datasets.


Announcements

The second examples class will be on Monday 20 November at 2pm in MR4. If you hand in work to my CMS pigeonhole by 1pm on Friday 17 November, I will mark answers to questions 2 and 10.

We will have additional lectures on Wednesday 22 November and Monday 27 November at 2pm in MR3.


Resources

  • The Elements of Statistical Learning (T. Hastie, R. Tibshirani and J. Friedman) has excellent background material for large parts of this course, presented in a less mathematical style.
  • Lecture notes on Causality (J. Peters) are highly recommended if you want to learn more about causal inference. Parts of our notes are based closely on them, though they go into more depth and cover more topics.
  • Some preliminary material prepared for another course I teach may be helpful as a source of basic background material on linear algebra.

Code for Demonstrations

The code for the demonstrations is written in R. RStudio is a useful editor for R. Here are some introductory worksheets on R: Sheet 1 (solutions); Sheet 2 (solutions). The code for the demonstrations is given below.


Example Sheets


Comments and Questions