Qingyuan Zhao

Professor of Statistics

University of Cambridge

About

I am a Professor of Statistics in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics (DPMMS) at the University of Cambridge, a Fellow of Corpus Christi College, and an Associate Faculty member of the Cambridge Centre for AI in Medicine (CCAIM).

My research interests lie primarily in drawing scientific conclusions about causal relationships using experimental and observational data, a fast-growing area known as “causal inference”. More broadly, I would like to understand how “design”—a principle I view as fundamental yet elusive in statistics—shapes the practice of statistical applications in biomedical and social sciences.

Click here for a bio-sketch in the third person narrative.

Interests

Causal inference
Selective inference
Applied statistics

Education

PhD in Statistics, 2016
Stanford University

BSc in Mathematics, 2011
University of Science and Technology of China (USTC)

News

<2024-10-01 Tue> I have been promoted to Professor of Statistics.
<2024-09-23 Mon> I am joining the Associate Editor Board of Statistical Science.
If you have applied statistics questions, you might be interested in the Statistics Clinic that offers free consulting to University members. I am a regular consultant in the Clinic.
If you are interested in causal inference, you might want to check out the weekly Online Causal Inference Seminar.

Recent talks

On statistical and causal models associated with acyclic directed mixed graphs

2025-01-14 4:30 PM — 5:30 PM

Acyclic Directed Mixed Graphs: Matrix Algebra, Statistical Models, Confounder Selection

2024-09-26 2:00 PM — 3:00 PM TU Munich, Germany

Causal Perspectives on 'Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models' by Apley and Zhu

2024-09-23 4:00 PM — 5:00 PM Online

See all talks

Home

News

<2025-06-01 Sun> I will be co-organizing a long-residency program on causal inference at the Isaac Newton Institute from January to June, 2026.

Bio

Bio sketch

Qingyuan was born and raised in Wuhan in central China, a city known for its many lakes and rivers and rich cultural heritage. After high school, he went to the Special Class for the Gifted Young at the University of Science and Technology of China and majored in mathematics. He then went to Stanford University for postgraduate studies and obtained his Ph.D. in Statistics in 2016. He spent three years at the Wharton School of the University of Pennsylvania as a postdoctoral fellow before joining the Statistical Laboratory at the University of Cambridge as a University Lecturer in 2019. He was promoted to Professor of Statistics at the University of Cambridge in 2024.

Qingyuan’s research interests lie primarily in drawing scientific conclusions about causal relationships using experimental and observational data, a fast-growing area known as “causal inference”. More broadly, he strives to understand how “design”—a principle he views as fundamental yet elusive in statistics—shapes the practice of statistical applications in biomedical and social sciences.

Teaching

Causal Inference (Part III, Michaelmas 2019)

This is a 16-lecture course on causal inference, the statistical science of drawing causal conclusions from experimental and non-experimental data.

General information

Course syllabus.
Time: Tuesday & Thursday, 12-1pm.
Location: MR14.
Office hour: By appointment.
Prerequisites:
- Familiarity with undergraduate-level probability and statistics.
- An open mind to apply statistical theory to practical problems.
- Experience with R or other programming languages is helpful.

Course outline

Lecture notes will be provided after each lecture. I am not quite ready to make the lecture notes public, so email me if you miss the first lecture and need the username and password to access the notes.

Chapter	Topic	Last updated	Additional materials
Part I	Motivations
1	Principles of causal inference	<2019-10-16 Wed>
2	Randomised experiments	<2019-10-21 Mon>	2019 Nobel Prize in Economics
3	Linear structural equation models	<2019-10-23 Wed>	Excerpt from a psychology paper
Part II	Languages for causality
4	Probabilistic graphical models	<2019-11-06 Wed>
5	Nonparametric structural equations and counterfactuals	<2019-11-06 Wed>
6	Causal Identification	<2019-11-06 Wed>
Part III	Statistical methods
7	Matching and randomisation inference	<2019-11-14 Thu>
8	Semiparametric inference for average treatment effects	<2019-11-19 Tue>
9	Instrumental variables	<2019-11-25 Mon>
10	Regression discontinuity design	<2019-11-28 Thu>
11	Negative control	<2019-11-28 Thu>
12	Mediation analysis	<2019-12-02 Mon>
Part I—III	Full notes (Chapters 1–12)	<2020-01-21 Tue>

The full notes include corrections and clarifications to the in-class notes, as well as solutions or hints to some exercises.

Readings

Causal Inference for Statistics, Social, and Biomedical Sciences by Guido Imbens and Donald Rubin.
Causality: Models, Reasoning, and Inference by Judea Pearl.
Statistical Models: Theory and Practice by David Freedman.
Graphical Models by Steffen Lauritzen.
Observational Studies by Paul Rosenbaum.
Causal Inference by Miguel Hernán and James Robins.
Mostly Harmless Econometrics: An Empiricist’s Companion by Joshua Angrist and Jörn-Steffen Pischke.
Linear models: A useful “microscope” for causal analysis by Judea Pearl.
Single World Intervention Graphs by Thomas Richardson and James Robins.

Example classes

Location: MR11.
Time: 2-3:30.

Time	Sheet	Last updated	Solution
<2019-10-28 Mon>	Example Sheet 1; Dataset; SEM paper	<2019-10-25 Fri>	Mostly in the full lecture notes; R code for Q5
<2019-11-11 Mon>	Example Sheet 2	<2019-11-06 Wed>	See the full lecture notes.
<2019-12-02 Mon>	Example Sheet 3	<2019-11-25 Mon>	See the full lecture notes.
<2020-01-13 Mon>	Reading materials; See below	<2020-01-08 Wed>
<2020-05-11 Mon>	Revision	<2020-05-11 Mon>	Revision notes; Video recording.

Presentation of applied research articles

The 4th example class will be interactive. The lecturer will provide a list of applied articles (in social sciences, public health, and other areas) before the 3rd example class. Each student will then be asked to pick an article and give a short presentation in the final example class.

It is intended that all students wishing to take the exam of this course can participate in this example class. Please let me know if you have trouble attending this class. More information will be provided during the Michaelmas term.

A tentative list of applied articles (being updated):

Political intolerance and political repression during the McCarthy Red Scare (reprinted in David Freedman’s book, page 315–342) and Freedman’s comments (Section 6.3).
Finishing high school and starting college: Do Catholic schools make a difference (reprinted in David Freedman’s book, page 343–376) and Freedman’s comments (Section 7.4).
Education and fertility: Implications for the roles women occupy (reprinted in David Freedman’s book, page 377–401) and Freedman’s comments (Section 9.5).
Institutional arrangements and the creation of social capital: The effects of public school choice (reprinted in David Freedman’s book, page 402–430) and Freedman’s comments (Section 9.7).
A new perspective on John Snow’s communicable disease theory (a short version can be found in David Freedman’s book, Section 1.3) and Sociomedical indicators in the cholera epidemic in Ferrara of 1855 (also this editorial and commentary). These articles may be suitable for two presenters, one focusing on John Snow’s analysis and one on the Ferrara study.
Thomas Cook’s commentary to Cochran (page 140–163, 1st Volume of Observational Studies) has three examples demonstrating the limitations of observational studies that seek to mimic randomised experiments (Page 146: Issue 1; Page 153: Issue 2; Page 157: Issue 3). This article may be suitable for up to three presenters.
Triangulation in aetiological epidemiology has three illustrative examples for corroboration of evidence from different causal inference approaches. Some of the results are in the supplementary materials and can be downloaded from the IJE website. This article may be suitable for up to three presenters.
Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence?
Global warming is anthropogenic (Section 3.1 of this book by Fred Bookstein).

Statistical Modelling (Part II, Michaelmas 2020)

General information

This course consists of 16 lectures and 8 practical sessions. It complements the Part II Principles of Statistics, but takes a more applied perspective.
Course schedule (page 25).
Prerequisites: Part IB Statistics.
Due to the pandemic, all the lectures and practicals will be online. Videos and handwritten notes will be made available through this webpage. I won’t be providing LaTeX notes but will try to follow the notation in last year’s lecture notes.
Please email me or leave a comment below if you find any mistakes or have any questions.
Recordings can be found in Moodle.

Lectures

Number	Date	Topic	Optional Reading
Part 1		Linear models
L1	<2020-10-08 Thu>	Scope of the course, least squares and its geometry	Agresti 2.1–2.4
L2	<2020-10-13 Tue>	Gauss-Markov, exact inference under normality	Agresti 2.7, 3.1–3.3
L3	<2020-10-15 Thu>	Heteroscedasticity, diagnostics, robust regression	Agresti 2.5
L4	<2020-10-20 Tue>	Model misspecification, bias-variance tradeoff	ISLR 2.2.1–2.2.2
L5	<2020-10-22 Thu>	Simpson’s paradox, model selection	ISLR 6.1–6.2
L6	<2020-10-27 Tue>	Review of linear models, likelihood asymptotics	2019 notes 2.5.1–2.5.3
L7	<2020-10-29 Thu>	Delta method; From LM to GLM	Common distributions
L8	<2020-11-03 Tue>	Properties of exponential families	Efron’s notes on empirical Bayes
L9	<2020-11-05 Thu>	Conjugate priors, MLE, Deviance
L10	<2020-11-10 Tue>	Deviance residuals, exponential dispersion families	2019 notes 2.3
L11	<2020-11-12 Thu>	GLMs: MLE	Agresti 4.1–4.3
L12	<2020-11-17 Tue>	GLMs: Analysis of deviance, computation	Agresti 4.4–4.5
L13	<2020-11-19 Thu>	GLMs: Model selection, diagnostics, binomial models	Agresti 4.4, 4.6
L14	<2020-11-24 Tue>	Poisson GLMs; Multinomial model and Poisson trick	Agresti 7.1, 7.2
L15	<2020-11-26 Thu>	Contingency tables and independence
L16	<2020-12-01 Tue>	Review and look forward

Full notes (to L12): PDF (181MB); GoodNotes (146MB).

Practicals

Number	Date	Topic	Optional Reading
P1	<2020-10-10 Sat>	Basic R; Solution	CRAN Intro to R 1,2,5,8
P2	<2020-10-17 Sat>	Writing functions, linear models; Code; Solution	CRAN Intro to R 6, 10
P3	<2020-10-24 Sat>	Linear models; Code	CRAN Intro to R 11.1–3
P4	<2020-10-31 Sat>	Model selection; Code; Solution
P5	<2020-11-07 Sat>	ANOVA and ANCOVA; Code; Solution	2019 notes 1.2.5
P6	<2020-11-14 Sat>	Binomial GLMs; Code; Solution
P7	<2020-11-21 Sat>	Binomial and Poisson GLMs; Code; Solution
P8	<2020-11-28 Sat>	Contingency tables and Gamma GLMs; Code; Solution	Agresti 4.7

Example sheets

Readings

Theory for LM and GLM
- A. Agresti. Foundations of Linear and Generalized Linear Models. Wiley 2015. [Agresti]
- G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning (with Applications in R). Springer 2013. [ISLR]
- Last year’s lecture notes.
- Prof Richard Weber’s notes for IB Statistics.
- Prof Brad Efron’s notes on exponential families (I, II) and generalised linear models (III). (These notes are quite advanced and are only for the most ambitious students.)

R and statistical computing
- W. N. Venables, D. M. Smith and the R Core Team. An Introduction to R.

Causal Inference (Part III, Michaelmas 2020)

This is a 16-lecture course on causal inference, the statistical science of drawing causal conclusions from experimental and non-experimental data.

General information

Course syllabus.
Time: Tuesday & Thursday, 11am–12.
Location: Live-stream via Zoom (link available in Moodle).
Office hour: I will stay on Zoom after each lecture to answer questions. I would also like to chat with every Part III student who is taking this course. Please sign up here for a 20-minute slot.
Please email me if you find any mistakes or have any suggestions.

Lectures

Number	Date	Topic	Optional Reading
Part 1		Motivations
L1	<2020-10-08 Thu>	Principles of causal inference	Pearl Epilogue
L2	<2020-10-13 Tue>	Potential outcomes and Neyman’s inference	IR 1, 4, 6
L3	<2020-10-15 Thu>	Randomisation test, regression adjustment	IR 5, 7
L4	<2020-10-20 Tue>	Regression adjustment; Linear SEM and path analysis	Pearl 5.1
L5	<2020-10-22 Thu>	Path analysis, correlation versus causation	Review paper by Pearl
L6	<2020-10-27 Tue>	Identification and estimation in linear SEMs	A psychology paper
L7	<2020-10-29 Thu>	Graphical models and Markov properties
L8	<2020-11-03 Tue>	Structure discovery; Nonparametric SEMs	Talk on SWIGs
L9	<2020-11-05 Thu>	Single world intervention graphs; g-formula
L10	<2020-11-10 Tue>	Causal identification	HR 6
L11	<2020-11-12 Thu>	No unmeasured confounders: Randomisation inference	SSRMP tutorial slides
L12	<2020-11-17 Tue>	Sensitivity analysis; Intro to semiparametric inference	Review paper by Kennedy
L13	<2020-11-19 Thu>	No unmeasured confounders: Semiparametric inference
L14	<2020-11-24 Tue>	Doubly robust estimator; Leveraging specificity
L15	<2020-11-26 Thu>	Instrumental variables
L16	<2020-12-01 Tue>	Mediation analysis

Full Lecture notes (Last updated: December 16, 2020).

Lecture recordings can be found in Moodle.

Example classes

Time: 13:30–15:00 on 28 October, 18 November, 2 December.
Location: Zoom.
Please submit your work for marking through Moodle.
Example sheet 1.
Example sheet 2.
Example sheet 3.
Example class 4: Present an applied article (follow this link).
Sample exam questions.

Readings

The following books/articles are optional. I am providing a short (personal) verdict to help you navigate the literature.

Causal Inference for Statistics, Social, and Biomedical Sciences by Guido Imbens and Donald Rubin [IR]. This book provides a gentle introduction to potential outcomes and statistical methods for simple randomised experiments and observational studies with no unmeasured confounders.
Causal Inference: What If by Miguel Hernán and James Robins [HR]. This book provides a comprehensive treatment for causal inference without and with models.
Causality: Models, Reasoning, and Inference by Judea Pearl [Pearl]. A great book if you are interested in the philosophical debates in causal inference.
Statistical Models: Theory and Practice by David Freedman. A less technical textbook that is well suited to someone who wants to learn the basic ideas in causal inference through practical examples.
Graphical Models by Steffen Lauritzen. A good reference for probabilistic graphical models.
Observational Studies by Paul Rosenbaum. A good book for randomisation inference and sensitivity analysis.
Mostly Harmless Econometrics: An Empiricist’s Companion by Joshua Angrist and Jörn-Steffen Pischke. Very clearly written book from an applied econometrics point of view, with a lot of useful intuitions.

Statistical Modelling (Part II, Michaelmas 2021)

General information

This course consists of 16 lectures and 8 practical sessions. It complements the Part II Principles of Statistics, but takes a more applied perspective.
Prerequisites: Part IB Statistics.
Location: MR5.
Please email me or leave a comment below if you find any mistakes or have any questions.
Lectures will be recorded and the recordings can be found on Moodle.

Lectures

By Chapter

Combined

Lecture Notes on Statistical Modelling.

Practicals

Number	Date	Topic	Optional Reading
P1	<2021-10-09 Sat>	Basic R; Solution	CRAN Intro to R 1,2,5,8
P2	<2021-10-16 Sat>	Writing functions, linear models; Code; Solution	CRAN Intro to R 6, 10
P3	<2021-10-23 Sat>	Linear models; Code	CRAN Intro to R 11.1–3
P4	<2021-10-30 Sat>	Model selection; Code; Solution
P5	<2021-11-06 Sat>	ANOVA and ANCOVA; Code; Solution	2019 notes 1.2.5
P6	<2021-11-13 Sat>	Binomial GLMs; Code; Solution
P7	<2021-11-20 Sat>	Binomial and Poisson GLMs; Code; Solution
P8	<2021-11-27 Sat>	Contingency tables and Gamma GLMs; Code; Solution	Agresti 4.7

Example sheets

Readings

Theory for LM and GLM
- Lecture notes from 2019.
- A. Agresti. Foundations of Linear and Generalized Linear Models. Wiley 2015.
- G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning (with Applications in R). Springer 2013.
- Prof Richard Weber’s notes for IB Statistics.

R and statistical computing
- W. N. Venables, D. M. Smith and the R Core Team. An Introduction to R.
- H. Wickham. Advanced R (for anyone who wants to really understand R as a programming language).

Causal Inference (Part III, Michaelmas 2021)

This is a 16-lecture course on causal inference, the statistical science of drawing causal conclusions from experimental and non-experimental data.

General information

Course syllabus.
Office hour: Wednesday at 2pm (if there is no example class) @ CMS, D1.01.
Please email me or leave a comment below if you find any mistakes or have any questions.

Lectures

Last year’s lecture notes.
Corrected lecture notes.
Lectures will be recorded and the recordings can be found on Moodle.

Example classes

Time: 13:45–15:15 on 3 November, 17 November, 1 December, 19 January.
Instructor: Tobias Freidling, who also provided the solutions below.
Location: MR3.

Example sheet 1 (Solution).
Example sheet 2 (Solution).
Example sheet 3 (Solution).
Example class 4: Present an applied article (follow this link).
Sample exam questions.

Readings

The following books/articles are optional. I am providing a short (personal) verdict to help you navigate the literature.

Causal Inference for Statistics, Social, and Biomedical Sciences by Guido Imbens and Donald Rubin [IR]. This book provides a gentle introduction to potential outcomes and statistical methods for simple randomised experiments and observational studies with no unmeasured confounders.
Causal Inference: What If by Miguel Hernán and James Robins [HR]. This book provides a comprehensive treatment for causal inference without and with models.
Causality: Models, Reasoning, and Inference by Judea Pearl [Pearl]. A great book if you are interested in the philosophical debates in causal inference.
Statistical Models: Theory and Practice by David Freedman. A less technical textbook that is well suited to someone who wants to learn the basic ideas in causal inference through practical examples.
Graphical Models by Steffen Lauritzen. A good reference for probabilistic graphical models.
Observational Studies by Paul Rosenbaum. A good book for randomisation inference and sensitivity analysis.
Mostly Harmless Econometrics: An Empiricist’s Companion by Joshua Angrist and Jörn-Steffen Pischke. Very clearly written book from an applied econometrics point of view, with a lot of useful intuitions.

Introduction to Causal Inference (MPhil in Population Health, Lent 2022)

Click here for the theory slides.

Click here for the practical below in PDF format.

Randomization in design and analysis

Randomized controlled trials (RCTs) are widely regarded as the “gold standard” for establishing causality. An often forgotten feature of RCTs is that they can be analyzed objectively by a randomization test. Haines and coauthors investigated the impact of disinvestment from weekend allied health services. We will use their dataset to explore the concept of randomization in the design and analysis of an experiment.

[Group] Skim through the abstract and read the section called “Design” of their article. Then answer the following questions: What is the name of the design of the experiment in this study? How was it carried out?
Download the patient-level data, then run the following code in R (you may need to install the readxl package first by install.packages("readxl")). What does the second line do?
```
data <- readxl::read_excel("S2 Data.xlsx")
data <- subset(data, hospital == "Dandenong" & study1 == 1)
```
Unfortunately, this dataset is not very well annotated. The columns index_ward and sw_step contain the identifiers for hospital ward and time step (in calendar month), respectively. In which order do you think the wards crossed over to no weekend health services? You may find the following R code useful.
```
table(data[, c("index_ward", "sw_step", "no_we_exposure")])
```
Construct a vector called cross_over_realized that contains the calendar month in which the $6$ hospital wards crossed over. Then use the following code to define the treatment and outcome of interest (“los” is short for length of stay).
```
data$treatment_status <- as.numeric(data$sw_step >= cross_over_realized[data$index_ward])
data$log_acute_los <- log(data$acute_los)
```
[Group] Execute the following code in your R session. Then comment on the two interval estimators of the treatment effect (of no weekend health services on log length of stay).
```
confint(lm(log_acute_los ~ treatment_status, data))
confint(lm(log_acute_los ~ treatment_status + as.factor(index_ward), data))
```
[Group] Next, we explore the randomization analysis of this dataset. First, use potential outcomes to define the null hypothesis that stopping weekend health services has no effect whatsoever. Notice that the treatment is not the same as the variable randomized in the experiment (crossover order). What assumption did you make while defining your null hypothesis? Give an example in which this assumption is not satisfied.
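The construction of `treatment_status` and the two model fits in the earlier steps can be tried out on simulated data before touching the real dataset. The sketch below is only an illustration: the data frame `toy`, the crossover months in `cross_over_toy`, the ward effects, and the effect size of 0.1 are all made-up values and are not taken from the study of Haines and coauthors.

```r
set.seed(1)

# Hypothetical stepped-wedge layout: 6 wards observed over 7 calendar months.
n_ward <- 6
n_step <- 7
n_per_cell <- 20  # patients per ward per month (assumed)

toy <- expand.grid(patient = seq_len(n_per_cell),
                   index_ward = seq_len(n_ward),
                   sw_step = seq_len(n_step))

# Made-up crossover months: element w is the month in which ward w crosses over.
cross_over_toy <- c(4, 2, 7, 3, 6, 5)

# Indexing by ward looks up each patient's crossover month, so the comparison
# is TRUE exactly for patients admitted in or after that month.
toy$treatment_status <- as.numeric(toy$sw_step >= cross_over_toy[toy$index_ward])

# Simulated outcome: a ward-specific baseline plus an assumed effect of 0.1.
ward_effect <- rnorm(n_ward, sd = 0.5)
toy$log_acute_los <- 1 + ward_effect[toy$index_ward] +
  0.1 * toy$treatment_status + rnorm(nrow(toy), sd = 0.3)

# Interval estimates without and with adjustment for ward.
ci_unadjusted <- confint(lm(log_acute_los ~ treatment_status, toy))
ci_adjusted <- confint(lm(log_acute_los ~ treatment_status + as.factor(index_ward), toy))
ci_unadjusted["treatment_status", ]
ci_adjusted["treatment_status", ]
```

Changing `cross_over_toy` or the spread of `ward_effect` and re-running the last lines gives a feel for how the two intervals respond to differences between wards.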

Read the following code, then execute it in your R session (you may need to install the package combinat which contains a function permn that generates all the permutations of a vector). For your reference, the expected output is included.

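```
# Compute the two test statistics (the treatment coefficient without and with
# adjustment for ward) for a given vector of crossover months.
get_statistic <- function(index_ward,
                          sw_step,
                          log_acute_los,
                          cross_over) {
  treatment_status <- sw_step >= cross_over[index_ward]
  c(lm(log_acute_los ~ treatment_status)$coef[2],
    lm(log_acute_los ~ treatment_status + as.factor(index_ward))$coef[2])
}

# Observed statistics under the realized crossover order.
T_obs <- get_statistic(data$index_ward, data$sw_step,
                       data$log_acute_los, cross_over_realized)

# Statistics under every permutation of the crossover months. The other
# arguments are named, so each permutation is matched to cross_over.
T_random <- sapply(combinat::permn(2:7),
                   get_statistic,
                   index_ward = data$index_ward,
                   sw_step = data$sw_step,
                   log_acute_los = data$log_acute_los)

# One histogram per model, with the observed statistic marked in red.
par(mfrow = c(2, 1))
for (m in 1:2) {
  hist(T_random[m, ], 20,
       main = paste0("Randomization distribution (model ", m, "): ",
                     "p-value = ", signif(mean(T_random[m, ] >= T_obs[m]), 2)),
       xlab = "Test statistic", xlim = range(T_random))
  abline(v = T_obs[m], col = "red")
}
```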

[Group] Explain what the code above does and discuss the results. Here are some points you may consider:
- How do the two randomization tests compare with each other?
- How do the randomization tests compare with the normal linear model? How would you interpret their results?
- The randomization distribution of the second test statistic is clearly not centered at 0. Why?
- How can you “invert” the randomization tests to obtain an interval estimator of the treatment effect?
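The last point, inverting a randomization test, can be sketched in a much simpler setting than the stepped-wedge trial. The example below assumes a completely randomized two-arm experiment and a constant additive effect $\tau$; both are simplifying assumptions made only for illustration, and the data are simulated. For each candidate value `tau0`, the hypothesis $Y_i(1) - Y_i(0) = \tau_0$ for all $i$ determines every control potential outcome, so it can be tested by re-randomization, and the interval collects the values on a grid that are not rejected.

```r
set.seed(2022)

# Simulated two-arm experiment (hypothetical data, assumed constant effect 0.5).
n <- 40
z <- sample(rep(c(0, 1), each = n / 2))   # realized treatment assignment
y <- rnorm(n) + 0.5 * z                   # observed outcomes

diff_in_means <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])

# Monte Carlo draws of the assignment, reused for every tau0 so that the
# p-values are computed against the same set of re-randomizations.
n_draws <- 2000
z_draws <- replicate(n_draws, sample(z))

# Two-sided randomization p-value for H0: Y_i(1) - Y_i(0) = tau0 for all i.
p_value <- function(tau0) {
  y0 <- y - tau0 * z                      # control outcomes implied by H0
  t_obs <- diff_in_means(y0, z)
  t_null <- apply(z_draws, 2, function(z_new) diff_in_means(y0, z_new))
  mean(abs(t_null) >= abs(t_obs))
}

# Invert the test: keep every tau0 on the grid that is not rejected at 5%.
tau_grid <- seq(-1, 2, by = 0.05)
p_grid <- sapply(tau_grid, p_value)
interval <- range(tau_grid[p_grid > 0.05])
interval
```

The same idea carries over to the stepped-wedge design by replacing `sample(z)` with permutations of the crossover order, at the cost of a coarser randomization distribution because there are only finitely many orders.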

[Group] Causal diagrams and causal identification

In this group exercise, we will read the article titled “A Note on Posttreatment Selection in Studying Racial Discrimination in Policing”.

Read the section “Review”. Using Figure 1, explain the causal inference problem under investigation. Why do the authors say “the naive treatment effect $\Delta$ [in Equation 1] can be quite misleading when used to represent the causal effect of race on police violence”? Hint: $M$ is a collider.
Use your own words to explain Assumption 1.

You may skip the section “Average treatment effects conditional on the mediator”.

Read the first half of the section “A new estimator for the causal risk ratio”, then use your own words to explain Equation 3. Can we use the police admin data to estimate the “bias factor” in this equation?
Read the first three paragraphs in “A reanalysis of the NYPD stop-and-frisk dataset”, then use your own words to explain the results in Table 1. Use Equation 3 and Figure 2 to explain the large discrepancy between the naive and adjusted estimators in Table 1.
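The hint that $M$ is a collider can be made concrete with a small simulation. The sketch below is not the model in the article: the variables `d`, `u`, `m`, and `y`, the logistic forms, and all coefficients are hypothetical, and the outcome is constructed so that `d` has no effect on it at all. Even so, restricting to units with `m == 1` yields a clearly non-zero naive contrast, because selecting on a collider induces an association between `d` and the unmeasured `u`.

```r
set.seed(42)
n <- 1e5

d <- rbinom(n, 1, 0.5)                 # "treatment" (hypothetical)
u <- rnorm(n)                          # unmeasured common cause of m and y
# m is a collider: it is affected by both d and u.
m <- rbinom(n, 1, plogis(-1 + 1.5 * d + 1.5 * u))
# y depends on u only, so d has no causal effect on y by construction.
y <- rbinom(n, 1, plogis(-1 + 1.5 * u))

# Difference in the mean of y between d = 1 and d = 0 within a subset.
naive_contrast <- function(keep) {
  mean(y[keep & d == 1]) - mean(y[keep & d == 0])
}

contrast_all <- naive_contrast(rep(TRUE, n))  # whole population
contrast_selected <- naive_contrast(m == 1)   # only units with m == 1
c(all = contrast_all, selected = contrast_selected)
```

In this toy setup the selected units with `d == 1` tend to have lower values of `u` than the selected units with `d == 0`, which is what drives the spurious contrast.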

Causal Inference with Observational Data: Common Designs and Statistical Methods

Course description

Observational studies are non-interventional empirical investigations of causal effects and are playing an increasingly vital role in healthcare decision making in the era of data science. The study design is particularly important in planning observational studies due to the lack of randomization. Aspects of design include defining the objectives and context under investigation, collecting the right data, and choosing suitable strategies to remove bias from measured and unmeasured confounders. Statistical analysis should also align with the design.

This module covers key concepts and useful methods for designing and analyzing observational studies. The first part of the module will focus on matching and weighting methods for cohort and case-control studies for causal inference. Specific topics include basic tools of matching and weighting, randomization inference, and sensitivity analysis. The second part of the module will focus on methods to address unmeasured confounding via causal exclusion. Specific topics include instrumental variables, negative controls, and difference-in-differences. Participants will also gain practical experience by applying these methods to real datasets using R.

Target audiences for this module are:

clinical researchers who need to use observational data to generate evidence of causality;
biostatisticians who are interested in understanding how causal inference can be reliably made in practice.

Background in statistical inference and some knowledge of R are recommended.

General information

Instructors: Ting Ye, Qingyuan Zhao.
Teaching assistant: Marlena Bannick.
Time: July 25–27, 2022.
SISCER page.
You should have access to the Slack channel for this module. If not, please contact us.
Lectures will be delivered via Zoom and be recorded. The recordings will be posted on the course website when they are available. Practical sessions will not be recorded.

Teaching materials

Day 1: Randomization inference; Matching. (Logistics; Lecture 1, recording; Lecture 2, recording; Practical 1, with Answer, recording)
Day 2: Weighting; Sensitivity analysis; Case-control design. (Lecture 3, recording; Lecture 4, recording; Practical 2, data and R code for optmatch, solution, recording)
Day 3: Instrumental variables and Mendelian randomization; Negative control and difference-in-differences. (Lecture 5, recording; Lecture 6, recording; Practical 3, solution; Practical 4, solution, recording)

Computing environment

Before the module starts, please ensure that you have installed the latest version of R. We also recommend using an integrated development environment such as RStudio.

Project

Projects Home

IV & MR (project page) Instrumental-Variables, Mendelian-Randomization

COVID-19 (project page) Infectious-Diseases

Randomization (project page) Randomization

Students

Postdocs

Katarzyna Reluga: 2020–2021.
Zijun Gao: 2022–2023.
Jieru (Hera) Shi: 2023–
Pan Zhao: 2024–

Ph.D. students

Matt Tudball (co-supervision): 2018–2023.
Tobias Freidling: 2020–2025.
Joakim Blach Andersen: 2021–2025.
Max Zhu: 2022–
Martina Scauda: 2023–

Undergraduate students

Etaash Katiyar: July–September, 2020.
Naomi Wei: July–September, 2021.
Thalia Seale: July–September, 2021.
Junshi Wang: June–August, 2022.
Timothée Foutot: April–July, 2023.


Links

Useful links

Maths & Stats

Computing