Course information
This is the home page for Richard Weber's course of 16 lectures to third-year Cambridge mathematics students in winter 2013, starting January 17, 2013 (Tue/Thu @ 9am in CMS meeting room 3). This material is provided for students, supervisors (and others) to use freely in connection with this course. The 2012 course home page is also still available.
Introduction
This is a course on optimization problems that are posed over time. Importantly, we now add dynamic and stochastic elements that were not present in the IB Optimization course, and we consider what happens when some variables are imperfectly observed. Applications are to be found in science, economics and engineering (e.g., "insects as optimizers", "planning for retirement", "finding a parking space", "optimal gambling" and "steering a space craft to the moon").
There is no other course that is an absolute prerequisite, although acquaintance with Probability, Markov chains and Lagrange multipliers is helpful. However, you should not worry if you did not attend these courses, as there is really nothing you will need to know that cannot be picked up along the way. An important part of the course is to show you a range of applications. The course should appeal to those who liked the type of problems occurring in Optimization IB, Markov Chains IB and Probability IA.
The diagram of the trolley on wheels at the top of this page models the problem a juggler faces when she tries to balance a broom upright in one hand. In this course you will learn how to balance N brooms simultaneously, on top of one another, or perhaps side by side. Can you guess which of these two juggling tricks is more difficult?
Course notes
I aim to make each lecture a self-contained unit on a topic, with notes of four A4 pages.
Here are the course notes 2013, which will evolve as the course proceeds. They are based on the course notes 2012.
These notes include:
Table of Contents
Lectures 1 – 16
Slides used in Lecture 6: Multi-armed bandits and Gittins index
Index
The notes contain hyperlinks. If you click on an entry in the table of contents, or on a page number in the Index, you will be taken to the appropriate page.
If you think you have found a mistake in these notes, first check that you have the most recent copy, as I may already have made a correction. If the mistake is still there, please let me know and I will correct the notes.
Course Blog
There is a course blog in which I will make a few comments after each lecture: to emphasize an idea, give a sidebar, make a correction (!), or answer an interesting question (perhaps one that a student sends to me by email).
Example Sheets
There are three examples sheets, each containing 11 questions, in this single file: exsheetoc.pdf. You should receive a supervision on each examples sheet.
Comments, Discussion and Questions
This year I am organising these pages so that students can comment or ask questions on the discussion page, and also at the end of each blog post. I will endeavour to answer questions. You can comment anonymously if you wish: just give any name and leave the email address box blank. You can also email me with questions or suggestions, or if you think you have discovered a mistake in the notes.
Feedback
This year's course will finish on Tuesday 12 March 2013. The written feedback questionnaire has been completed by __ students, and a further __ have completed the electronic form. If you have not yet given feedback, you can do so using my equivalent on-line form; your responses will be sent to my email anonymously. After reading them, I will forward them to the Faculty Office.
Exam questions
Here is a file of all tripos questions in Optimization and Control for 2001-2012.
Here are past tripos questions 1995-2006: tripos.pdf
Some of these questions have amusing real-life applications.
Recommended books
- Bertsekas, D. P., Dynamic Programming. Prentice Hall, 1987. (Useful for Part I of the course.)
- Bertsekas, D. P., Dynamic Programming and Optimal Control, Volumes I and II, Prentice Hall, 1995. (Useful for all parts of the course.)
- Hocking, L. M., Optimal Control: An Introduction to the Theory and Applications, Oxford University Press, 1991. (Chapters 4-7 are good for Part III of the course.)
- Ross, S., Introduction to Stochastic Dynamic Programming. Academic Press, 1983. (Chapters I-V are useful for Part I of the course.)
- Whittle, P., Optimization Over Time. Volumes I and II. Wiley, 1982-83. (Chapters 5, 17 and 18 of volume I are useful for Part II of the course. Chapter 7, volume II is good for Part III of the course.)
Schedules
Dynamic programming
The principle of optimality. The dynamic programming equation for finite-horizon problems. Interchange arguments. Markov decision processes in discrete time. Infinite-horizon problems: positive, negative and discounted cases. Value iteration. Policy improvement algorithm. Stopping problems. Average-cost programming. [6]
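To fix ideas, the central object of Part I is the dynamic programming (optimality) equation. In notation along the lines of the notes (plant equation x_{t+1} = a(x_t, u_t, t), instantaneous cost c(x, u, t), horizon h, terminal cost C_h; the notation here is illustrative), the finite-horizon equation reads

$$F(x,t) = \inf_u \bigl[\, c(x,u,t) + F\bigl(a(x,u,t),\, t+1\bigr) \,\bigr], \qquad F(x,h) = C_h(x).$$

Value iteration simply applies a recursion of this kind repeatedly. Here is a minimal sketch in Python for an invented two-state, two-action discounted MDP (the numbers are made up for illustration and are not an example from the course):

import numpy as np

gamma = 0.9                       # discount factor
c = np.array([[1.0, 2.0],         # c[x, u]: cost of action u in state x
              [0.5, 0.0]])
P = np.array([[[0.8, 0.2],        # P[u, x, y]: P(next state y | state x, action u)
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.1, 0.9]]])

F = np.zeros(2)                   # initial guess at the value function
for _ in range(1000):
    # Bellman operator: F(x) = min over u of c(x,u) + gamma * sum_y P(y|x,u) F(y)
    F_new = (c + gamma * np.einsum('uxy,y->xu', P, F)).min(axis=1)
    if np.max(np.abs(F_new - F)) < 1e-10:
        break
    F = F_new

print(F)   # approximate minimal expected discounted cost from each state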
LQG systems
Linear dynamics, quadratic costs, Gaussian noise. The Riccati recursion. Controllability. Stabilizability. Infinite-horizon LQ regulation. Observability. Imperfect state observation and the Kalman filter. Certainty equivalence control. [5]
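For orientation, here is one common form of the Riccati recursion (again the notation is illustrative and may differ from the notes). With plant equation x_{t+1} = A x_t + B u_t and cost \sum_t \bigl(x_t^\top R x_t + u_t^\top Q u_t\bigr), the minimal cost-to-go is quadratic in the state, F(x,t) = x^\top \Pi_t x, where

$$\Pi_t = R + A^\top \Pi_{t+1} A - A^\top \Pi_{t+1} B \bigl(Q + B^\top \Pi_{t+1} B\bigr)^{-1} B^\top \Pi_{t+1} A,$$

and the optimal control is linear in the state, u_t = -(Q + B^\top \Pi_{t+1} B)^{-1} B^\top \Pi_{t+1} A\, x_t. Under imperfect observation the Kalman filter provides a dual recursion of the same Riccati type for the covariance of the state estimate.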
Continuous-time models
The optimality equation in continuous time. Pontryagin’s maximum principle. Heuristic proof and connection with Lagrangian methods. Transversality conditions. Optimality equations for Markov jump processes and diffusion processes. [5]
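As a flavour of Part III (notation once more for illustration): to minimize \int_0^T c(x,u)\,dt subject to \dot{x} = a(x,u), the optimality equation in continuous time (the Hamilton-Jacobi-Bellman equation) is

$$0 = \inf_u \Bigl[\, c(x,u) + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}\, a(x,u) \Bigr],$$

whereas Pontryagin's maximum principle works with the Hamiltonian H(x,u,\lambda) = \lambda^\top a(x,u) - c(x,u): along an optimal trajectory the control maximizes H, the adjoint variable satisfies \dot{\lambda}^\top = -\partial H / \partial x, and transversality conditions pin down \lambda at the end-points.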
Contents
1 Dynamic Programming
1.1 Control as optimization over time
1.2 The principle of optimality
1.3 Example: the shortest path problem
1.4 The optimality equation
1.5 Markov decision processes
2 Examples of Dynamic Programming
2.1 Example: managing spending and savings
2.2 Example: exercising a stock option
2.3 Example: secretary problem
3 Dynamic Programming over the Infinite Horizon
3.1 Discounted costs
3.2 Example: job scheduling
3.3 The infinite-horizon case
3.4 The optimality equation in the infinite-horizon case
3.5 Example: selling an asset
4 Positive Programming
4.1 Example: possible lack of an optimal policy
4.2 Characterization of the optimal policy
4.3 Example: optimal gambling
4.4 Value iteration
4.5 Example: pharmaceutical trials
5 Negative Programming
5.1 Stationary policies
5.2 Characterization of the optimal policy
5.3 Optimal stopping over a finite horizon
5.4 Example: optimal parking
5.5 Optimal stopping over the infinite horizon
6 Bandit Processes and the Gittins Index
6.1 Multi-armed bandit problem
6.2 Gittins index theorem
6.3 Calibration
6.4 Proof of the Gittins index theorem
6.5 Calculation of the Gittins index
6.6 Forward induction policies
7 Average-cost Programming
7.1 Average-cost optimality equation
7.2 Example: admission control at a queue
7.3 Value iteration bounds
7.4 Policy improvement algorithm
8 LQ Regulation
8.1 The LQ regulation problem
8.2 The Riccati recursion
8.3 White noise disturbances
8.4 LQ regulation in continuous time
9 Controllability
9.1 Controllability
9.2 Controllability in continuous time
9.3 Linearization of nonlinear models
9.4 Example: broom balancing
9.5 Example: satellite in planar orbit
10 Stabilizability and Observability
10.1 Stabilizability
10.2 Example: pendulum
10.3 Infinite-horizon LQ regulation
10.4 The [A, B, C] system
10.5 Observability in continuous time
11 Kalman Filter and Certainty Equivalence
11.1 Example: observation of population
11.2 Example: satellite in planar orbit
11.3 Imperfect state observation with noise
11.4 The Kalman filter
11.5 Certainty equivalence
12 Dynamic Programming in Continuous Time
12.1 Example: parking a rocket car
12.2 The Hamilton-Jacobi-Bellman equation
12.3 Example: LQ regulation
12.4 Example: harvesting fish
13 Pontryagin’s Maximum Principle
13.1 Heuristic derivation
13.2 Example: parking a rocket car (continued)
13.3 Transversality conditions
13.4 Adjoint variables as Lagrange multipliers
14 Applications of the Maximum Principle
14.1 Example: a moon lander
14.2 Example: use of transversality conditions
14.3 Problems in which time appears
14.4 Example: monopolist
14.5 Example: insects as optimizers
15 Controlled Markov Jump Processes
15.1 The dynamic programming equation
15.2 The case of a discrete state space
15.3 Uniformization in the infinite horizon case
15.4 Example: admission control at a queue
16 Controlled Diffusion Processes
16.1 Diffusion processes and controlled diffusion processes
16.2 Example: noisy LQ regulation in continuous time
16.3 Example: a noisy second order system
16.4 Example: passage to a stopping set
Other resources and further reading
Here is what the Occupational Outlook Handbook says about careers in operations research.
Here is the home page of INFORMS, the Institute for Operations Research and the Management Sciences.
Here is the home page of the UK Operational Research Society.