Sequential Decisions | Qingyuan Zhao

A graphical approach to state variable selection in off-policy learning

We give graphical criteria under which state variables are 'valid' for off-policy learning, within a framework that generalizes dynamic treatment regimes (DTRs) and Markov decision processes (MDPs).