# Monotone optimal policies for left-skip-free Markov decision processes

### S. Stidham and R. R. Weber. In *Applied Probability and Stochastic Processes*, J. George Shanthikumar and Ushio Sumita, eds., Kluwer Academic Publishers, **13**, 191-202, 1999.

In a previous paper (Stidham and Weber, 1989)
we considered a variety of models for optimal control of the
service rate in a queueing system, in which the objective is to
minimize the limiting average expected cost per unit time. By
standard techniques, we showed how to convert such a problem into
an equivalent problem in which the objective is to minimize the
expected total (undiscounted) cost until the first entrance into
state zero. Under weak assumptions on the one-stage (service plus
holding) costs and transition probabilities, we showed that an
optimal policy is monotonic, that is, a larger service rate is
used in larger states. In contrast to previous models in the
literature on control of queues, we assumed that the holding cost
was non-decreasing, but not necessarily convex, in the state. A
common assumption in all the models was that the state
transitions were *skip free to the left*: a one-step
transition from state $i$ to a state $j<i-1$ is impossible.
Many queueing models have this property, including all birth-death
models, as well as a variety of *M/GI/1*-type models,
including models with batch arrivals, phase-type service times,
and *LCFS-PR* queue discipline.
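In matrix terms, the skip-free-to-the-left property says that $P_{ij} = 0$ whenever $j < i-1$. The following is a minimal sketch (not from the paper) of that check, with two illustrative transition matrices: a birth-death chain, which is skip-free in both directions, and a batch-arrival chain, which can jump up by several states yet remains skip-free to the left. All numerical entries are invented for illustration.

```python
def is_left_skip_free(P):
    """True if a one-step move from state i to any state j < i - 1 is
    impossible, i.e. P[i][j] == 0 whenever j < i - 1."""
    n = len(P)
    return all(P[i][j] == 0 for i in range(n) for j in range(n) if j < i - 1)

# Truncated birth-death chain (illustrative rates): moves are to i-1, i, i+1.
P_birth_death = [
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.2, 0.5, 0.0],
    [0.0, 0.3, 0.2, 0.5],
    [0.0, 0.0, 0.3, 0.7],
]

# Batch-arrival chain: upward jumps of size 2 are allowed, but downward
# moves are still by at most one state, so it is skip-free to the left.
P_batch = [
    [0.4, 0.3, 0.3, 0.0],
    [0.3, 0.1, 0.3, 0.3],
    [0.0, 0.3, 0.4, 0.3],
    [0.0, 0.0, 0.3, 0.7],
]
```

Both matrices pass the check; a chain that could drop from state 2 directly to state 0 would fail it.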

Julian Keilson introduced the concept of skip-free
transitions in a Markov chain in Keilson (1962). The significance
of left-skip-free transitions in the queueing control models of
Stidham and Weber (1989) is that the problem of optimally moving the system
from state $i$ to state $0$ can be decomposed into two separate
problems: first, to move the system optimally from state $i$ to
state $i-1$, and then to move the system optimally from state $i-1$
to state $0$. This decomposition was exploited in two ways in
Stidham and Weber (1989) to prove monotonicity of an optimal
policy: (1) to facilitate simple coupling or pairwise-switch
arguments in the case of additive transitions; and (2) to
construct a proof based on downward induction on the state
variable in the case of non-additive transitions.
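The decomposition and the downward-induction argument can be sketched numerically. The toy model below is an invented illustration, not the paper's general setting: a controlled birth-death chain on states $0,\dots,N$, where action $a$ (a service rate) moves the chain up with probability $p_a$, down with probability $q_a$, and leaves it in place otherwise, at one-stage cost $h(i) + s_a$. Because the chain is skip-free to the left, the optimal expected cost $g_i$ of the passage from $i$ to $i-1$ satisfies $g_i = \min_a (h(i) + s_a + p_a g_{i+1})/q_a$, and the cost of emptying the system from state $i$ is just $g_1 + \cdots + g_i$. All parameter values are assumptions for the sketch.

```python
N = 20
actions = [0, 1, 2]          # indices of increasing service rates
p = [0.4, 0.3, 0.2]          # up-probability under each action
q = [0.3, 0.5, 0.7]          # down-probability (p[a] + q[a] <= 1)
s = [0.0, 1.0, 2.5]          # service cost per stage
def h(i):                    # nondecreasing (not necessarily convex) holding cost
    return float(i)

# g[i] = optimal expected cost of the first passage from i to i-1.
# From g[i] = min_a [h(i) + s[a] + p[a]*(g[i+1] + g[i]) + (1-p[a]-q[a])*g[i]]
# one rearranges to g[i] = min_a (h(i) + s[a] + p[a]*g[i+1]) / q[a].
g = [0.0] * (N + 2)          # g[N+1] = 0: no upward move from the top state
best = [0] * (N + 1)
for i in range(N, 0, -1):    # downward induction on the state variable
    up = g[i + 1] if i < N else 0.0
    vals = [(h(i) + s[a] + p[a] * up) / q[a] for a in actions]
    g[i] = min(vals)
    best[i] = vals.index(g[i])

# Left-skip-free decomposition: cost of moving from i to 0 is g[1] + ... + g[i].
cost_to_empty = [sum(g[1:i + 1]) for i in range(N + 1)]
```

In this sketch the chosen service rate `best[i]` turns out to be nondecreasing in the state, consistent with the monotonicity result, although the paper's proof covers far more general transition structures than this additive example.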

In the present paper we extend the analysis in
Stidham and Weber (1989) to a general class of Markov decision
processes with left-skip-free transitions.
