On an index policy for restless bandits

R.R. Weber and G. Weiss. J. Appl. Prob. 27, 637-648, 1990.


We investigate the optimal allocation of effort to a collection of n
projects.  The projects are 'restless' in that the state of a project
evolves in time, whether or not it is allocated effort.  The evolution
of the state of each project follows a Markov rule, but transitions
and rewards depend on whether or not the project receives effort.  The
objective is to maximize the expected time-average reward under a
constraint that exactly m of the n projects receive effort at any one
time.  We show that as m and n tend to infinity with m/n fixed, the
per-project reward of the optimal policy is asymptotically the same as
that achieved by a policy which operates under the relaxed constraint
that an average of m projects be active.  The relaxed constraint was
considered by Whittle (1988) who described how to use a Lagrangian
multiplier approach to assign indices to the projects.  he conjectured
that the policy of allocating effort to the m projects of greatest
index is asymptotically optimal as m and n tend to infinity.  We show
that this conjecture is true if the differential equation describing
the fluid approximation to the index policy has a globally stable
equilibrium point.  This need not be the case, and we present an
example for which the index policy is not asymptotically optimal.
However, numerical work suggest that such counterexamples are
extremely rare and that the size of the suboptimality which one
might expect is minuscule.

back to list of papers