# Optimal robot scheduling for web search
engines

### E.G. Coffman, Z. Lui, and R.R. Weber, *Journal
of Scheduling *1, 15-29, 1998

A robot is deployed by a Web search engine in order to
maintain the currency of its data base of Web pages. This paper
studies robit shceduling policies that minimize the fraction
$r_i$ of time pages spend out-of-date, assuming independent
Poisson page-change processes and a general description for the
page access time under any scheduling poilicy, and that, in order
to minimize expected total obsolescence time of any page, the
accesses to that page should be as evenly spread as possible.

We then investigate the problem of scheduling to minimize the
cost function $\sum c_i r_i$, where the $c_i$ are given weights
proportional to the page-change rates $\mu_i$. We give a tight
bound on the performance of such a policy and prove that the
optimal frequency at which the robot should access page $i$ is
proportional to $\ln(h_i)^{-1}$, where $h_i:=Ee^{-\mu_iX}$. Note
that this reduces to being proportional to $\mu_i$ when $X$ is a
constant, but not, as one might expect, when $X$ has a general
distribution.

Next, we evaluate randomized accessing policies, whereby the
choices of page access are determined by independent random
samples from the distribution {$f_i$}. We show that when the
weights $c_i$ in the cost function are proportional to $\mu_i$,
the minimum cost is achieved when $f_i$ is proportional to $h_i^{-1}$.
Finally, we present and analyze a heuristic policy that is
especially suited to the asymptotic regime of lare data bases.

**back to list of papers**

papers