Optimal robot scheduling for web search engines

E.G. Coffman, Z. Lui, and R.R. Weber, Journal of Scheduling 1, 15-29, 1998

A robot is deployed by a Web search engine in order to maintain the currency of its data base of Web pages. This paper studies robit shceduling policies that minimize the fraction $r_i$ of time pages spend out-of-date, assuming independent Poisson page-change processes and a general description for the page access time under any scheduling poilicy, and that, in order to minimize expected total obsolescence time of any page, the accesses to that page should be as evenly spread as possible.

We then investigate the problem of scheduling to minimize the cost function $\sum c_i r_i$, where the $c_i$ are given weights proportional to the page-change rates $\mu_i$. We give a tight bound on the performance of such a policy and prove that the optimal frequency at which the robot should access page $i$ is proportional to $\ln(h_i)^{-1}$, where $h_i:=Ee^{-\mu_iX}$. Note that this reduces to being proportional to $\mu_i$ when $X$ is a constant, but not, as one might expect, when $X$ has a general distribution.

Next, we evaluate randomized accessing policies, whereby the choices of page access are determined by independent random samples from the distribution {$f_i$}. We show that when the weights $c_i$ in the cost function are proportional to $\mu_i$, the minimum cost is achieved when $f_i$ is proportional to $h_i^{-1}$. Finally, we present and analyze a heuristic policy that is especially suited to the asymptotic regime of lare data bases.

back to list of papers

papers