Hostname: page-component-745bb68f8f-d8cs5 Total loading time: 0 Render date: 2025-01-26T12:53:15.375Z Has data issue: false hasContentIssue false

Evaluating policies for generalized bandits via a notion of duality

Published online by Cambridge University Press:  14 July 2016

J. H. Crosbie*
Affiliation:
University of Newcastle
K. D. Glazebrook*
Affiliation:
University of Newcastle
*
Postal address: Department of Statistics, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK
Postal address: Department of Statistics, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK

Abstract

Nash's generalization of Gittins’ classic index result to so-called generalized bandit problems (GBPs) in which returns are dependent on the states of all arms (not only the one which is pulled) has proved important for applications. The index theory for special cases of this model in which all indices are positive is straightforward. However, this is not a natural restriction in practice. An earlier proposal for the general case did not yield satisfactory index-based suboptimality bounds for policies — a central feature of classical Gittins index theory. We develop such bounds via a notion of duality for GBPs which is of independent interest. The index which emerges naturally from this analysis is the reciprocal of the one proposed by Nash.

Type
Research Papers
Copyright
Copyright © by the Applied Probability Trust 2000 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Fay, N. A., and Glazebrook, K. D. (1987). On the scheduling of alternative stochastic jobs on a single machine. Adv. Appl. Prob. 19, 955973.Google Scholar
Fay, N. A., and Glazebrook, K. D. (1989). A general model of the scheduling of alternative tasks on a single machine. Proc. Eng. Inform. Sci. 3, 199221.Google Scholar
Fay, N. A., and Walrand, J. C. (1991). On approximately optimal index strategies for generalised arm problems. J. Appl. Prob. 28, 602612.CrossRefGoogle Scholar
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B, 41, 148177.Google Scholar
Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. John Wiley, Chichester.Google Scholar
Gittins, J. C., and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, eds Gani, J. and Vince, I. North Holland, Amsterdam, pp. 241266.Google Scholar
Glazebrook, K. D. (1982). On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Prob. 19, 716722.Google Scholar
Glazebrook, K. D., and Fay, N. A. (1988). Evaluating strategies for generalised bandit problems. Int. J. Syst. Sci. 19, 16051613.CrossRefGoogle Scholar
Glazebrook, K. D., and Greatrix, S. (1993). On scheduling influential stochastic tasks on a single machine. Eur. J. Operat. Res. 70, 405424.Google Scholar
Glazebrook, K. D., and Greatrix, S. (1995). On transforming an index for generalised bandit problems. J. Appl. Prob. 32, 163182.Google Scholar
Glazebrook, K. D., and Owen, R. W. (1991). New results for generalised bandit problems. Int. J. Syst. Sci. 22, 479494.Google Scholar
Nash, P. (1973). Optimal allocation of resources between research projects. Ph.D. thesis, University of Cambridge.Google Scholar
Nash, P. (1980). A generalised bandit problem. J. R. Statist. Soc. B, 42, 165169.Google Scholar
Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, London.Google Scholar