Hostname: page-component-745bb68f8f-grxwn Total loading time: 0 Render date: 2025-01-12T10:57:08.907Z Has data issue: false hasContentIssue false

Value iteration in a class of average controlled Markov chains with unbounded costs: necessary and sufficient conditions for pointwise convergence

Published online by Cambridge University Press:  14 July 2016

Rolando Cavazos-Cadena*
Affiliation:
Universidad Autónoma Agraria Antonio Narro
Emmanuel Fernández-Gaucherand*
Affiliation:
The University of Arizona
*
Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo COAH 25315, México.
∗∗Postal address: Systems and Industrial Engineering Department, The University of Arizona, Tucson AZ 85721, USA.

Abstract

This work concerns controlled Markov chains with denumerable state space, (possibly) unbounded cost function, and an expected average cost criterion. Under a Lyapunov function condition, together with mild continuity-compactness assumptions, a simple necessary and sufficient criterion is given so that the relative value functions and differential costs produced by the value iteration scheme converge pointwise to the solution of the optimality equation; this criterion is applied to obtain convergence results when the cost function is bounded below or bounded above.

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1996 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under Grant NSF-INT 9201430, and by the Consejo Nacional de Ciencia y Tecnología (CONACyT), México.

∗∗

The work of Cavazos-Cadena was partially supported by the PSF Organization under Grant No. 200–150/94–01, and by the MAXTOR Foundation for Applied Probability and Statistics (MAXFAPS) under Grant No. 01–01–56/01–94.

References

[1] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Gosh, M. K. and Marcus, S. I. (1993) Discrete-time controlled Markov processes with an average cost criterion: a survey. SIAM J Control Optim. 31, 282344.CrossRefGoogle Scholar
[2] Borkar, V. S. (1991) Topics in Controlled Markov Chains (Pitman Research Notes in Mathematics Series 240). Longman, London.Google Scholar
[3] Cavazos-Cadena, R. (1991) Recent results on conditions for the existence of average optimal stationary policies. Ann. Operat. Res. 28, 328.CrossRefGoogle Scholar
[4] Cavazos-Cadena, R. (1994) Cesàro convergence of the undiscounted value iteration method in Markov decision processes under the Lyapunov stability condition. Boletín de la Sociedad Mathemática Mexicana. To appear.Google Scholar
[5] Cavazos-Cadena, R. (1995) Undiscounted value iteration in stable Markov decision processes with bounded rewards. J. Math. Systems, Estimation and Control. To appear.Google Scholar
[6] Cavazos-Cadena, R. and Fernández-Gaucherand, ?. (1995) Denumerable controlled Markov chains with average reward criterion: sample path optimality. ZOR-Math. Meth. Operat. Res. 41, 89108.CrossRefGoogle Scholar
[7] Cavazos-Cadena, R. and Hernández-Lerma, O. (1992) Equivalence of Lyapunov stability criteria in a class of Markov decision processes. Appl. Math. Optim. 26, 113137.CrossRefGoogle Scholar
[8] Dugundji, J. (1966) Topology. Allyn and Bacon, Boston.Google Scholar
[9] Federgruen, A. and Schweitzer, P. J. (1978) Discounted and undiscounted value iteration in Markov decision problems: A survey. In Dynamic Programming and its Applications. ed. Puterman, M. L. Academic Press, New York. pp. 2352.CrossRefGoogle Scholar
[10] Federgruen, A. and Schweitzer, P. J. (1980) A survey of asymptotic value-iteration for undiscounted Markovian decision processes. In Recent Developments in Markov Decision Processes. ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York. pp. 73109.Google Scholar
[11] Hernández-Lerma, O. (1989) Adaptive Markov Control Processes. Springer, New York.CrossRefGoogle Scholar
[12] Hernández-Lerma, O. and Lasserre, J. B. (1993) Value iteration and rolling plans for Markov control processes with unbounded rewards. J. Math. Anal. Appl. 177, 3855.CrossRefGoogle Scholar
[13] Hinderer, K. (1970) Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer, New York.CrossRefGoogle Scholar
[14] Hordijk, A. (1974) Dynamic Programming and Markov Potential Theory. (Mathematical Centre Tract 51.) Mathematisch Centrum, Amsterdam.Google Scholar
[15] Hordijk, A, Schweitzer, P. J. and Tijms, H. C. (1975) The asymptotic behavior of the minimal total expected cost for the denumerable state Markov decision model. J. Appl. Prob. 12, 298305.CrossRefGoogle Scholar
[16] Montes-De-Oca, R. and Hernández-Lerma, O. (1993) Value iteration in average cost Markov control processes on Borel spaces. Submitted. Google Scholar
[17] Puterman, M. (1994) Markov Decision Processes. Wiley, New York.CrossRefGoogle Scholar
[18] Royden, H. L. (1968) Real Analysis. 2nd edn. Macmillan, New York.Google Scholar
[19] Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar
[20] Schweitzer, P. J. (1971) Iterative solution of the functional equations for undiscounted Markov renewal programming. J. Math. Anal. Appl. 34, 495501.CrossRefGoogle Scholar
[21] Sennott, L. I. (1991) Value iteration in countable state average cost Markov decision processes with unbounded costs. Ann. Operat. Res. 28, 261272.CrossRefGoogle Scholar
[22] Thomas, L. C. (1965) Connectedness conditions for denumerable state Markov decision processes. In Recent Developments in Markov Decision Processes. ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York. pp. 181204.Google Scholar