
On the occupancy problem for a regime-switching model

Published online by Cambridge University Press: 04 May 2020

Michael Grabchak*
Affiliation:
University of North Carolina Charlotte
Mark Kelbert**
Affiliation:
National Research University Higher School of Economics
Quentin Paris***
Affiliation:
National Research University Higher School of Economics
*Postal address: Department of Mathematics and Statistics, Charlotte, NC, USA. Email address: mgrabcha@uncc.edu
**Postal address: National Research University Higher School of Economics (HSE), Faculty of Economics, Department of Statistics and Data Analysis, Moscow, Russia. Email address: mkelbert@hse.ru
***Postal address: National Research University Higher School of Economics (HSE), Faculty of Computer Science, School of Data Analysis and Artificial Intelligence & HDI Lab, Moscow, Russia. Email address: qparis@hse.ru

Abstract

This article studies the expected occupancy probabilities on an alphabet. Unlike the standard situation, where observations are assumed to be independent and identically distributed, we assume that they follow a regime-switching Markov chain. For this model, we (1) give finite-sample bounds on the expected occupancy probabilities, and (2) provide detailed asymptotics in the case where the underlying distribution is regularly varying. We find that, in the regularly varying case, the finite-sample bounds are rate-optimal and have, up to a constant, the same rate of decay as the asymptotic result.
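The abstract does not spell out the model, so the Python sketch below is only an illustration under assumptions of our own, not the authors' construction. We assume that a hidden regime evolves as a finite Markov chain with a hypothetical transition matrix `P`, and that in regime `j` the next letter is drawn from a hypothetical distribution `dists[j]`. The occupancy probability M_{n,r} is taken to be the total probability mass of the letters observed exactly r times in n observations. Here that mass is measured under the stationary marginal distribution, a convention that may differ from the paper's. In the standard i.i.d. case (a single regime) the expectation has the closed form E[M_{n,r}] = sum_k p_k C(n,r) p_k^r (1 - p_k)^(n-r), and the Monte Carlo estimate should reproduce it.

```python
import math
import random


def expected_occupancy_iid(p, n, r):
    """Closed form for i.i.d. sampling from p:
    E[M_{n,r}] = sum_k p_k * C(n, r) * p_k**r * (1 - p_k)**(n - r)."""
    return sum(pk * math.comb(n, r) * pk**r * (1 - pk) ** (n - r) for pk in p)


def stationary(P, iters=1000):
    """Stationary distribution of the regime chain via power iteration
    (assumes the chain is irreducible and aperiodic)."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def sample_counts(dists, P, pi, n, rng):
    """HYPOTHETICAL regime-switching sampler: the regime follows a Markov chain
    with transition matrix P, started from pi; given regime j, the next letter
    is drawn from dists[j]. Returns how often each letter was observed."""
    k = len(dists[0])
    regime = rng.choices(range(len(P)), weights=pi)[0]
    counts = [0] * k
    for _ in range(n):
        letter = rng.choices(range(k), weights=dists[regime])[0]
        counts[letter] += 1
        regime = rng.choices(range(len(P)), weights=P[regime])[0]
    return counts


def mc_expected_occupancy(dists, P, n, r, trials=2000, seed=0):
    """Monte Carlo estimate of E[M_{n,r}], where M_{n,r} is the mass, under the
    stationary marginal p = sum_j pi_j * dists[j], of letters seen exactly r times."""
    pi = stationary(P)
    p = [sum(pi[j] * dists[j][a] for j in range(len(dists)))
         for a in range(len(dists[0]))]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = sample_counts(dists, P, pi, n, rng)
        total += sum(pa for pa, c in zip(p, counts) if c == r)
    return total / trials


if __name__ == "__main__":
    # Two hypothetical regimes over a 3-letter alphabet.
    dists = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    P = [[0.9, 0.1], [0.2, 0.8]]
    pi = stationary(P)
    marginal = [sum(pi[j] * dists[j][a] for j in range(2)) for a in range(3)]
    print("regime-switching (MC):", mc_expected_occupancy(dists, P, n=10, r=0))
    print("i.i.d. with same marginal:", expected_occupancy_iid(marginal, 10, 0))
```

Comparing the two printed values shows how dependence between observations, introduced by persistent regimes, can shift the expected occupancy probabilities away from their i.i.d. counterparts with the same marginal distribution.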

Type: Research Papers
Copyright: © Applied Probability Trust 2020
