Improved compound Poisson approximation for the number of occurrences of any rare word family in a stationary Markov chain

Published online by Cambridge University Press:  01 July 2016

Etienne Roquain*
Affiliation:
Institut National de la Recherche Agronomique
Sophie Schbath*
Affiliation:
Institut National de la Recherche Agronomique
* Postal address: INRA, Unité Mathématique, Informatique et Génome, Domaine de Vilvert, F-78352 Jouy-en-Josas, France.

Abstract

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2007 
