
Perturbation analysis of Markov chain Monte Carlo for graphical models

Published online by Cambridge University Press:  06 January 2025

Na Lin*
Affiliation: Central South University
Yuanyuan Liu**
Affiliation: Central South University
Aaron Smith***
Affiliation: University of Ottawa

*Postal address: School of Mathematics and Statistics, HNP-LAMA, Central South University, Changsha, China.
**Postal address: School of Mathematics and Statistics, HNP-LAMA, Central South University, Changsha, China.
***Postal address: Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada.

Abstract

The basic question in perturbation analysis of Markov chains is: how do small changes in the transition kernels of Markov chains translate to changes in their stationary distributions? Many papers on the subject have shown, roughly, that the change in stationary distribution is small as long as the change in the kernel is much less than some measure of the convergence rate. This result is essentially sharp for generic Markov chains. We show that much larger errors, up to size roughly the square root of the convergence rate, are permissible for many target distributions associated with graphical models. The main motivation for this work comes from computational statistics, where there is often a tradeoff between the per-step error and per-step cost of approximate MCMC algorithms. Our results show that larger perturbations (and thus less-expensive chains) still give results with small error.

Type
Original Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Applied Probability Trust

