
Toll-based reinforcement learning for efficient equilibria in route choice

Published online by Cambridge University Press:  05 March 2020

Gabriel de O. Ramos
Affiliation:
Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil, e-mail: gdoramos@unisinos.br Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: roxana@ai.vub.ac.be, ann.nowe@ai.vub.ac.be
Bruno C. Da Silva
Affiliation:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: bsilva@inf.ufrgs.br, bazzan@inf.ufrgs.br
Roxana Rădulescu
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: roxana@ai.vub.ac.be, ann.nowe@ai.vub.ac.be
Ana L. C. Bazzan
Affiliation:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: bsilva@inf.ufrgs.br, bazzan@inf.ufrgs.br
Ann Nowé
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: roxana@ai.vub.ac.be, ann.nowe@ai.vub.ac.be

Abstract

The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1%, on average.
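The idea of marginal-cost tolling can be illustrated with a minimal sketch. It is not the TQ-learning algorithm or the toll formulation from the article, neither of which is reproduced on this page. It only shows the textbook marginal-cost toll, x * c'(x), which for a monomial cost c(x) = a * x**k reduces to k * c(x). The two-route network (a Pigou-style example with unit demand), the function names, and the bisection routine are all assumptions made for illustration.

```python
def cost(a, k, flow):
    """Monomial (homogeneous, univariate) cost: c(x) = a * x**k."""
    return a * flow ** k


def marginal_cost_toll(a, k, flow):
    """Textbook marginal-cost toll x * c'(x); for a * x**k this is k * c(x)."""
    return k * cost(a, k, flow)


def tolled_cost(a, k, flow):
    """Cost perceived by a driver once the toll is added to the travel cost."""
    return cost(a, k, flow) + marginal_cost_toll(a, k, flow)


# Hypothetical Pigou-style network: route A costs x, route B costs a constant 1.
ROUTE_A = (1.0, 1)  # (a, k) -> c(x) = x
ROUTE_B = (1.0, 0)  # (a, k) -> c(x) = 1


def total_cost(flow_a, demand=1.0):
    """System-wide travel cost (tolls excluded) for a given split of the demand."""
    flow_b = demand - flow_a
    return flow_a * cost(*ROUTE_A, flow_a) + flow_b * cost(*ROUTE_B, flow_b)


def equilibrium_flow(perceived, demand=1.0, iterations=60):
    """Bisection for the flow on route A at which no driver gains by switching."""
    lo, hi = 0.0, demand
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if perceived(*ROUTE_A, mid) < perceived(*ROUTE_B, demand - mid):
            lo = mid  # route A still looks cheaper, so more drivers move to it
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    selfish = equilibrium_flow(cost)         # equilibrium without tolls
    tolled = equilibrium_flow(tolled_cost)   # equilibrium with marginal-cost tolls
    print("flow on A without tolls:", selfish, "total cost:", total_cost(selfish))
    print("flow on A with tolls:   ", tolled, "total cost:", total_cost(tolled))
```

In this toy network, the untolled equilibrium sends all drivers to route A, whereas the tolled equilibrium splits them evenly, which is the split that minimises total travel cost. Because the toll here depends only on the degree k and the cost the driver actually experienced, it can be computed locally after the trip, in the spirit of the distributed, a posteriori scheme described in the abstract.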

Information

Type
Adaptive and Learning Agents
Copyright
© Cambridge University Press 2020
