
Deep reinforcement learning for tracking a moving target in jellyfish-like swimming

Published online by Cambridge University Press: 15 August 2025

Yihao Chen
Affiliation: State Key Laboratory for Turbulence and Complex Systems, School of Mechanics and Engineering Science, Peking University, Beijing 100871, PR China

Yue Yang*
Affiliation: HEDPS-CAPT, Peking University, Beijing 100871, PR China

*Corresponding author: Yue Yang, yyg@pku.edu.cn

Abstract

We develop a deep reinforcement learning method for training a jellyfish-like swimmer to track a moving target effectively in a two-dimensional flow. The swimmer is a flexible object equipped with a muscle model based on torsional springs. We employ a deep Q-network (DQN) that takes the swimmer's geometry and dynamic parameters as inputs and outputs actions, namely the forces applied to the swimmer. In particular, we introduce an action regulation to mitigate interference from complex fluid–structure interactions. The goal of these actions is to navigate the swimmer to a target point in the shortest possible time. For the DQN training, data on the swimmer's motions are obtained from simulations using the immersed boundary method. While tracking a moving target, there is an inherent delay between the application of forces and the response of the swimmer's body, owing to hydrodynamic interactions between the shed vortices and the swimmer's own locomotion. Our tests demonstrate that the swimmer, with the DQN agent and action regulation, can dynamically adjust its course based on its instantaneous state. This work extends the application scope of machine learning to the control of flexible objects in fluid environments.

Information

Type: JFM Papers
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Supplementary material: File

Chen and Yang supplementary movie 1

Circular target trajectory tracking. Trajectories of the swimmer's mass centre (red–blue line) and the moving target (green line), along with the vorticity contour (red for positive values, blue for negative values). The red and blue line segments denote actions with and without applied forces, respectively.
Download Chen and Yang supplementary movie 1 (File)
File 5.1 MB
Supplementary material: File

Chen and Yang supplementary movie 2

Figure-eight target trajectory tracking. Trajectories of the swimmer's mass centre (red–blue line) and the moving target (green line), along with the vorticity contour (red for positive values, blue for negative values). The red and blue line segments denote actions with and without applied forces, respectively.
Download Chen and Yang supplementary movie 2 (File)
File 15.3 MB