
Deep reinforcement learning for tracking a moving target in jellyfish-like swimming

Published online by Cambridge University Press: 15 August 2025

Yihao Chen
Affiliation: State Key Laboratory for Turbulence and Complex Systems, School of Mechanics and Engineering Science, Peking University, Beijing 100871, PR China

Yue Yang*
Affiliation: HEDPS-CAPT, Peking University, Beijing 100871, PR China

*Corresponding author: Yue Yang, yyg@pku.edu.cn

Abstract

We develop a deep reinforcement learning method for training a jellyfish-like swimmer to track a moving target effectively in a two-dimensional flow. The swimmer is a flexible object equipped with a muscle model based on torsional springs. We employ a deep Q-network (DQN) that takes the swimmer's geometry and dynamic parameters as inputs and outputs actions, namely the forces applied to the swimmer. In particular, we introduce an action regulation to mitigate interference from complex fluid–structure interactions. The goal of these actions is to navigate the swimmer to a target point in the shortest possible time. For the DQN training, data on the swimmer's motions are obtained from simulations using the immersed boundary method. While tracking a moving target, there is an inherent delay between the application of forces and the response of the swimmer's body, owing to hydrodynamic interactions between the shed vortices and the swimmer's own locomotion. Our tests demonstrate that the swimmer, with the DQN agent and action regulation, can dynamically adjust its course based on its instantaneous state. This work extends the application scope of machine learning to the control of flexible objects in fluid environments.

Information

Type: JFM Papers
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Supplementary material: File

Chen and Yang supplementary movie 1

Circular target trajectory tracking. Trajectories of the swimmer's mass centre (red–blue line) and the moving target (green line), along with the vorticity contour (red for positive values, blue for negative values). The red and blue line segments denote actions with and without applied forces, respectively.
Download Chen and Yang supplementary movie 1 (File)
File 5.1 MB
Supplementary material: File

Chen and Yang supplementary movie 2

Figure-eight target trajectory tracking. Trajectories of the swimmer's mass centre (red–blue line) and the moving target (green line), along with the vorticity contour (red for positive values, blue for negative values). The red and blue line segments denote actions with and without applied forces, respectively.
Download Chen and Yang supplementary movie 2 (File)
File 15.3 MB