
A robust visual simultaneous localization and mapping system for dynamic environments without predefined dynamic labels and weighted features

Published online by Cambridge University Press: 17 September 2025

Shuai Xiang
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China
Chaoyi Dong*
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China; Engineering Research Center of Large Energy Storage Technology, Ministry of Education, Hohhot, 010010, China
Kang Zhang
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China
Ge Tai
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China
Tianyu Yuan
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China
Haoda Yan
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China
Xiaoyan Chen
Affiliation:
College of Electric Power, Inner Mongolia University of Technology, Hohhot, 010080, China; Intelligent Energy Technology and Equipment Engineering Research Centre of Colleges and Universities in Inner Mongolia Autonomous Region, Hohhot, 010051, China; Engineering Research Center of Large Energy Storage Technology, Ministry of Education, Hohhot, 010010, China
Corresponding author: Chaoyi Dong; Email: dongchaoyi@imut.edu.cn

Abstract

Visual Simultaneous Localization and Mapping (vSLAM) is fundamentally limited by the static-world assumption, which makes its application in dynamic environments challenging. This paper proposes a robust vSLAM system, RFN-SLAM, which is built on ORB-SLAM3 and handles dynamic scenes without requiring predefined dynamic labels or weighted features. In the feature extraction stage, an enhanced version of the efficient binary BAD descriptor is used to improve the accuracy of static feature matching. RFN-SLAM obtains semantic information from an improved RT-DETR object detection network and the FAST-SAM instance segmentation network, and applies a novel dynamic box detection algorithm to identify and eliminate feature points on dynamic objects. During pose optimization, static feature points are weighted according to this dynamic information, which significantly reduces mismatches and improves localization accuracy. Meanwhile, neural radiance field rendering is used to remove dynamic objects and reconstruct the static scene in 3D. Experiments were conducted on the TUM RGB-D dataset, the Bonn dataset, and a self-collected dataset. The results show that RFN-SLAM significantly outperforms ORB-SLAM3 in localization accuracy in dynamic environments, localizes more accurately than other state-of-the-art dynamic SLAM methods, and achieves accurate 3D reconstruction of static scenes. In addition, RFN-SLAM maintains real-time performance without sacrificing accuracy.
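The abstract only sketches the pipeline; the paper's exact dynamic box detection algorithm and feature-weighting formula are not reproduced on this page. As a rough illustration of the masking-and-weighting idea, the Python sketch below drops keypoints that fall on dynamic objects and down-weights the surviving static points by their distance to the nearest dynamic region. The function name, the per-object masks (standing in here for RT-DETR boxes refined by FAST-SAM segments), and the exponential distance weighting with its 40-pixel scale are all illustrative assumptions, not RFN-SLAM's actual method.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def filter_and_weight_keypoints(keypoints, dynamic_masks, sigma=40.0):
        """Drop keypoints inside dynamic-object masks; weight the rest.

        keypoints     : (N, 2) float array of (u, v) pixel coordinates
        dynamic_masks : list of (H, W) boolean arrays, one per dynamic object
        sigma         : distance scale in pixels for the weight falloff
        """
        if not dynamic_masks:
            return keypoints, np.ones(len(keypoints))

        # Union of all per-object masks: True wherever any dynamic object is.
        combined = np.logical_or.reduce(dynamic_masks)
        h, w = combined.shape
        u = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
        v = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)

        # Discard keypoints that land on a dynamic object.
        inside = combined[v, u]
        static_kps = keypoints[~inside]

        # Euclidean distance transform: for every static pixel, the distance
        # to the nearest dynamic pixel.
        dist = distance_transform_edt(~combined)
        d = dist[v[~inside], u[~inside]]

        # Weight grows smoothly from ~0 beside a dynamic object toward 1 far
        # away, so points near possibly mis-segmented boundaries count less.
        weights = 1.0 - np.exp(-d / sigma)
        return static_kps, weights

In a system of this kind, each surviving point's weight would then scale its reprojection residual during pose optimization, so that features near dynamic boundaries contribute less to the estimated camera pose; the specific weighting that RFN-SLAM derives from its dynamic information may differ.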

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

