
A dynamic SLAM system with YOLOv7 segmentation and geometric constraints for indoor environments

Published online by Cambridge University Press:  30 June 2025

Yewei Shen
Affiliation: School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China

Xinguang Zhang*
Affiliation: School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China

Corresponding author: Xinguang Zhang; Email: jixieyuan123456@163.com

Abstract

With the rapid advancements in robotics and autonomous driving, SLAM (simultaneous localization and mapping) has become a crucial technology for real-time localization and map creation, seeing widespread application across various domains. However, SLAM’s performance in dynamic environments is often compromised due to the presence of moving objects, which can introduce errors and inconsistencies in localization and mapping. To overcome these challenges, this paper presents a visual SLAM system that employs dynamic feature point rejection. The system leverages a lightweight YOLOv7 model for detecting dynamic objects and performing semantic segmentation. Additionally, it incorporates optical flow tracking and multiview geometry techniques to identify and eliminate dynamic feature points. This approach effectively mitigates the impact of dynamic objects on the SLAM process, while maintaining the integrity of static feature points, ultimately enhancing the system’s robustness and accuracy in dynamic environments. Finally, we evaluate our method on the TUM RGB-D dataset and in real-world scenarios. The experimental results demonstrate that our approach significantly reduces both the root mean square error (RMSE) and standard deviation (Std) compared to the ORB-SLAM2 algorithm.
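
To make the pipeline described above concrete, the sketch below shows one way to combine a semantic mask from a YOLOv7-style segmentation model, Lucas-Kanade optical flow tracking, and an epipolar (multiview geometry) consistency check to reject dynamic feature points. This is a minimal illustration using OpenCV and NumPy under stated assumptions: the function name reject_dynamic_points, the thresholds, and the mask convention are illustrative choices, not the authors' implementation.

```python
import cv2
import numpy as np


def reject_dynamic_points(prev_gray, curr_gray, prev_pts, dynamic_mask,
                          epipolar_thresh=1.0):
    """Return the indices of feature points considered static.

    prev_gray, curr_gray : consecutive grayscale frames (H x W, uint8)
    prev_pts             : N x 2 float32 keypoint coordinates in prev_gray
    dynamic_mask         : H x W uint8 mask, >0 where a dynamic object was segmented
    """
    # 1) Track the keypoints into the current frame with pyramidal LK optical flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts.reshape(-1, 1, 2), None)
    status = status.ravel().astype(bool)
    curr_pts = curr_pts.reshape(-1, 2)

    # 2) Discard points that land inside the semantic mask of a dynamic object.
    h, w = dynamic_mask.shape
    xs = np.clip(curr_pts[:, 0].astype(int), 0, w - 1)
    ys = np.clip(curr_pts[:, 1].astype(int), 0, h - 1)
    candidate = status & (dynamic_mask[ys, xs] == 0)

    # 3) Fit a fundamental matrix with RANSAC on the presumed-static matches.
    if candidate.sum() < 8:
        return np.where(candidate)[0]        # too few points for a geometry check
    F, _ = cv2.findFundamentalMat(prev_pts[candidate], curr_pts[candidate],
                                  cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return np.where(candidate)[0]

    # 4) Reject points whose epipolar distance exceeds the threshold, i.e. points
    #    whose apparent motion is inconsistent with the estimated camera motion.
    ones = np.ones((len(prev_pts), 1))
    p1 = np.hstack([prev_pts, ones])         # homogeneous coordinates, frame k-1
    p2 = np.hstack([curr_pts, ones])         # homogeneous coordinates, frame k
    lines = (F @ p1.T).T                     # epipolar lines in frame k
    dist = np.abs(np.sum(lines * p2, axis=1)) / np.sqrt(
        lines[:, 0] ** 2 + lines[:, 1] ** 2 + 1e-12)
    return np.where(candidate & (dist < epipolar_thresh))[0]
```

In this sketch, points inside a segmented dynamic region are discarded outright, while the epipolar check catches moving points that the detector misses; only the surviving static points would be passed on to tracking and mapping.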

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

