
A dynamic SLAM system with YOLOv7 segmentation and geometric constraints for indoor environments

Published online by Cambridge University Press:  30 June 2025

Yewei Shen
Affiliation: School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China

Xinguang Zhang*
Affiliation: School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China

Corresponding author: Xinguang Zhang; Email: jixieyuan123456@163.com

Abstract

With the rapid advancements in robotics and autonomous driving, SLAM (simultaneous localization and mapping) has become a crucial technology for real-time localization and map creation, seeing widespread application across various domains. However, SLAM’s performance in dynamic environments is often compromised due to the presence of moving objects, which can introduce errors and inconsistencies in localization and mapping. To overcome these challenges, this paper presents a visual SLAM system that employs dynamic feature point rejection. The system leverages a lightweight YOLOv7 model for detecting dynamic objects and performing semantic segmentation. Additionally, it incorporates optical flow tracking and multiview geometry techniques to identify and eliminate dynamic feature points. This approach effectively mitigates the impact of dynamic objects on the SLAM process, while maintaining the integrity of static feature points, ultimately enhancing the system’s robustness and accuracy in dynamic environments. Finally, we evaluate our method on the TUM RGB-D dataset and in real-world scenarios. The experimental results demonstrate that our approach significantly reduces both the root mean square error (RMSE) and standard deviation (Std) compared to the ORB-SLAM2 algorithm.
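
To make the pipeline described above concrete, the sketch below shows one way to combine a semantic mask from a YOLOv7-style segmentation model, Lucas-Kanade optical flow tracking, and an epipolar (multiview geometry) consistency check to reject dynamic feature points. This is a minimal illustration using OpenCV and NumPy under stated assumptions: the function name reject_dynamic_points, the thresholds, and the mask convention are illustrative choices, not the authors' implementation.

```python
import cv2
import numpy as np


def reject_dynamic_points(prev_gray, curr_gray, prev_pts, dynamic_mask,
                          epipolar_thresh=1.0):
    """Return the indices of feature points considered static.

    prev_gray, curr_gray : consecutive grayscale frames (H x W, uint8)
    prev_pts             : N x 2 float32 keypoint coordinates in prev_gray
    dynamic_mask         : H x W uint8 mask, >0 where a dynamic object was segmented
    """
    # 1) Track the keypoints into the current frame with pyramidal LK optical flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts.reshape(-1, 1, 2), None)
    status = status.ravel().astype(bool)
    curr_pts = curr_pts.reshape(-1, 2)

    # 2) Discard points that land inside the semantic mask of a dynamic object.
    h, w = dynamic_mask.shape
    xs = np.clip(curr_pts[:, 0].astype(int), 0, w - 1)
    ys = np.clip(curr_pts[:, 1].astype(int), 0, h - 1)
    candidate = status & (dynamic_mask[ys, xs] == 0)

    # 3) Fit a fundamental matrix with RANSAC on the presumed-static matches.
    if candidate.sum() < 8:
        return np.where(candidate)[0]        # too few points for a geometry check
    F, _ = cv2.findFundamentalMat(prev_pts[candidate], curr_pts[candidate],
                                  cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return np.where(candidate)[0]

    # 4) Reject points whose epipolar distance exceeds the threshold, i.e. points
    #    whose apparent motion is inconsistent with the estimated camera motion.
    ones = np.ones((len(prev_pts), 1))
    p1 = np.hstack([prev_pts, ones])         # homogeneous coordinates, frame k-1
    p2 = np.hstack([curr_pts, ones])         # homogeneous coordinates, frame k
    lines = (F @ p1.T).T                     # epipolar lines in frame k
    dist = np.abs(np.sum(lines * p2, axis=1)) / np.sqrt(
        lines[:, 0] ** 2 + lines[:, 1] ** 2 + 1e-12)
    return np.where(candidate & (dist < epipolar_thresh))[0]
```

In this sketch, points inside a segmented dynamic region are discarded outright, while the epipolar check catches moving points that the detector misses; only the surviving static points would be passed on to tracking and mapping.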

Information

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

