
MVFD-Net: multi-view fusion detection network for occluded underwater dam cracks

Published online by Cambridge University Press: 24 June 2025

Yukai Wu
Affiliation:
School of Mechanical and Electrical Engineering, Henan University of Technology, Zhengzhou, PR China
Xiaochen Qin
Affiliation:
School of Mechanical and Electrical Engineering, Henan University of Technology, Zhengzhou, PR China
Lei Cai*
Affiliation:
School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang, PR China
Corresponding author: Lei Cai; Email: cailei2014@126.com

Abstract

Detecting cracks in underwater dams is crucial for ensuring dam quality and safety. However, underwater dam cracks are easily obscured by aquatic plants. Traditional single-view visual inspection cannot effectively extract the features of occluded cracks, whereas multi-view crack images allow the occluded target features to be recovered through feature fusion. At the same time, underwater turbulence causes nonuniform diffusion of suspended sediment, so the image features from different viewpoints are flooded by noise to different degrees, which degrades the fusion result. To address these issues, this paper proposes a multi-view fusion detection network (MVFD-Net) for occluded underwater dam cracks. First, we propose a feature reconstruction interaction encoder (FRI-Encoder), which exchanges information between the multi-scale local features extracted by a convolutional neural network (CNN) and the global features extracted by a transformer encoder, and performs feature reconstruction at the end of the encoder to strengthen feature extraction while suppressing the interference of nonuniform scattering noise. Subsequently, a multi-scale gated adaptive fusion module is introduced between the encoder and the decoder to perform gated feature fusion, which further complements and recovers detail information flooded by noise. Additionally, this paper designs a multi-view feature fusion module that fuses image features from multiple viewpoints to restore occluded crack features and thereby detect occluded cracks. Extensive experimental evaluations show that MVFD-Net achieves excellent performance compared with current mainstream algorithms.
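
To make the two fusion ideas named in the abstract concrete, the following PyTorch sketch is a minimal illustration only, not the authors' implementation: the abstract does not specify layer configurations, channel widths, or module interfaces, so every class and parameter name below (GatedFusion, MultiViewFusion, the 1×1 gate convolution) is a hypothetical stand-in. The sketch shows (i) gated fusion of a CNN local-feature stream with a transformer global-feature stream and (ii) confidence-weighted fusion across views, so that features occluded or noise-flooded in one view can be recovered from another.

```python
# Minimal sketch under stated assumptions; names and layer choices are
# hypothetical, not the paper's actual architecture.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Learned gate deciding, per channel and position, how much of the
    CNN local stream versus the transformer global stream to keep."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([local_feat, global_feat], dim=1))
        return g * local_feat + (1.0 - g) * global_feat


class MultiViewFusion(nn.Module):
    """Weight each view by a learned per-pixel confidence so that regions
    occluded (or noise-flooded) in one view can be filled from another."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-view confidence map

    def forward(self, views: list) -> torch.Tensor:
        # views: list of (B, C, H, W) feature maps, one per viewpoint,
        # assumed already aligned to a common reference view.
        scores = torch.stack([self.score(v) for v in views], dim=0)  # (V, B, 1, H, W)
        weights = torch.softmax(scores, dim=0)                       # normalize across views
        return (weights * torch.stack(views, dim=0)).sum(dim=0)     # (B, C, H, W)


if __name__ == "__main__":
    local = torch.randn(1, 64, 32, 32)   # CNN multi-scale local features (one scale)
    glob = torch.randn(1, 64, 32, 32)    # transformer global features
    fused = GatedFusion(64)(local, glob)
    out = MultiViewFusion(64)([fused, torch.randn_like(fused), torch.randn_like(fused)])
    print(fused.shape, out.shape)        # both torch.Size([1, 64, 32, 32])
```

The softmax across views means a view whose features are heavily flooded by sediment noise receives a low weight at those pixels, which is one plausible reading of how multi-view fusion can compensate for occlusion.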

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press

