MMD-Net: A Vision-Language Perception Model for Maritime Object Detection and Application

SHI Bo; WANG Zitao; CAO Tianyu; ZHAO Hong; GE Qiqi

doi:10.1007/s11802-026-6153-7

Citation: SHI Bo, WANG Zitao, CAO Tianyu, ZHAO Hong, GE Qiqi. MMD-Net: A Vision-Language Perception Model for Maritime Object Detection and Application[J]. Journal of Ocean University of China, 2026, 25(2): 505-518. DOI: 10.1007/s11802-026-6153-7

Abstract

Deep neural network (DNN)-based detectors assist humans in object detection, particularly in maritime transportation, where they contribute to the autonomy of marine vehicles and systems. In this paper, we propose a multimodal detector network (MMD-Net) that incorporates natural language as an additional supervisory signal to improve the performance and generalization of maritime detectors. Motivated by the limitations of traditional neural networks in maritime object detection, and aiming to mitigate the significant class imbalance present in maritime datasets, we introduce distribution focal scaling for object classification and propose an algorithm that dynamically adjusts the learning rate to accelerate convergence. Furthermore, our layer-wise training strategy improves the region proposal network so that it better captures the diverse sizes and appearances of maritime objects. Additionally, unlike traditional maritime object detectors, which are confined to predicting a fixed set of predefined object categories, our detector can infer novel categories using the rich prior knowledge gained from region-text pre-training. Validation experiments demonstrate that the proposed method achieves a mean average precision (mAP) of 75.8% on the Singapore Maritime Dataset, surpassing other state-of-the-art DNNs. Notably, even without extra training on novel maritime classes and scenes, our model reaches a detection accuracy of 21.1%.
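The abstract does not spell out how MMD-Net scores categories it was never trained on. A common mechanism in detectors with region-text pre-training is to compare each region's visual embedding with text embeddings of category names and pick the best match, so a new class only needs a new text embedding. The sketch below illustrates that general idea under stated assumptions: the function names, the CLIP-style temperature of 0.07, and the toy 3-D vectors (standing in for real image- and text-encoder outputs) are all hypothetical and are not taken from MMD-Net.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def classify_region(
    region_emb: List[float],
    class_text_embs: Dict[str, List[float]],
    temperature: float = 0.07,  # assumed CLIP-style value, not from the paper
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical open-vocabulary classifier: softmax over region-text cosine similarities."""
    r = l2_normalize(region_emb)
    names = list(class_text_embs)
    logits = []
    for name in names:
        t = l2_normalize(class_text_embs[name])
        logits.append(sum(a * b for a, b in zip(r, t)) / temperature)
    # Numerically stable softmax over the category logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = {n: e / z for n, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy embeddings: "kayak" plays the role of a novel class added only via its text embedding.
text_embs = {"ship": [1.0, 0.0, 0.0], "buoy": [0.0, 1.0, 0.0], "kayak": [0.0, 0.0, 1.0]}
region = [0.1, 0.2, 0.95]
label, probs = classify_region(region, text_embs)
print(label)  # kayak
```

In this design the category set lives entirely in `class_text_embs`, which is why a detector built this way can, in principle, be queried with maritime classes absent from its detection training data.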
