Abstract: The water transportation environment is complex, which makes it difficult to detect diverse visual targets clearly in water surface images captured by conventional optical cameras. The difficulty is particularly pronounced for medium- and small-scale objects in visible-light imagery. To support the development of smart maritime applications, we propose a multi-scale ship object detection (MS-SOD) algorithm that improves the detection of multi-scale ship objects in complex waters. MS-SOD is built on the mainstream framework of one-stage object detection models. A convolutional block attention module is embedded into the backbone network to strengthen ship feature extraction. Shallow features rich in detail are added to the multi-scale feature fusion network, and a cross-stage-partial residual structure is used to enhance the fusion of multi-scale ship object features. In addition, a focal loss function is employed to optimize model training, and an adaptive anchor clustering algorithm is designed to optimize the prior anchors and improve the detection of multi-scale ship objects. Extensive experiments on a self-built large-scale ship object dataset validate the effectiveness and efficiency of the proposed MS-SOD algorithm. The experimental results show that MS-SOD is more accurate than various mainstream comparison methods on the test dataset. In particular, compared with the YOLOv4 algorithm, the detection accuracy for large-, medium-, and small-scale ship objects improves by 11.3%, 6.0%, and 10.5%, respectively.
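
To illustrate the focal loss mentioned above, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The abstract does not state which variant or hyperparameters MS-SOD uses, so the function names and the values alpha = 0.25 and gamma = 2.0 are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for a single prediction.

    p: predicted probability of the positive class (e.g. "ship").
    y: ground-truth label, 1 for positive and 0 for negative.
    alpha, gamma: illustrative defaults, NOT taken from the MS-SOD abstract.
    """
    # Clamp the probability so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances the positive and negative classes.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The factor (1 - p_t)^gamma down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over paired predictions and labels."""
    losses = [binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# A confidently correct prediction contributes far less than a badly wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger gamma values shift the training signal toward hard examples, which in a detection setting would plausibly include small or partially occluded ships.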