What is it about?

Underwater image recognition is an important yet challenging task for marine exploration. Acoustic images provide visual information about the underwater environment. However, the complicated environment severely degrades image quality, which increases the difficulty of the recognition task. We propose Cross-Modal Augmentation via Fusion (CMAF), a method that generates additional information by incorporating a different kind of data related to communication signals. We also ensure that the system learns how the different types of information are linked, maximizing the information the model captures. Our experiments show that adding this extra communication signal significantly improves the model's ability to recognize objects in underwater images. The fusion approach of CMAF combines the underwater image with the communication signal. This approach holds great promise for advancing the field of underwater image recognition.


Why is it important?

This work introduces a novel framework for underwater image recognition, offering researchers a range of options for multi-modal data fusion, loss functions, and training strategies. The proposed methods also address the challenges of class imbalance in the dataset.


Writing this article was a great pleasure and marked a milestone in my master's thesis. I hope this work inspires researchers in the fields of underwater exploration and artificial intelligence.

Shih-Wei Yang
National Chiao Tung University

Read the Original

This page is a summary of: CMAF: Cross-Modal Augmentation via Fusion for Underwater Acoustic Image Recognition, ACM Transactions on Multimedia Computing Communications and Applications, December 2023, ACM (Association for Computing Machinery). DOI: 10.1145/3636427.


