
Multimodal approach for seafloor image classification using feature-level data fusion

Date

2025-04-22

Abstract

Mapping seafloor substrates is crucial for understanding benthic habitats and monitoring their changes. Traditionally, this task involves manually annotating underwater images, an approach that is increasingly impractical as data volumes grow. This thesis investigates a multimodal deep-learning approach for classifying seafloor substrates by integrating visual images with sonar data (backscatter and bathymetry) collected from the Bay of Fundy. To assess the role of wider spatial context, two sonar data sampling methods were compared: single-point sampling, which assigns a single sonar value based on image coordinates, and context-based sampling, which incorporates sonar data from adjacent measurements. Experiments demonstrated that context-based sampling significantly improved substrate classification accuracy. Feature-level data fusion strategies were then evaluated by encoding sonar data using multilayer perceptrons and extracting marginal representations from different layers of the sonar encoder. These representations were fused with visual features extracted via convolutional neural networks. The fusion strategy found effective in these experiments was subsequently implemented using pre-trained Vision Transformer (ViT) and ResNet50 models as image encoders. While ResNet50-based multimodal models showed only moderate improvements relative to a baseline trained solely on visual data, ViT-based fusion models achieved significantly larger gains, exceeding two standard deviations and improving accuracy by approximately 35% for challenging substrate types with substantial class overlap. Overall, this research demonstrates that multimodal learning frameworks can effectively leverage complementary features from visual data and from sonar data sampled to provide a wider spatial context, resulting in substantial improvements in seafloor substrate classification accuracy. These findings establish a robust foundation for further research in multimodal oceanography and benthic mapping.
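
The two sonar sampling methods and the feature-level fusion step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. Everything specific here is an assumption made for illustration: the sonar layer is taken to be a 2D grid indexed by row and column, the context is a square patch of a chosen radius, the encoder is a tiny two-layer perceptron with random weights, the visual features are a random stand-in vector, and fusion is done by simple concatenation. All function names are hypothetical.

```python
import numpy as np


def sample_single_point(grid, row, col):
    """Single-point sampling: the one sonar value at the image's grid cell."""
    return np.array([grid[row, col]], dtype=float)


def sample_context(grid, row, col, radius=1):
    """Context-based sampling: a flattened square patch around the cell.

    The grid is edge-padded so that cells on the border still yield a
    full (2 * radius + 1) x (2 * radius + 1) patch (an assumption).
    """
    padded = np.pad(grid, radius, mode="edge")
    size = 2 * radius + 1
    patch = padded[row:row + size, col:col + size]
    return patch.astype(float).ravel()


def mlp_encode(x, weights):
    """Run a small ReLU perceptron and return the output of every layer,
    so that a representation can be taken from any depth of the encoder."""
    activations = []
    h = x
    for W, b in weights:
        h = np.maximum(0.0, W @ h + b)
        activations.append(h)
    return activations


def fuse(visual_features, sonar_features):
    """Feature-level fusion by concatenation (assumed fusion operator)."""
    return np.concatenate([visual_features, sonar_features])


# Toy data: a 5 x 5 backscatter grid and an image located at cell (2, 2).
rng = np.random.default_rng(0)
backscatter = np.arange(25, dtype=float).reshape(5, 5)

point = sample_single_point(backscatter, 2, 2)
context = sample_context(backscatter, 2, 2, radius=1)

# Hypothetical encoder: 9 inputs -> 8 hidden units -> 4 output units.
weights = [
    (rng.normal(size=(8, context.size)), np.zeros(8)),
    (rng.normal(size=(4, 8)), np.zeros(4)),
]
layers = mlp_encode(context, weights)

# Stand-in for features produced by an image encoder.
visual = rng.normal(size=16)
fused = fuse(visual, layers[-1])
```

In this sketch, `layers[0]` or `layers[-1]` could be passed to `fuse` to mimic taking the sonar representation from different encoder depths; the fused vector would then feed a classification head, which is omitted here.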

Keywords

Machine Learning, Deep Learning, Multimodal Learning, Sonar, Images, Benthic Substrates, Ocean Floor Mapping

Citation