Multimodal approach for seafloor image classification using feature-level data fusion

dc.contributor.author: Pandya, Kedar
dc.contributor.copyright-release: Not Applicable
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Not Applicable
dc.contributor.thesis-reader: Craig Brown
dc.contributor.thesis-reader: Janarthanan Rajendran
dc.contributor.thesis-supervisor: Thomas Trappenberg
dc.date.accessioned: 2025-04-23T11:00:11Z
dc.date.available: 2025-04-23T11:00:11Z
dc.date.defence: 2025-04-17
dc.date.issued: 2025-04-22
dc.description.abstract: Mapping seafloor substrates is crucial for understanding benthic habitats and monitoring their changes. Traditionally, this task involves manually annotating underwater images, an approach that is increasingly impractical due to growing data volumes. This thesis investigates a multimodal deep-learning approach for classifying seafloor substrates by integrating visual images with sonar data (backscatter and bathymetry) collected from the Bay of Fundy. To identify the role of wider spatial context, two sonar data sampling methods were compared: single-point sampling, which assigns a single sonar value based on image coordinates, and context-based sampling, which incorporates sonar data from adjacent measurements. Experiments demonstrated that context-based sampling, which provided a wider spatial context, significantly improved substrate classification accuracy. Feature-level data fusion strategies were then evaluated by encoding sonar data using multilayer perceptrons and extracting marginal representations from different layers of the sonar encoder. These representations were fused with visual features extracted via convolutional neural networks. The most effective fusion strategy, established through these experiments, was subsequently implemented using pre-trained Vision Transformer (ViT) and ResNet50 models as image encoders. While ResNet50-based multimodal models showed only moderate improvements relative to a baseline trained solely on visual data, ViT-based fusion models achieved significantly larger gains, exceeding two standard deviations and improving accuracy by approximately 35% for challenging substrate types with substantial class overlap. Overall, this research demonstrates that multimodal learning frameworks can effectively leverage complementary features from visual data and from sonar data sampled to provide a wider spatial context, resulting in substantial improvements in seafloor substrate classification accuracy. These findings establish a robust foundation for further research in multimodal oceanography and benthic mapping.
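The feature-level fusion described in the abstract can be sketched in a few lines: a small MLP encodes the sonar values, its hidden representation is concatenated with the image-encoder features, and a linear head classifies the fused vector. This is a minimal NumPy illustration, not the thesis's actual architecture; the layer sizes, the 3x3 sonar context window, the 128-dimensional visual embedding, and the five substrate classes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_encode(x, weights):
    """Pass sonar inputs through a small MLP and return the hidden
    representation of the final layer (illustrative sonar encoder)."""
    h = x
    for W, b in weights:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    return h

# Assumed shapes: 9 sonar values per image (a 3x3 context window of
# backscatter/bathymetry samples) and a 128-d visual embedding that
# stands in for CNN/ViT image features.
sonar = rng.normal(size=(4, 9))      # batch of 4 sonar context windows
visual = rng.normal(size=(4, 128))   # batch of 4 image feature vectors

weights = [
    (rng.normal(size=(9, 32)), np.zeros(32)),
    (rng.normal(size=(32, 16)), np.zeros(16)),
]
sonar_repr = mlp_encode(sonar, weights)  # (4, 16) marginal representation

# Feature-level fusion: concatenate the two modalities, then classify.
fused = np.concatenate([visual, sonar_repr], axis=1)  # (4, 144)
logits = fused @ rng.normal(size=(144, 5))            # 5 substrate classes
print(fused.shape, logits.shape)
```

In a trained model the concatenation point can be moved to representations from different MLP layers, which is the design choice the thesis evaluates experimentally.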
dc.identifier.uri: https://hdl.handle.net/10222/85051
dc.language.iso: en
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Multimodal Learning
dc.subject: Sonar
dc.subject: Images
dc.subject: Benthic Substrates
dc.subject: Ocean Floor Mapping
dc.title: Multimodal approach for seafloor image classification using feature-level data fusion

Files

Original bundle
Name: KedarPandya2025.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.03 KB
Description: Item-specific license agreed upon to submission