Multimodal approach for seafloor image classification using feature-level data fusion

dc.contributor.author: Pandya, Kedar
dc.contributor.copyright-release: Not Applicable
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Not Applicable
dc.contributor.thesis-reader: Craig Brown
dc.contributor.thesis-reader: Janarthanan Rajendran
dc.contributor.thesis-supervisor: Thomas Trappenberg
dc.date.accessioned: 2025-04-23T11:00:11Z
dc.date.available: 2025-04-23T11:00:11Z
dc.date.defence: 2025-04-17
dc.date.issued: 2025-04-22
dc.description.abstract: Mapping seafloor substrates is crucial for understanding benthic habitats and monitoring their changes. Traditionally, this task involves manually annotating underwater images, an approach that is increasingly impractical due to growing data volumes. This thesis investigates a multimodal deep-learning approach for classifying seafloor substrates by integrating visual images with sonar data (backscatter and bathymetry) collected from the Bay of Fundy. To identify the role of wider spatial context, two sonar data sampling methods were compared: single-point sampling, which assigns a single sonar value based on image coordinates, and context-based sampling, which incorporates sonar data from adjacent measurements. Experiments demonstrated that context-based sampling, which provided a wider spatial context, significantly improved substrate classification accuracy. Feature-level data fusion strategies were then evaluated by encoding sonar data using multilayer perceptrons and extracting marginal representations from different layers of the sonar encoder. These representations were fused with visual features extracted via convolutional neural networks. The most effective fusion strategy, established through these experiments, was subsequently implemented using pre-trained Vision Transformer (ViT) and ResNet50 models as image encoders. While ResNet50-based multimodal models showed only moderate improvements relative to a baseline trained solely on visual data, ViT-based fusion models achieved significantly larger gains, exceeding two standard deviations and improving accuracy by approximately 35% for challenging substrate types with substantial class overlap. Overall, this research demonstrates that multimodal learning frameworks can effectively leverage complementary features from visual data and from sonar data sampled to provide a wider spatial context, resulting in substantial improvements in seafloor substrate classification accuracy. These findings establish a robust foundation for further research in multimodal oceanography and benthic mapping.
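The feature-level fusion described in the abstract can be sketched in a few lines: a small MLP encodes the sonar values, its hidden representation is concatenated with the image-encoder features, and a linear head classifies the fused vector. This is a minimal NumPy illustration, not the thesis's actual architecture; the layer sizes, the 3x3 sonar context window, the 128-dimensional visual embedding, and the five substrate classes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_encode(x, weights):
    """Pass sonar inputs through a small MLP and return the hidden
    representation of the final layer (illustrative sonar encoder)."""
    h = x
    for W, b in weights:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    return h

# Assumed shapes: 9 sonar values per image (a 3x3 context window of
# backscatter/bathymetry samples) and a 128-d visual embedding that
# stands in for CNN/ViT image features.
sonar = rng.normal(size=(4, 9))      # batch of 4 sonar context windows
visual = rng.normal(size=(4, 128))   # batch of 4 image feature vectors

weights = [
    (rng.normal(size=(9, 32)), np.zeros(32)),
    (rng.normal(size=(32, 16)), np.zeros(16)),
]
sonar_repr = mlp_encode(sonar, weights)  # (4, 16) marginal representation

# Feature-level fusion: concatenate the two modalities, then classify.
fused = np.concatenate([visual, sonar_repr], axis=1)  # (4, 144)
logits = fused @ rng.normal(size=(144, 5))            # 5 substrate classes
print(fused.shape, logits.shape)
```

In a trained model the concatenation point can be moved to representations from different MLP layers, which is the design choice the thesis evaluates experimentally.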
dc.identifier.uri: https://hdl.handle.net/10222/85051
dc.language.iso: en
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Multimodal Learning
dc.subject: Sonar
dc.subject: Images
dc.subject: Benthic Substrates
dc.subject: Ocean Floor Mapping
dc.title: Multimodal approach for seafloor image classification using feature-level data fusion

Files

Original bundle
Name: KedarPandya2025.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.03 KB
Description: Item-specific license agreed upon to submission