Deep Learning based Segmentation of Fish in Noisy Forward Looking MBES Images

Jesper Haahr Christensen, Lars Valdemar Mogensen, Ole Ravn
DOI: https://doi.org/10.48550/arXiv.2006.09034
2020-06-16
Abstract: In this work, we investigate a Deep Learning (DL) approach to fish segmentation in a small dataset of noisy low-resolution images generated by a forward-looking multibeam echosounder (MBES). We build on recent advances in DL and Convolutional Neural Networks (CNNs) for semantic segmentation and demonstrate an end-to-end approach for a fish/non-fish probability prediction for all range-azimuth positions projected by an imaging sonar. We use self-collected datasets from the Danish Sound and the Faroe Islands to train and test our model and present techniques to obtain satisfying performance and generalization even with a low-volume dataset. We show that our model proves the desired performance and has learned to harness the importance of semantic context and take this into account to separate noise and non-targets from real targets. Furthermore, we present techniques to deploy models on low-cost embedded platforms to obtain higher performance fit for edge environments - where compute and power are restricted by size/cost - for testing and prototyping.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses how to use deep learning to segment and recognize fish in low-resolution, noisy forward-looking multibeam echosounder (MBES) images. Specifically, the authors aim to develop a method that accurately distinguishes fish from non-target content (such as noise and surface reflections) and that can be deployed efficiently on an embedded platform in a resource-constrained edge-computing environment.

### Problem Background

In underwater imaging, optical sensors are often unsuitable for monitoring tasks because of water turbidity. Multibeam echosounders (MBES) are preferred instead, since they are robust to underwater conditions and have a long detection range. However, MBES images are typically noisy and carry little detail, which makes vision tasks based on them very challenging.

### Main Contributions of the Paper

1. **Proposed an end-to-end fish-segmentation model based on deep learning**:
   - The model predicts, for each pixel in the input MBES image, the probability that it belongs to fish or non-fish.
   - The architecture is a convolutional encoder-decoder that uses skip connections to recover fine-grained spatial information.
2. **Addressed the problem of training on a small dataset**:
   - Because little labeled data was available, the authors used data augmentation and transfer learning to improve the model's generalization.
3. **Demonstrated deployment of the model on an embedded platform**:
   - To fit the resource limits of edge-computing environments, the authors optimized the model and tested it on low-cost embedded devices (such as Raspberry Pi and NVIDIA Jetson Nano), showing that the model can run efficiently on these platforms.
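The summary names data augmentation as one way to cope with the small labeled dataset, but it does not say which transformations were used. The following is a minimal sketch, assuming a mirror flip along the azimuth axis is a reasonable augmentation for range-azimuth images. That choice, the function names `flip_azimuth` and `augment`, and the toy data are illustrative assumptions, not details taken from the paper. The point it shows is that the image and its label mask must be transformed together so that the pixel labels stay aligned.

```python
import random


def flip_azimuth(image, mask):
    """Mirror an image and its label mask along the azimuth (column) axis."""
    return [row[::-1] for row in image], [row[::-1] for row in mask]


def augment(image, mask, rng, p_flip=0.5):
    """Randomly flip image and mask together; otherwise return plain copies.

    The flip and its probability are assumptions made for illustration only.
    """
    if rng.random() < p_flip:
        return flip_azimuth(image, mask)
    return [row[:] for row in image], [row[:] for row in mask]


# Toy 2x3 range-azimuth image: rows are range bins, columns are beams.
image = [[0.1, 0.2, 0.9],
         [0.0, 0.8, 0.3]]
# Matching binary mask: 1 = fish, 0 = non-fish.
mask = [[0, 0, 1],
        [0, 1, 0]]

flipped_image, flipped_mask = flip_azimuth(image, mask)
aug_image, aug_mask = augment(image, mask, random.Random(0))
```

Applying the same geometric transform to both arrays is what keeps each augmented sample a valid training pair for per-pixel segmentation.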
### Formula Presentation

The loss function used in this paper is the binary cross-entropy loss:

\[ L_{BCE}(M) = -\sum_{i=1}^{C} \Big[ y_i \log M(x_i) + (1 - y_i) \log\big(1 - M(x_i)\big) \Big] \]

where:

- \(M\) is the model,
- \(C\) is the number of classes (in this case, 2: fish and non-fish),
- \(i\) indexes the classes,
- \(y_i\) is the true label value,
- \(x_i\) is the input data.

The goal is to minimize this loss function:

\[ \min_M L_{BCE}(M) \]

### Summary

In general, this paper addresses a key problem in underwater fish recognition: how to accurately segment fish in noisy MBES images while ensuring that the model can be deployed and run efficiently in a resource-constrained environment.
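To make the binary cross-entropy loss from the Formula Presentation section concrete, here is a minimal pure-Python sketch. It assumes the model outputs \(M(x_i)\) are already probabilities in (0, 1) and sums the loss terms over whatever index \(i\) runs across. The function name `bce_loss` and the clipping constant `eps` are illustrative choices, not part of the paper.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Binary cross-entropy summed over paired predictions and labels.

    probs  -- predicted probabilities M(x_i), each expected in (0, 1)
    labels -- true labels y_i, each 0 or 1
    eps    -- clipping constant (an assumption) that avoids log(0)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


# Confident, correct predictions give a small loss ...
good = bce_loss([0.9, 0.1], [1, 0])
# ... while confident, wrong predictions are penalized heavily.
bad = bce_loss([0.1, 0.9], [1, 0])
```

Minimizing this quantity pushes the predicted probability toward 1 where the label is 1 and toward 0 where the label is 0, which is the behavior the \(\min_M L_{BCE}(M)\) objective describes.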