Aerial Flood Scene Classification Using Fine-Tuned Attention-based Architecture for Flood-Prone Countries in South Asia

Ibne Hassan, Aman Mujahid, Abdullah Al Hasib, Andalib Rahman Shagoto, Joyanta Jyoti Mondal, Meem Arafat Manab, Jannatun Noor
2024-11-01
Abstract: Countries in South Asia regularly experience catastrophic flooding events. Image classification can expedite search and rescue initiatives by classifying flood zones, including those containing houses and humans. We create a new dataset of aerial imagery of flooding events across South Asian countries. For the classification, we propose a fine-tuned Compact Convolutional Transformer (CCT) based approach, alongside several other cutting-edge transformer-based and Convolutional Neural Network (CNN)-based architectures. We also implement the YOLOv8 object detection model to detect houses and humans within the imagery of our proposed dataset, and then compare its performance with our classification-based approach. Since the countries in South Asia have similar topography, housing structure, flood water color, and vegetation, this work is more applicable to this region than to the rest of the world. The images are divided evenly into four classes: 'flood', 'flood with domicile', 'flood with humans', and 'no flood'. On our proposed dataset, our fine-tuned CCT model, which has comparatively fewer weight parameters than many other transformer-based architectures designed for computer vision, achieves an accuracy of 98.62% and a macro-average precision of 98.50%. The other transformer-based architectures that we implement are the Vision Transformer (ViT), Swin Transformer, and External Attention Transformer (EANet), which achieve accuracies of 88.66%, 84.74%, and 66.56%, respectively. We also implement DCECNN (Deep Custom Ensembled Convolutional Neural Network), a custom ensemble model that we create by combining MobileNet, InceptionV3, and EfficientNetB0, and obtain an accuracy of 98.78%. The architectures we implement are fine-tuned to achieve optimal performance on our dataset.
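The abstract does not say how DCECNN combines its three backbones. One common option is soft voting: average the per-class probabilities from each model and take the argmax. The sketch below illustrates that idea only; the function name `soft_vote` and the probability values are hypothetical, and the authors' actual combination scheme may differ.

```python
from typing import Sequence

# Class names taken from the abstract.
CLASSES = ["flood", "flood with domicile", "flood with humans", "no flood"]


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> tuple[int, list[float]]:
    """Average per-class probabilities from several models and pick the argmax.

    Assumption: each inner sequence is one model's softmax output over CLASSES.
    This is a generic soft-voting sketch, not the paper's confirmed method.
    """
    if not prob_sets:
        raise ValueError("need at least one model output")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all model outputs must have the same length")
    # Element-wise mean across models.
    mean = [sum(p[i] for p in prob_sets) / len(prob_sets) for i in range(n_classes)]
    best = max(range(n_classes), key=lambda i: mean[i])
    return best, mean


# Hypothetical outputs standing in for MobileNet, InceptionV3 and EfficientNetB0.
mobilenet = [0.10, 0.60, 0.20, 0.10]
inception = [0.05, 0.45, 0.45, 0.05]
efficientnet = [0.10, 0.30, 0.55, 0.05]

index, mean_probs = soft_vote([mobilenet, inception, efficientnet])
print(CLASSES[index], [round(p, 3) for p in mean_probs])
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones, which is one reason soft voting is often tried first for small ensembles.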
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the inefficiency of search and rescue (SAR) operations in South Asian countries during flood disasters. Specifically, the authors propose a fine-tuned attention-based architecture that classifies flood scenes in aerial images taken by drones, with the goal of accelerating rescue operations.

#### Main problem background

1. **Frequent and severe flood disasters**:
   - South Asian countries such as Bangladesh, India, and Pakistan are often hit by floods. These floods damage houses and infrastructure and cause large numbers of casualties and displacements.
   - For example, the flood in northeastern Bangladesh in June 2024 affected about 1.8 million people and submerged many houses; in 2022, flooding in Pakistan submerged one-third of the country's territory and affected 33 million people.
2. **Limitations of traditional rescue methods**:
   - When a flood occurs, government agencies and other aid organizations usually rely on boats and planes for physical search. This is time-consuming and reduces rescue efficiency.
   - From the ground, houses and landmarks are covered by flood water, making it difficult to locate survivors quickly.

#### Solutions

To solve the above problems, the authors propose the following methods:

1. **Construction of a new dataset**:
   - Collect a new aerial image dataset covering flood events in South Asian countries, divided into four classes: 'flood', 'flood with domicile', 'flood with humans', and 'no flood'.
2. **Fine-tuned Compact Convolutional Transformer (CCT) and other models**:
   - Use the fine-tuned Compact Convolutional Transformer (CCT) and other cutting-edge Transformer and Convolutional Neural Network (CNN) architectures for classification experiments.
   - The experimental results show that the CCT model achieves an accuracy of 98.62% and a macro-average precision of 98.50% on this dataset.
3. **Application of an object detection model**:
   - Implement the YOLOv8 object detection model to detect houses and humans in the images and compare the results with the classification approach.
4. **Cross-dataset verification**:
   - Apply the same model to another public flood image dataset, FloodNet, to verify its generalization ability.

#### Expected effects

By applying image classification to aerial images obtained by drones, flood areas, houses, and people can be identified and mapped more quickly and accurately, improving the efficiency of rescue operations and reducing the casualties and property losses caused by floods.

In conclusion, this paper aims to strengthen flood disaster response capacity in South Asia by combining advanced deep learning techniques with practical application requirements, in particular by providing more efficient support for search and rescue.
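To make the two reported metrics concrete, here is a minimal sketch of how accuracy and macro-average precision can be computed from a confusion matrix. The 4×4 matrix is invented for illustration and is not the paper's data; the function names are hypothetical. Macro-average precision is the unweighted mean of the per-class precisions, so each of the four classes counts equally.

```python
def accuracy(cm: list[list[int]]) -> float:
    """Fraction of all samples on the diagonal (rows = true, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_precision(cm: list[list[int]]) -> float:
    """Unweighted mean of per-class precision TP / (TP + FP)."""
    n = len(cm)
    per_class = []
    for j in range(n):
        predicted_j = sum(cm[i][j] for i in range(n))  # column sum = TP + FP
        per_class.append(cm[j][j] / predicted_j if predicted_j else 0.0)
    return sum(per_class) / n


# Illustrative confusion matrix (NOT from the paper), 50 images per class.
# Class order: flood, flood with domicile, flood with humans, no flood.
toy_cm = [
    [48, 1, 1, 0],
    [2, 47, 1, 0],
    [0, 2, 48, 0],
    [0, 0, 0, 50],
]

print(f"accuracy = {accuracy(toy_cm):.4f}")
print(f"macro precision = {macro_precision(toy_cm):.4f}")
```

With evenly sized classes, as in the dataset described above, accuracy and macro-averaged metrics tend to sit close together; they diverge more when classes are imbalanced.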