SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification

Lukáš Adam,Vojtěch Čermák,Kostas Papafitsoros,Lukáš Picek
2024-05-01
Abstract: This paper introduces the first public large-scale, long-span dataset with sea turtle photographs captured in the wild -- \href{
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses several key issues in re-identifying individual sea turtles:

1. **Time Span and Ecological Authenticity of the Dataset**:
   - Existing animal re-identification datasets usually cover a short time span and lack ecological authenticity. They are often collected in controlled environments, which biases method evaluation.
   - The paper introduces SeaTurtleID2022, a new large-scale, long-term dataset of sea turtle photos taken in the wild over a span of 13 years. It contains 8729 photos of 438 unique individuals.
2. **Dataset Splitting Method**:
   - Datasets are traditionally split at random. This can leak data between the training and test sets, for instance when near-identical photos from the same encounter land in both, so model performance is overestimated.
   - The paper proposes two time-based splits: time-aware closed-set and time-aware open-set. These better simulate real-world use, avoid data leakage, and give a more realistic estimate of model performance.
3. **Baseline Performance of Sea Turtle Individual Re-identification**:
   - The paper provides baseline evaluations for instance segmentation and for re-identification based on different body parts: the head, the flippers, and the whole body.
   - The experiments compare several feature extraction approaches (SIFT, Superpoint, ArcFace, and Triplet Loss) on the sea turtle re-identification task and show the advantage of deep learning methods over traditional ones.
4. **Importance of Time-aware Splitting**:
   - By comparing results under random and time-aware splitting, the paper shows that time-aware splitting substantially reduces performance overestimation. For example, in the head recognition task the model reaches only 69.2% accuracy under time-aware splitting, compared with 87.2% under random splitting, a gap of 18 percentage points.

### Summary

The main contribution of this paper is SeaTurtleID2022, a large-scale, long-term sea turtle re-identification dataset with unique characteristics and a variety of annotations. The paper also proposes time-based splitting methods that make model evaluation more realistic. Through baseline evaluations and experiments, it demonstrates the superiority of deep learning methods for sea turtle re-identification and emphasizes the importance of time-aware splitting in avoiding performance overestimation.
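The two time-aware splits mentioned above can be illustrated with a small sketch. This is not the authors' code: the `Photo` record, the function names, the `cutoff` date, and the `train_fraction` parameter are all hypothetical, and the exact rules used in the paper may differ. The assumed idea is that an open-set split trains on everything before a cutoff date and tests on everything after it (so some test individuals are new), while a closed-set split sends each individual's earlier dates to training and later dates to testing (so every test individual is known).

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Photo(NamedTuple):
    # Hypothetical record layout, not the dataset's actual schema.
    photo_id: str
    turtle_id: str
    taken: date


def time_aware_open_set(photos, cutoff):
    """Train on photos before `cutoff`, test on the rest.

    Individuals first seen on or after the cutoff occur only in the
    test set and are returned separately as "new" identities.
    """
    train = [p for p in photos if p.taken < cutoff]
    test = [p for p in photos if p.taken >= cutoff]
    known = {p.turtle_id for p in train}
    new_ids = {p.turtle_id for p in test} - known
    return train, test, new_ids


def time_aware_closed_set(photos, train_fraction=0.5):
    """Per individual, send earlier dates to train and later dates to test.

    All photos from one date stay together, so a single day never leaks
    across the split. Individuals seen on only one date go to train,
    which keeps every test identity present in training.
    """
    by_turtle = defaultdict(list)
    for p in photos:
        by_turtle[p.turtle_id].append(p)

    train, test = [], []
    for items in by_turtle.values():
        dates = sorted({p.taken for p in items})
        if len(dates) < 2:
            train.extend(items)
            continue
        # Keep at least one date on each side of the split.
        n_train = min(max(1, int(len(dates) * train_fraction)), len(dates) - 1)
        first_test_date = dates[n_train]
        for p in items:
            (train if p.taken < first_test_date else test).append(p)
    return train, test


# Toy data, invented purely for illustration.
photos = [
    Photo("a1", "t1", date(2012, 6, 1)),
    Photo("a2", "t1", date(2012, 6, 1)),
    Photo("a3", "t1", date(2018, 7, 3)),
    Photo("b1", "t2", date(2015, 8, 9)),
    Photo("b2", "t2", date(2021, 6, 20)),
    Photo("c1", "t3", date(2021, 7, 1)),
]

open_train, open_test, new_ids = time_aware_open_set(photos, date(2020, 1, 1))
closed_train, closed_test = time_aware_closed_set(photos)
```

In the toy data, turtle `t3` is first photographed after the cutoff, so the open-set split reports it as a new identity. The closed-set split instead keeps its single photo in training. A random split would have no such guarantee and could place the two same-day photos of `t1` on opposite sides.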