Fingerspelling within Sign Language Translation

Garrett Tanzer
2024-08-14
Abstract: Fingerspelling poses challenges for sign language processing due to its high-frequency motion and use for open-vocabulary terms. While prior work has studied fingerspelling recognition, there has been little attention to evaluating how well sign language translation models understand fingerspelling in the context of entire sentences -- and improving this capability. We manually annotate instances of fingerspelling within FLEURS-ASL and use them to evaluate the effect of two simple measures to improve fingerspelling recognition within American Sign Language to English translation: 1) use a model family (ByT5) with character- rather than subword-level tokenization, and 2) mix fingerspelling recognition data into the translation training mixture. We find that 1) substantially improves understanding of fingerspelling (and therefore translation quality overall), but the effect of 2) is mixed.
Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper focuses on fingerspelling in sign language translation, specifically translation from American Sign Language (ASL) to English. Fingerspelling spells out words letter by letter using a manual alphabet. It matters most for words that have no established sign, such as proper nouns or domain-specific vocabulary.

The paper attempts to address the following key issues:

1. **Improving the understanding of fingerspelling**: Current sign language translation models perform poorly on fingerspelling that occurs within entire sentences, where the spelled term must be understood in context.
2. **Enhancing translation quality**: Handling fingerspelling better can improve the overall quality of translation.
3. **Evaluating the effectiveness of different methods**: The study explores two simple measures for improving fingerspelling recognition: using character-level rather than subword-level tokenization, and mixing fingerspelling recognition data into the translation training data.

Specifically, the authors conducted the following work:

- Manually annotated instances of fingerspelling in the FLEURS-ASL dataset and used these annotations to evaluate how well models understand fingerspelling.
- Conducted experiments with two model families: ByT5, which uses character-level tokenization, and T5, which uses subword-level tokenization.
- Mixed a fingerspelling recognition dataset, FSboard, into the training set to see whether this could further improve model performance.

The experimental results show that the character-level model ByT5 substantially improves understanding of fingerspelling and, as a result, overall translation quality. Mixing additional fingerspelling recognition data into the training set had a mixed effect: it brought no noticeable extra benefit and sometimes degraded performance. The authors therefore suggest that future research adopt character-level tokenization as standard practice, or explore other ways to achieve similar benefits with fewer trade-offs.