Abstract: The natural language processing (NLP) of sign language aims to make human sign language “understandable” to computers. To achieve this goal, sign language text must first be segmented into sign sequences that a computer can recognize; this segmentation is the basis for the information processing of sign language. To address the problems of expressing Chinese sign language (CSL), this paper analyzes the lexical features of CSL and discusses sign segmentation algorithms for producing machine-readable files. Sign segmentation follows two main approaches: rule-based and statistics-based. Backward maximum matching (BMM) is an important rule-based method widely used in Chinese NLP, whereas conditional random fields (CRFs), a more recently proposed statistical method, have performed excellently in international evaluations. In this study, both BMM and CRFs are applied to the same dataset to explore practical issues in the sign segmentation of CSL, and the results of the CRFs method are presented and discussed. Because our corpus contains only hundreds of sentences, CRF-based cross-validation is also performed to avoid the unreliable results that an exceedingly small corpus could otherwise produce within limited processing time. Specifically, three-group twofold cross-validation is used to analyze the design of the annotation specification and the selection of the feature template. The results validate the effectiveness of the proposed segmentation strategy and confirm that CRFs outperform BMM. The proposed approach yields an F-score of 77.4% for sign segmentation on the CSL corpus. CRFs perform well in sign segmentation because they can exploit arbitrary, overlapping features of the input while retaining the Markov structure of the label sequence. However, more satisfactory results will depend on further technological development of sign language corpora.
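
As a rough illustration of the rule-based approach, the following is a minimal Python sketch of backward maximum matching. It assumes the lexicon is a plain set of sign (gloss) entries and that any unmatched character falls back to a single-character unit; the function name, the toy lexicon, and the fallback rule are illustrative assumptions, not the implementation used in this study.

```python
def backward_maximum_matching(text, lexicon, max_len=None):
    """Segment `text` right to left, greedily taking the longest lexicon entry
    that ends at the current position. Characters not covered by any entry
    become single-character tokens (an assumed fallback rule)."""
    if max_len is None:
        # The longest entry bounds the candidate window; default to 1 for an empty lexicon.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # tokens were collected right to left
    return tokens


# Toy lexicon (hypothetical): the classic case where forward matching would
# wrongly produce 研究生 / 命 / 起源, but backward matching recovers the intended split.
lexicon = {"研究", "研究生", "生命", "命", "起源"}
print(backward_maximum_matching("研究生命起源", lexicon))
```

Scanning from the right is the usual motivation for BMM over forward matching in Chinese text, since right-to-left greedy matching tends to resolve overlapping ambiguities such as the one above more often.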

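For the statistical approach, segmentation is commonly cast as character-level sequence labeling before a CRF is trained. The sketch below assumes a four-tag B/M/E/S annotation scheme and a hypothetical window feature template (character unigrams within two positions plus two adjacent bigrams); the annotation specification and feature template actually evaluated in this study may differ. It only prepares tags and features; training and decoding would be left to a CRF toolkit.

```python
def to_bmes(tokens):
    """Map a segmented sentence to one tag per character:
    S = single-character unit; B/M/E = begin/middle/end of a longer unit."""
    tags = []
    for tok in tokens:
        if len(tok) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(tok) - 2) + ["E"])
    return tags


def char_features(chars, i):
    """Hypothetical feature template for position i:
    unigrams C-2..C2 and bigrams C-1C0, C0C1, padded at sentence edges."""
    def c(k):
        j = i + k
        return chars[j] if 0 <= j < len(chars) else "<PAD>"
    return {
        "C-2": c(-2), "C-1": c(-1), "C0": c(0), "C1": c(1), "C2": c(2),
        "C-1C0": c(-1) + c(0), "C0C1": c(0) + c(1),
    }


# Build one (features, tag) training sequence from a segmented toy sentence.
tokens = ["研究", "生命", "起源"]
chars = list("".join(tokens))
sequence = [(char_features(chars, i), tag) for i, tag in enumerate(to_bmes(tokens))]
print(sequence[0])
```

Comparing alternative tag sets or templates, as the cross-validation described above does, amounts to swapping `to_bmes` or `char_features` while keeping the data splits fixed.
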
Study of Sign Segmentation in the Text of Chinese Sign Language