A Two-Stage Incremental Annotation Approach to Constructing A Network Informal Language Corpus

Yunqing Xia, Kam-Fai Wong, Robert Luk
2005-01-01
Abstract: Network Informal Language (NIL) refers to the special human language widely used in the community of digital network chat via platforms such as chat rooms/tools, mobile phone short message services (SMS), bulletin board systems (BBS), emails, etc. NIL holds anomalous characteristics in forming words, phrases, and non-alphabetical characters. This makes it difficult to handle NIL text by conventional natural language processing (NLP) tools. Previous research reveals that knowledge-based methods perform less effectively in processing unseen NIL expressions. This motivates us to construct an annotated NIL corpus which is used specially to develop and evaluate techniques for extraction and normalization of NIL expressions. A two-stage incremental annotation approach is proposed in this paper to construct a NIL corpus with minimal human involvement. Several experiments are conducted which reveal that the efficiency of corpus annotation can be improved greatly with this approach.