Text Chunking using Transformation-Based Learning

Lance A. Ramshaw, Mitchell P. Marcus
DOI: https://doi.org/10.48550/arXiv.cmp-lg/9505040
1995-05-24
Abstract: Eric Brill introduced transformation-based learning and showed that it can do part-of-speech tagging with fairly high accuracy. The same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive "baseNP" chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 92% for baseNP chunks and 88% for somewhat more complex chunks that partition the sentence. Some interesting adaptations to the transformation-based learning approach are also suggested by this application.
Computation and Language
What problem does this paper attempt to address?
This paper addresses text chunking, approached with the transformation-based learning (TBL) method. Specifically, the authors aim to develop an efficient method for identifying non-recursive "baseNP" chunks, as well as other types of text chunks, that can serve as a basis for deeper syntactic analysis. The paper notes that traditional parsing methods suffer from limited coverage on unrestricted text, whereas TBL can perform this part of the task with high accuracy, providing a relatively simple and effective solution.

The paper details how to recast chunking as a tagging problem: each word receives a new tag that encodes the chunk structure it belongs to. The advantage of this encoding is that it avoids the difficulties caused by unbalanced brackets, because local rules modify the tags on words directly instead of inserting or moving brackets between words.

The paper also explores specific challenges that arise in the TBL training process, such as how to organize the computation efficiently, how to index static rule elements, and how to speed up training by heuristically disabling unlikely rules. These optimizations let the authors run experiments on larger training sets, reaching recall and precision of roughly 92% for baseNP chunks and 88% for the somewhat more complex chunks that partition the sentence. Overall, the research aims to provide a more solid foundation for higher-level natural language processing tasks, such as verb-argument identification and index term generation, by improving text chunking.
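The chunk-as-tags encoding can be illustrated with a minimal Python sketch. It assumes a three-tag set for baseNP chunks: I (inside a chunk), O (outside any chunk), and B (first word of a chunk that immediately follows another chunk). The function names, the bracketed input format, and the example sentence are hypothetical and chosen only for illustration; the single-word retagging rule is a stand-in for a learned transformation, not one of the paper's rule templates.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def brackets_to_tags(tokens: List[str]) -> Tagged:
    """Encode "[ ... ]" baseNP brackets as one chunk tag per word.

    I = inside a chunk, O = outside any chunk,
    B = first word of a chunk that directly follows another chunk.
    """
    tagged: Tagged = []
    inside = False          # currently between "[" and "]"
    first_in_chunk = False  # next word is the first word of a chunk
    just_closed = False     # previous word ended a chunk
    for tok in tokens:
        if tok == "[":
            inside, first_in_chunk = True, True
        elif tok == "]":
            inside, just_closed = False, True
        else:
            if inside:
                tag = "B" if (first_in_chunk and just_closed) else "I"
                first_in_chunk = False
            else:
                tag = "O"
            just_closed = False
            tagged.append((tok, tag))
    return tagged


def tags_to_chunks(tagged: Tagged) -> List[List[str]]:
    """Decode chunk tags back into a list of chunks (lists of words).

    Every I/O/B sequence decodes to a well-formed chunking, so there is
    no equivalent of an unbalanced bracket.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in ("B", "O") and current:
            chunks.append(current)  # B and O both end the open chunk
            current = []
        if tag != "O":
            current.append(word)
    if current:
        chunks.append(current)
    return chunks


def apply_rule(tagged: Tagged, from_tag: str, to_tag: str, word: str) -> Tagged:
    """Hypothetical local rule: retag `word` from `from_tag` to `to_tag`."""
    return [(w, to_tag if (w == word and t == from_tag) else t)
            for w, t in tagged]


sentence = "[ The president ] gave [ the board ] [ a report ] today".split()
tagged = brackets_to_tags(sentence)
print(tagged)
print(tags_to_chunks(tagged))
# Retagging "a" from B to I merges the two adjacent chunks; the result
# is still a valid chunking, with no brackets to rebalance.
print(tags_to_chunks(apply_rule(tagged, "B", "I", "a")))
```

The point of the last call is the design choice the summary describes: a rule changes one word's tag in place, and whatever tag sequence results can always be read back as a set of chunks.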