Discover Linguistic Patterns in Parsed Corpus with Frequent Subtree Mining

Bo Wang, Tiejun Zhao, Muyun Yang, Sheng Li
DOI: https://doi.org/10.1109/WKDD.2010.9
2010-01-01
Abstract: Recognition of special linguistic patterns in a given language is very helpful for many NLP applications, such as information extraction, machine translation, and parsing. State-of-the-art syntactic parsers are based on a given grammar. The grammar used is context-free and cannot capture complex patterns that contain multiple linguistic units. We propose an unsupervised method to automatically discover complex linguistic patterns from a classically parsed corpus. A specialized and efficient algorithm is applied to mine the frequent subtrees in the forest, and the subtrees found are formalized as linguistic patterns. The approach is validated on the Penn Chinese Treebank using the linguistic patterns it discovers.
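To make the idea of mining frequent subtrees in a parsed forest concrete, here is a minimal Python sketch. It is not the specialized algorithm the abstract refers to, whose details are not given here; it is a brute-force enumeration under stated assumptions. The tuple-based tree encoding, the depth bound, the fragment definition (each child is either cut to its label or expanded with all of its children), and the names `fragments` and `mine_frequent_fragments` are all hypothetical choices made for illustration. Support is counted as the number of trees that contain a fragment.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is (label, [children]); a leaf is (label, []).

def fragments(node, depth):
    """Return bracketed strings for fragments rooted at `node`.

    A fragment keeps the node's label and, if expanded, all of its children;
    each child is in turn either cut to its bare label or expanded further,
    down to at most `depth` levels.
    """
    label, children = node
    out = {label}  # the bare label: the node is cut here
    if depth > 0 and children:
        child_options = [sorted(fragments(c, depth - 1)) for c in children]
        for combo in product(*child_options):
            out.add("(" + label + " " + " ".join(combo) + ")")
    return out

def mine_frequent_fragments(forest, min_support, depth=2):
    """Count, per fragment, how many trees contain it; keep the frequent ones."""
    support = Counter()
    for tree in forest:
        seen = set()  # each fragment counts once per tree
        stack = [tree]
        while stack:
            node = stack.pop()
            # Keep only fragments with at least one expanded level.
            seen |= {f for f in fragments(node, depth) if f.startswith("(")}
            stack.extend(node[1])
        support.update(seen)
    return {f: n for f, n in support.items() if n >= min_support}

# Toy forest with treebank-style labels (illustrative data, not from the paper).
obj_np = ("NP", [("NN", [])])
vp = ("VP", [("VV", []), obj_np])
forest = [
    ("S", [("NP", [("NN", [])]), vp]),
    ("S", [("NP", [("PN", [])]), vp]),
]

patterns = mine_frequent_fragments(forest, min_support=2)
for pattern in sorted(patterns):
    print(pattern, patterns[pattern])
```

A fragment such as `(S NP (VP VV NP))` spans two levels of the tree, so it relates several linguistic units at once, which a single context-free rule cannot express. Exhaustive enumeration like this grows exponentially with depth and branching, which is why a specialized mining algorithm is needed on a real treebank.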