Automatic Filtration of Multiword Units

Ying Liu, Zheng Tie
DOI: https://doi.org/10.1109/nlpke.2010.5587783
2010-01-01
Abstract: This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. Experimental results show that the precision of the extracted multiword units improves by 8.7% after filtering.
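The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is restricted to two-word candidates, where NE computed from raw counts reduces to 2*f(xy) / (f(x) + f(y)), and the stop-word list, the thresholds, and all function names are hypothetical choices made for illustration. Contextual entropy is taken here as the smaller of the left-neighbor and right-neighbor entropies, which is one common reading of the term. For two-word candidates the first-word and last-word checks already cover every position, so no separate general stop-word check appears.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "of", "a", "an", "in", "to", "and", "is"}


def bigram_ne(tokens):
    """Return (NE score per bigram, bigram counts) using raw counts.

    Assumed two-word form of NE: 2 * f(xy) / (f(x) + f(y)).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {
        bg: 2 * count / (unigrams[bg[0]] + unigrams[bg[1]])
        for bg, count in bigrams.items()
    }
    return scores, bigrams


def entropy(counter):
    """Shannon entropy (base 2) of a count distribution; 0.0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def context_entropy(tokens, bigram):
    """Smaller of the left-neighbor and right-neighbor entropies."""
    left, right = Counter(), Counter()
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == bigram:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[tokens[i + 2]] += 1
    return min(entropy(left), entropy(right))


def filter_candidates(tokens, min_ne=0.5, min_freq=2, min_entropy=0.5):
    """Extract bigram candidates by NE, then apply the filters in turn.

    All three thresholds are illustrative assumptions.
    """
    scores, freq = bigram_ne(tokens)
    kept = []
    for bg, score in scores.items():
        if score < min_ne:
            continue  # weak association: not a candidate
        if freq[bg] < min_freq:
            continue  # frequency filter
        if bg[0] in STOP_WORDS or bg[-1] in STOP_WORDS:
            continue  # first / last stop-word filter
        if context_entropy(tokens, bg) < min_entropy:
            continue  # contextual entropy filter
        kept.append(bg)
    return kept
```

On the toy token sequence "the fuel cell is small and a fuel cell is cheap while the fuel cell runs", split on whitespace, filter_candidates keeps only ("fuel", "cell"): candidates such as ("the", "fuel") and ("cell", "is") are dropped by the first and last stop-word checks, and the remaining bigrams fall below the frequency threshold.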