Study on Japanese Word Segmentation and POS Tagging Based on Rules and Statistics

JIANG Shangpu, CHEN Qunxiu
DOI: https://doi.org/10.3969/j.issn.1003-0077.2010.01.021
2010-01-01
Abstract: Word segmentation and part-of-speech (POS) tagging are the first step in Japanese natural language processing tasks, such as machine translation with Japanese as the source language. This paper proposes a Japanese word segmentation and POS tagging approach based on rules and statistics. The method adopts a single-perceptron-based joint word segmentation and POS tagging algorithm as its basic framework and combines it with adjacency-attribute features derived from heuristic rules. An experiment on a small test dataset shows that the new approach achieves an F-score of 98.2% on word segmentation and 94.8% on joint word segmentation and POS tagging. This work has already been applied successfully in a Japanese-Chinese machine translation system.
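The abstract reports two F-scores but does not say how they are computed. The sketch below assumes the common span-matching convention: a predicted word counts as correct only if its character boundaries match a gold word, and for the joint score its POS tag must match as well. The function names, tag labels, and the example sentence are illustrative assumptions, not taken from the paper.

```python
def to_spans(tokens):
    """Convert [(word, tag), ...] into a set of (start, end, tag) character spans."""
    spans = set()
    pos = 0
    for word, tag in tokens:
        spans.add((pos, pos + len(word), tag))
        pos += len(word)
    return spans


def f_score(gold, pred, use_tags):
    """F-score over matching spans; with use_tags=True the POS tag must match too."""
    g = to_spans(gold)
    p = to_spans(pred)
    if not use_tags:
        # Segmentation-only scoring: drop the tag and compare boundaries.
        g = {(s, e) for s, e, _ in g}
        p = {(s, e) for s, e, _ in p}
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted analyses of the same sentence.
gold = [("私", "N"), ("は", "P"), ("学生", "N"), ("です", "AUX")]
pred = [("私", "N"), ("は", "P"), ("学", "N"), ("生", "N"), ("です", "V")]

seg_f = f_score(gold, pred, use_tags=False)    # boundaries only
joint_f = f_score(gold, pred, use_tags=True)   # boundaries and POS tag
```

Under this convention the joint score can never exceed the segmentation score, since a word with wrong boundaries cannot have a correct tag. This is consistent with the reported 94.8% lying below 98.2%.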