A study on cross-language text summarization using supervised methods

Lei Yu, Fuji Ren
DOI: https://doi.org/10.1109/NLPKE.2009.5313809
2009-01-01
Abstract: In this work, we apply Hidden Markov Models (HMM), Conditional Random Fields (CRF), Gaussian Mixture Models (GMM), and Mathematical Methods of Statistics (MMS) to Chinese and Japanese text summarization. The purpose of this work is to study the applicability of the three trainable models (HMM, CRF, and GMM) to cross-language text summarization. For model training, we use features such as sentence position, sentence centrality, and the number of named entities. For model testing, we use Chinese online news and Japanese news extracted from web pages. We evaluate each model by measuring precision at compression rates of 10%, 20%, and 30%, with MMS as the baseline method. The results show that HMM, CRF, and GMM all improve markedly over MMS on both Chinese and Japanese text summarization when using the same training features. In particular, GMM performs best in all tests.
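The evaluation described in the abstract (precision at a fixed compression rate) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the compression rate is taken to mean the fraction of sentences kept, precision is taken to be the share of selected sentences that also appear in a reference summary, and the names `summarize`, `precision`, `scores`, and `reference` are hypothetical. The sentence scores stand in for the output of any trained model (HMM, CRF, or GMM).

```python
def summarize(scores, rate):
    """Pick the top-scoring sentences at a given compression rate.

    Assumption: the compression rate is the fraction of sentences kept,
    and at least one sentence is always selected.
    """
    k = max(1, round(rate * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore document order


def precision(selected, reference):
    """Fraction of selected sentences that are also in the reference summary."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected & set(reference)) / len(selected)


# Hypothetical per-sentence scores for a 10-sentence article.
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.3, 0.8, 0.05, 0.6, 0.15]
# Hypothetical indices of sentences in a human-written reference summary.
reference = {0, 2, 8}

for rate in (0.10, 0.20, 0.30):
    chosen = summarize(scores, rate)
    print(f"rate={rate:.0%} selected={chosen} precision={precision(chosen, reference):.2f}")
```

Under these assumptions, a model is compared with the baseline by running the same loop on each model's scores and comparing the precision values at each rate.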