Linear Model Incorporating Feature Ranking for Chinese Documents Readability

Gang Sun, Zhiwei Jiang, Qing Gu, Daoxu Chen
DOI: https://doi.org/10.1109/iscslp.2014.6936601
2014-01-01
Abstract: Assessing the readability of documents is a worthwhile task. In this paper, we apply linear regression models to the readability assessment of Chinese documents and propose LiFR (Linear model incorporating Feature Ranking), which uses feature ranking to select the most appropriate text features for building the linear model. We develop text features specialized for Chinese, including surface, part-of-speech, parse-tree, and entropy features. The experimental results show that both linear and log-linear regression models are reliable for readability assessment and achieve performance competitive with other machine learning methods, such as SVR (Support Vector Machine for Regression). The designed features also prove valuable, and feature ranking is essential for building useful linear functions.
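The general idea of ranking features and then fitting a linear model on the top-ranked ones can be illustrated with a minimal sketch. The abstract does not state which ranking criterion LiFR uses, so the absolute Pearson correlation used here is an assumption, as are the function names (`rank_features`, `fit_linear`, `predict`), the choice of `k = 2`, and the synthetic data, which stands in for real text features and readability levels.

```python
import numpy as np


def rank_features(X, y):
    """Return column indices of X sorted by |Pearson correlation| with y, best first.

    Assumption: correlation is used as the ranking score purely for illustration.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns receive a score of 0
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(-scores)


def fit_linear(X, y, selected):
    """Ordinary least-squares fit of y on the selected columns plus an intercept."""
    A = np.column_stack([np.ones(len(X)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(X, selected, coef):
    """Apply the fitted linear function to the selected columns of X."""
    A = np.column_stack([np.ones(len(X)), X[:, selected]])
    return A @ coef


# Synthetic stand-in data: 6 hypothetical features, of which only
# columns 1 and 4 actually influence the target score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 + 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(scale=0.1, size=200)

order = rank_features(X, y)
top_k = order[:2]  # keep the two highest-ranked features
coef = fit_linear(X, y, top_k)
y_hat = predict(X, top_k, coef)
mean_abs_error = np.abs(y_hat - y).mean()
```

A log-linear variant could be sketched the same way by fitting on `np.log(y)` for positive targets and exponentiating the predictions, though whether this matches the paper's formulation is not confirmed by the abstract.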