Unsupervised Readability Assessment Via Learning from Weak Readability Signals

Yuliang Liu, Zhiwei Jiang, Yafeng Yin, Cong Wang, Sheng Chen, Zhaoling Chen, Qing Gu
DOI: https://doi.org/10.1145/3539618.3591695
2023-01-01
Abstract: Unsupervised readability assessment aims to evaluate the reading difficulty of text without any manually-labeled data for model training. This is a challenging task because the absence of labeled data makes it difficult for the model to understand what readability is. In this paper, we propose a novel framework to Learn a neural model from Weak Readability Signals (LWRS). Instead of relying on labeled data, LWRS utilizes a set of heuristic signals that specialize in describing text readability from different aspects to guide the model in outputting readability scores for ranking. Specifically, to effectively use multiple heuristic weak signals for model training, we build a multi-signal learning model that ranks the unlabeled texts from multiple readability-related aspects based on intra- and inter-signal learning. We also adopt the pairwise ranking paradigm to reduce the cascade coupling among partial-order pairs. Furthermore, we propose identifying the most representative signal based on the batch-level consensus distribution of all signals. This strategy helps identify the predicted signal that is most correlated with readability in the absence of ground-truth labels. We conduct experiments on three public readability assessment datasets. The experimental results demonstrate that our LWRS outperforms each heuristic signal and their combinations significantly, and can even perform comparably with some supervised methods. Additionally, our LWRS trained on one dataset can be effectively transferred to other datasets, including those in other languages, which indicates its good generalization and potential for wide application.
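The general idea of ordering unlabeled texts with heuristic signals, and then picking the signal that agrees most with the others, can be illustrated with a small sketch. This is not the LWRS model: no neural network is trained, and the three signals, the majority-vote consensus, and every function name below are assumptions chosen purely for illustration.

```python
import re
from itertools import combinations


# Three hypothetical weak signals; higher values are assumed to mean "harder to read".
def avg_sentence_length(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(text.split()) / max(len(sentences), 1)


def avg_word_length(text):
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) for w in words) / max(len(words), 1)


def long_word_ratio(text):
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) > 6 for w in words) / max(len(words), 1)


SIGNALS = {
    "sentence_length": avg_sentence_length,
    "word_length": avg_word_length,
    "long_word_ratio": long_word_ratio,
}


def pairwise_labels(scores):
    """Label each pair (i, j), i < j: +1 if text i scores harder, -1 if easier, 0 if tied."""
    return {
        (i, j): (scores[i] > scores[j]) - (scores[i] < scores[j])
        for i, j in combinations(range(len(scores)), 2)
    }


def agreement(a, b):
    """Fraction of pairs on which two labelings give the same ordering."""
    return sum(a[p] == b[p] for p in a) / len(a)


def most_representative(texts):
    """Return the signal whose pairwise ordering best matches a majority-vote consensus."""
    labels = {
        name: pairwise_labels([fn(t) for t in texts]) for name, fn in SIGNALS.items()
    }
    consensus = {}
    for pair in next(iter(labels.values())):
        vote = sum(signal_labels[pair] for signal_labels in labels.values())
        consensus[pair] = (vote > 0) - (vote < 0)
    best = max(labels, key=lambda name: agreement(labels[name], consensus))
    return best, labels, consensus


if __name__ == "__main__":
    batch = [
        "The cat sat. The dog ran.",
        "Unsupervised readability assessment evaluates textual difficulty "
        "without manually annotated supervision.",
        "We read a short book today. It was fun.",
    ]
    best, _, consensus = most_representative(batch)
    print("most representative signal:", best)
    print("consensus pairwise labels:", consensus)
```

Working on pairs rather than full rankings means each comparison is judged independently, which loosely mirrors the motivation the abstract gives for the pairwise paradigm. In the paper, the consensus is described as a batch-level distribution over the signals predicted by a trained model, which this simple majority vote does not attempt to reproduce.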