Utilizing Cognitive Signals Generated During Human Reading to Enhance Keyphrase Extraction from Microblogs

Xinyi Yan, Yingyi Zhang, Chengzhi Zhang
DOI: https://doi.org/10.1016/j.ipm.2023.103614
IF: 7.466
2023-01-01
Information Processing & Management
Abstract: Microblogging platforms have seen exponential growth, leading to an abundance of user-generated content. The challenge now is to efficiently extract crucial information from this vast and dispersed text data, which is the goal of our research on Automatic Keyphrase Extraction (AKE) for microblogs. Eye-tracking signals, which reflect users' tendency to prioritize certain words while reading, have been employed to enhance AKE performance on microblogs. However, relying solely on eye-tracking has limitations owing to constraints in physiological mechanism support, acquisition techniques, and feature decoding. Consequently, we propose integrating electroencephalogram (EEG) signals with eye-tracking signals to improve microblog-based AKE, thereby overcoming these limitations. Our first step is to identify specific features present in cognitive signals generated during human reading. We selected EEG signals (8 features) and eye-tracking signals (17 features) from the cognitive language processing corpus ZUCO to examine their efficacy when combined with microblog-based AKE. To avoid distortion of the cognitive signals by certain model structures, we introduced these signals at the inputs of the soft attention layer and at the query vectors of the self-attention layer. For evaluation, we performed several AKE tests on microblogs with various combinations of cognitive signals. The results demonstrate a consistent enhancement in AKE performance due to cognitive signals generated during human reading, regardless of feature combination and model. Specifically, EEG signals exhibited the most significant improvement. However, combining EEG signals with eye-tracking signals yielded results that fell between the performance levels of the two signal types, indicating that their integration might have some synergistic effects. Further investigation is needed to understand the underlying mechanisms responsible for this outcome. The code and dataset for this paper can be accessed at https://github.com/yan-xinyi/AKE.
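As a rough illustration of what introducing cognitive signals at the input of a soft attention layer could look like, the sketch below concatenates a per-token cognitive feature vector (8 EEG plus 17 eye-tracking values, as in the abstract) to each token's hidden state before attention scores are computed. It is not the authors' implementation: the function names, the hidden size, the linear scoring function, and the random values standing in for real ZUCO features are all assumptions made for the example.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(hidden, cognitive, weights):
    """Soft attention whose scoring input is [hidden state ; cognitive features].

    hidden:    one hidden-state vector per token
    cognitive: one cognitive feature vector per token (hypothetical values)
    weights:   a single scoring vector (assumed linear scorer)
    """
    # Inject the cognitive signals at the attention input by concatenation.
    inputs = [h + c for h, c in zip(hidden, cognitive)]
    # One scalar score per token from the assumed linear scorer.
    scores = [sum(w * x for w, x in zip(weights, vec)) for vec in inputs]
    alphas = softmax(scores)
    # The context vector is the attention-weighted sum of the hidden states.
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return alphas, context

random.seed(0)
N_TOKENS, HIDDEN = 5, 4   # illustrative sizes, not taken from the paper
N_EEG, N_EYE = 8, 17      # feature counts stated in the abstract

# Random placeholders standing in for encoder outputs and ZUCO features.
hidden = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(N_TOKENS)]
cognitive = [[random.random() for _ in range(N_EEG + N_EYE)] for _ in range(N_TOKENS)]
weights = [random.uniform(-1, 1) for _ in range(HIDDEN + N_EEG + N_EYE)]

alphas, context = soft_attention(hidden, cognitive, weights)
print(alphas)   # one attention weight per token
```

In this sketch the cognitive features influence only which tokens receive attention; the context vector is still built from the hidden states alone, which is one simple way to keep the signals from being altered by earlier layers.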