Tran-Switch: A Transfer Learning Approach for Sentence Level Cross-Genre Author Profiling on Code-Switched English–RomanUrdu Text

Muhammad Adnan Ashraf,Rao Muhammad Adeel Nawab,Feiping Nie
DOI: https://doi.org/10.1016/j.ipm.2022.103261
IF: 7.466
2023-01-01
Information Processing & Management
Abstract:Cross-genre author profiling aims to build generalized models for predicting profile traits of authors that can be helpful across different text genres for computer forensics, marketing, and other applications. The cross-genre author profiling task becomes challenging when dealing with low-resourced languages due to the lack of availability of standard corpora and methods. The task becomes even more challenging when the data is code-switched, which is informal and unstructured. In previous studies, the problem of cross-genre author profiling has been mainly explored for mono-lingual texts in highly resourced languages (English, Spanish, etc.). However, it has not been thoroughly explored for the code-switched text which is widely used for communication over social media. To fulfill this gap, we propose a transfer learning-based solution for the cross-genre author profiling task on code-switched (English–RomanUrdu) text using three widely known genres, Facebook comments/posts, Tweets, and SMS messages. In this article, firstly, we experimented with the traditional machine learning, deep learning and pre-trained transfer learning models (MBERT, XLMRoBERTa, ULMFiT, and XLNET) for the same-genre and cross-genre gender identification task. We then propose a novel Trans-Switch approach that focuses on the code-switching nature of the text and trains on specialized language models. In addition, we developed three RomanUrdu to English translated corpora to study the impact of translation on author profiling tasks. The results show that the proposed Trans-Switch model outperforms the baseline deep learning and pre-trained transfer learning models for cross-genre author profiling task on code-switched text. Further, the experimentation also shows that the translation of RomanUrdu text does not improve results.
What problem does this paper attempt to address?