Abstract: Cross-genre author profiling aims to build generalized models that predict authors' profile traits across different text genres, with applications in computer forensics, marketing, and other areas. The task is challenging for low-resource languages because standard corpora and methods are lacking. It becomes even more challenging when the data is code-switched, since such text is informal and unstructured. Previous studies have mainly explored cross-genre author profiling for monolingual texts in high-resource languages (English, Spanish, etc.). It has not been thoroughly explored for code-switched text, which is widely used for communication on social media. To fill this gap, we propose a transfer learning-based solution for cross-genre author profiling on code-switched (English–RomanUrdu) text using three widely known genres: Facebook comments/posts, Tweets, and SMS messages. In this article, we first experiment with traditional machine learning, deep learning, and pre-trained transfer learning models (MBERT, XLMRoBERTa, ULMFiT, and XLNET) on the same-genre and cross-genre gender identification tasks. We then propose a novel Trans-Switch approach that focuses on the code-switching nature of the text and trains on specialized language models. In addition, we develop three RomanUrdu-to-English translated corpora to study the impact of translation on author profiling tasks. The results show that the proposed Trans-Switch model outperforms the baseline deep learning and pre-trained transfer learning models on the cross-genre author profiling task for code-switched text. The experiments also show that translating RomanUrdu text into English does not improve results.

Tran-Switch: A Transfer Learning Approach for Sentence Level Cross-Genre Author Profiling on Code-Switched English–RomanUrdu Text
