Author Identity Unveiled: Gender and Age Prediction from Textual Patterns using BERT

Boya Chaitanya Kiran, Balla Charishma Sulochana, Manju Venugopalan, Bhavya Sri Pragada, Gaddam Anvith Reddy
DOI: https://doi.org/10.1109/CONIT61985.2024.10626311
2024-06-21
Abstract: This research addresses author profiling: identifying writers’ age groups and genders from large volumes of text. It uses BERT embeddings to capture sentence structure, word choice, and the overall tone of the writing. The primary goal is to apply supervised machine learning models to categorize authors by age and gender, which can help marketers tailor campaigns to different demographics. Among text vectorization methods, BERT embeddings stand out for their ability to capture contextual information efficiently. The objective of this research is to develop a robust solution for author profiling. Techniques such as TF-IDF are also considered effective for text representation, with each representation offering distinct advantages depending on the specific task and dataset characteristics. The results strongly indicate that AdaBoost with TF-IDF is the best-performing machine learning model for age group prediction, with an F1 score of 0.58, and that K-neighbours with TF-IDF is the best model for gender prediction, also with an F1 score of 0.58.
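To make the TF-IDF plus K-neighbours combination named in the abstract concrete, here is a minimal standard-library Python sketch. It is not the authors' pipeline: the smoothed IDF formula, the cosine-similarity vote, the value k=3, the helper names, and the toy texts with placeholder labels `group_a` and `group_b` are all assumptions made for illustration, and a real study would more likely rely on a library implementation and a labelled corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation; real pipelines do more.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen at fit time are dropped.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = tfidf_vector(text, idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive):
    # F1 for one class: harmonic mean of precision and recall.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus; the labels are placeholders, not real demographics.
train_texts = [
    "match goal football team score",
    "football team league match win",
    "goal score league team",
    "recipe oven bake flour sugar",
    "bake cake flour butter oven",
    "sugar butter recipe cake",
]
train_labels = ["group_a", "group_a", "group_a",
                "group_b", "group_b", "group_b"]

idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(t, idf) for t in train_texts]

print(knn_predict("team football goal", train_vecs, train_labels, idf))
print(knn_predict("cake recipe oven", train_vecs, train_labels, idf))
```

The same vectors could feed any other supervised classifier, such as a boosted ensemble, and `f1_score` shows how a per-class F1 value of the kind reported in the abstract is computed from true and predicted labels.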
Computer Science