MBTI BERT: A Transformer-Based Machine Learning Approach Using MBTI Model For Textual Inputs

Raad Bin Tareaf
DOI: https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00338
2022-12-01
Abstract: Social media platforms like Reddit provide a rich textual source of information about the behavior, opinions, and interests of online users. While posts vary in scope from user to user, an individual's writing style and use of phrases stay relatively consistent. Just as a personality test predicts personality from the answers a person gives to a set of questions, it may be possible to automatically connect a user's writing style to a specific MBTI personality class. In this paper, we use a large amount of data from Reddit users for whom we have the results of a Myers-Briggs Type Indicator test to predict their type from their textual input. For this task, we used different machine learning models and various techniques to create the final prediction. The methodology involves evaluating different feature sets to examine whether the word group size of the individual features and the overall size of the feature sets have a significant influence on the outcomes. Finally, different models were trained and evaluated on a unified training and test set to assess which model performs best on the proposed computational linguistics problem. The training showed that a strong imbalance in the training data was a major issue for the models. In our final evaluation, the BERT model achieved the best results by a small margin over the XGBoost and SVM models.
Computer Science