Extending the Abstraction of Personality Types based on MBTI with Machine Learning and Natural Language Processing

Carlos Basto
DOI: https://doi.org/10.48550/arXiv.2105.11798
2021-05-25
Abstract: This paper takes a data-centric approach with Natural Language Processing (NLP) to predict personality types based on the MBTI (an introspective self-assessment questionnaire that indicates different psychological preferences about how people perceive the world and make decisions). It systematically enriches the text representation, guided by domain knowledge, by generating features from three types of analysis: sentiment, grammatical, and aspect. The experiments used a robust baseline of stacked models, with premature optimization of hyperparameters through grid search and gradual feedback, for each of the four classifiers (dichotomies) of MBTI. The results showed that attention to the data iteration loop, focused on quality, explanatory power, and representativeness for abstracting the features most relevant to the studied phenomenon, improved the evaluation metrics more quickly and at lower cost than complex models such as LSTM or state-of-the-art ones such as BERT; comparisons from various perspectives also underscored the importance of these results. In addition, the study showed broad scope for evolving and deepening the task, and possible approaches for further extending the abstraction of personality types.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper attempts to predict personality types based on the MBTI (Myers-Briggs Type Indicator) using a data-centric method combined with natural language processing (NLP) techniques. Specifically, it aims to improve prediction quality by systematically enriching text representations with features generated from three kinds of domain-informed analysis: sentiment analysis, grammatical analysis, and aspect analysis. It also examines whether optimizing data quality, explanatory power, and representativeness within the data iteration loop can improve evaluation metrics faster and at lower cost. According to the abstract, this approach improved the metrics more quickly and cheaply than complex models such as LSTM or BERT. The paper emphasizes the importance of data quality and feature abstraction in personality type prediction.