Actuarial Applications of Natural Language Processing Using Transformers: Case Studies for Using Text Features in an Actuarial Context

Andreas Troxler, Jürg Schelldorfer
2023-09-25
Abstract: This tutorial demonstrates workflows to incorporate text data into actuarial classification and regression tasks. The main focus is on methods employing transformer-based models. A dataset of car accident descriptions with an average length of 400 words, available in English and German, and a dataset with short property insurance claims descriptions are used to demonstrate these techniques. The case studies tackle challenges related to a multi-lingual setting and long input sequences. They also show ways to interpret model output, to assess and improve model performance, by fine-tuning the models to the domain of application or to a specific prediction task. Finally, the tutorial provides practical approaches to handle classification tasks in situations with no or only few labeled data, including but not limited to ChatGPT. The results achieved by using the language-understanding skills of off-the-shelf natural language processing (NLP) models with only minimal pre-processing and fine-tuning clearly demonstrate the power of transfer learning for practical applications.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to use natural language processing (NLP) techniques, in particular Transformer-based models, to incorporate text data into actuarial classification and regression tasks. Specifically, it focuses on the following aspects:

1. **Multilingual settings**: how to keep models effective and accurate when the text data come in more than one language (here, English and German).
2. **Long input sequences**: how to process long texts effectively, for example car accident descriptions with an average length of 400 words.
3. **Model interpretability**: how to interpret model output so that predictions and classifications are transparent, which matters in actuarial work where decisions must be explainable.
4. **Few or no labeled data**: how to handle classification tasks when labeled data are scarce or absent, including, but not limited to, using pre-trained models such as ChatGPT for information extraction.
5. **Evaluating and improving model performance**: how to assess model performance and improve it by fine-tuning models to the domain of application or to a specific prediction task.

The paper demonstrates these techniques on two real datasets:

- **Car accident descriptions**: approximately 7,000 car accident descriptions in English (partially translated into German), together with tabular data such as the number of vehicles involved and whether there were any casualties.
- **Property insurance claims**: approximately 6,000 property insurance claim records, each containing the claim amount, a short English description, and a label indicating one of 9 disaster (peril) types.
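To make the long-input challenge concrete: BERT-style encoders typically accept at most 512 tokens, so a 400-word description can exceed that limit after subword tokenization. One common workaround, assumed here for illustration and not necessarily the procedure used in the paper, is to split the token sequence into overlapping windows, encode each window separately, and average the resulting vectors. The sketch below is plain Python; `embed_chunk` is a hypothetical stand-in for a transformer encoder's pooled output, and the window size (`max_len`) and overlap (`stride`) are illustrative defaults.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[int], max_len: int = 512, stride: int = 128) -> List[List[int]]:
    """Split tokens into windows of at most max_len; consecutive windows share `stride` tokens."""
    if stride < 0 or max_len <= stride:
        raise ValueError("require 0 <= stride < max_len")
    step = max_len - stride  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_long_text(
    tokens: Sequence[int],
    embed_chunk: Callable[[List[int]], List[float]],  # hypothetical encoder
    max_len: int = 512,
    stride: int = 128,
) -> List[float]:
    """Encode each window with embed_chunk and average the window embeddings."""
    return mean_pool([embed_chunk(c) for c in chunk_tokens(tokens, max_len, stride)])


# Toy "encoder" for demonstration only: returns [window length, first token id].
def toy_embed(chunk: List[int]) -> List[float]:
    return [float(len(chunk)), float(chunk[0])]


windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])  # [512, 512, 232]
print(embed_long_text(list(range(1000)), toy_embed))
```

With `max_len=512` and `stride=128`, a 1,000-token input yields three windows starting at tokens 0, 384 and 768. Plain mean pooling weights every window equally; weighting by window length is one possible alternative.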
Through these case studies, the paper shows how to use Transformer models to address the challenges above and offers practical approaches for real-world applications.
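For the few-label setting, one simple technique, chosen here for illustration (the paper's own approaches, including ChatGPT, may differ), is nearest-prototype classification on text embeddings: average the embeddings of a handful of labeled examples per class, then assign a new text to the class whose average is most similar under cosine similarity. Replacing each prototype with the embedding of a short class description gives a zero-shot variant. The class names and 3-dimensional vectors below are hypothetical toy values standing in for encoder outputs; they are not data from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Average of a few labeled example embeddings (the class 'prototype')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(doc_vec: Vector, class_vecs: Dict[str, Vector]) -> str:
    """Return the class whose reference vector is most similar to doc_vec."""
    return max(class_vecs, key=lambda label: cosine(doc_vec, class_vecs[label]))


# Hypothetical toy embeddings: two labeled examples per (hypothetical) peril class.
few_shot_examples = {
    "fire": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "wind": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
    "hail": [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]],
}
prototypes = {label: centroid(vs) for label, vs in few_shot_examples.items()}
print(classify([0.2, 0.7, 0.2], prototypes))  # prints: wind
```

Cosine similarity ignores vector length, a common choice when comparing sentence embeddings; once more labeled data are available, training a classifier on top of the embeddings is a natural next step.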