Evaluation of BERT-Based Models on Patient Data from French Social Media

Emma Le Priol, Manissa Talmatkadi, Stéphane Schück, Nathalie Texier, Anita Burgun
DOI: https://doi.org/10.3233/SHTI240556
2024-08-22
Abstract: With the objective of extracting new knowledge about rare diseases from social media messages, we evaluated three models on a Named Entity Recognition (NER) task, consisting of extracting phenotypes and treatments from social media messages. We trained the three models on a dataset with social media messages about Developmental and Epileptic Encephalopathies and more common diseases. This preliminary study revealed that CamemBERT and CamemBERT-bio exhibit similar performance on social media testimonials, slightly outperforming DrBERT. It also highlighted that their performance was lower on this type of data than on structured health datasets. Limitations, including a narrow focus on NER performance and dataset-specific evaluation, call for further research to fully assess model capabilities on larger and more diverse datasets.
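To make the NER comparison concrete, the following is a minimal sketch of how span-level precision, recall, and F1 could be computed from BIO-tagged sequences. The abstract does not state which metric or tagging scheme the authors used, so exact-match span scoring, the BIO scheme, and the label names `PHENOTYPE` and `TREATMENT` are all assumptions made for illustration, and the tag sequences are toy inputs rather than data from the study.

```python
from typing import List, Set, Tuple

# A span is (start index, exclusive end index, entity label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labelled spans.

    Assumption: an "I-X" tag that does not continue an open "X" span
    is treated leniently as the start of a new span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 for one tagged sequence."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with hypothetical labels: the prediction finds the
# phenotype span but misses the treatment span.
gold_tags = ["O", "B-PHENOTYPE", "I-PHENOTYPE", "O", "B-TREATMENT"]
pred_tags = ["O", "B-PHENOTYPE", "I-PHENOTYPE", "O", "O"]
precision, recall, f1 = span_f1(gold_tags, pred_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this scoring, a predicted entity counts only if both its boundaries and its label match the annotation exactly, which is one reason informal social media text can score lower than structured clinical text.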