From LIMA to DeepLIMA: following a new path of interoperability
Bocharov, Victor; Besançon, Romaric; de Chalendar, Gaël; Ferret, Olivier; Semmar, Nasredine
DOI: https://doi.org/10.1007/s10579-024-09773-5
2024-09-21
Language Resources and Evaluation
Abstract: In this article, we describe the architecture of the LIMA (Libre Multilingual Analyzer) framework and its recent evolution with the addition of new text analysis modules based on deep neural networks. We extended the functionality of LIMA in terms of the number of supported languages while preserving the existing configurable architecture and the availability of previously developed rule-based and statistical analysis components. Models were trained for more than 60 languages on the Universal Dependencies 2.5 corpora, the WikiNer corpora, and the CoNLL-03 dataset. Universal Dependencies allowed us to increase the number of supported languages and to generate models that could be integrated into other platforms. This integration of ubiquitous deep learning natural language processing models and the use of standard annotated collections based on Universal Dependencies can be viewed as a form of model and data interoperability, complementary to the technical interoperability between systems.
computer science, interdisciplinary applications