Abstract: We present an open-source web service for Czech morphosyntactic analysis. The system combines a deep learning model with rescoring by a high-precision morphological dictionary at inference time. We show that our hybrid method surpasses two competitive baselines: the deep learning model generalizes to out-of-vocabulary words and disambiguates better than the existing morphological analyser MorphoDiTa, while the model itself also benefits from inference-time guidance by a manually curated morphological dictionary. We achieve a 50% error reduction in lemmatization and a 58% error reduction in POS tagging over MorphoDiTa, while also offering dependency parsing. The model is trained on PDT-C 1.0, one of the largest Czech morphosyntactic corpora currently available, and the trained models are available at https://hdl.handle.net/11234/1-5293. We provide the tool as a web service deployed at https://lindat.mff.cuni.cz/services/udpipe/. The source code is available on GitHub (https://github.com/ufal/udpipe/tree/udpipe-2), along with a Python client for simple use. The documentation for the models can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model.
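The abstract does not spell out how the dictionary rescoring works. As an illustration only, here is a minimal Python sketch of one common way to combine a neural tagger with a morphological dictionary at inference time. It assumes that the model outputs separate log-probabilities over lemmas and tags, and that the dictionary maps each word form to its set of admissible (lemma, tag) pairs. The function name `pick_analysis`, the data layout and the simplified tags (`NOUN`, `VERB`) are hypothetical and are not taken from the UDPipe 2 implementation.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical representation: an analysis is a (lemma, tag) pair.
Analysis = Tuple[str, str]


def pick_analysis(
    form: str,
    lemma_logprobs: Dict[str, float],
    tag_logprobs: Dict[str, float],
    dictionary: Dict[str, Set[Analysis]],
) -> Analysis:
    """Choose a (lemma, tag) pair for one token (illustrative sketch).

    If the form is in the (hypothetical) dictionary, only the analyses it
    licenses are considered, each scored by the sum of the model's lemma and
    tag log-probabilities. Otherwise (an out-of-vocabulary form), the model's
    own best lemma and best tag are returned unchanged.
    """

    def score(analysis: Analysis) -> float:
        lemma, tag = analysis
        # Lemmas or tags the model did not score get -inf.
        return lemma_logprobs.get(lemma, -math.inf) + tag_logprobs.get(tag, -math.inf)

    licensed = dictionary.get(form)
    if licensed:
        # sorted() makes tie-breaking deterministic, since sets are unordered.
        return max(sorted(licensed), key=score)

    # Out-of-vocabulary form: fall back to the neural model's independent argmaxes.
    best_lemma = max(lemma_logprobs, key=lemma_logprobs.get)
    best_tag = max(tag_logprobs, key=tag_logprobs.get)
    return best_lemma, best_tag
```

For example, the Czech form "ženu" can be an accusative of the noun "žena" or a form of the verb "hnát". If the model's top lemma is the invalid "ženu" but the dictionary lists only ("žena", NOUN) and ("hnát", VERB), the sketch returns the better-scoring licensed pair. For a form missing from the dictionary, it keeps the model's output, which is where the neural model's generalization matters. Relative error reduction is computed as (E_MorphoDiTa − E_hybrid) / E_MorphoDiTa. A 50% reduction in lemmatization therefore means that the hybrid system makes half as many lemmatization errors as MorphoDiTa.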

Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech
