Abstract: The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings, and it has attracted considerable attention within the research community. However, BEIR and analogous datasets are predominantly restricted to the English language. Our objective is to establish large-scale resources for IR in the Polish language, thereby advancing research in this area of NLP. In this work, inspired by the mMARCO and Mr.~TyDi datasets, we translated all accessible open IR datasets into Polish and introduced BEIR-PL, a new benchmark comprising 13 datasets that facilitates further development, training and evaluation of modern Polish language models for IR tasks. We evaluated and compared numerous IR models on the newly introduced BEIR-PL benchmark. Furthermore, we publish pre-trained open IR models for the Polish language, marking a pioneering development in this field. The evaluation also revealed that BM25 achieved significantly lower scores for Polish than for English, which can be attributed to the high inflection and intricate morphological structure of the Polish language. Finally, we trained various re-ranking models to enhance BM25 retrieval, and we compared their performance to identify their characteristic features. To ensure accurate model comparisons, it is necessary to scrutinise individual results rather than to average across the entire benchmark. Thus, we thoroughly analysed the outcomes of IR models in relation to each individual data subset encompassed by the BEIR benchmark. The benchmark data is available at URL {\bf

BEIR-PL: Zero Shot Information Retrieval Benchmark for the Polish Language