Abstract: The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings and has garnered considerable attention within the research community. However, BEIR and analogous datasets are predominantly restricted to English. Our objective is to establish extensive, large-scale IR resources for Polish, thereby advancing research in this area of NLP. In this work, inspired by the mMARCO and Mr.~TyDi datasets, we translated all accessible open IR datasets into Polish and introduced BEIR-PL, a new benchmark comprising 13 datasets that facilitates further development, training, and evaluation of modern Polish language models for IR tasks. We evaluated and compared numerous IR models on the newly introduced BEIR-PL benchmark. Furthermore, we publish pre-trained open IR models for Polish, a pioneering development in this field. The evaluation also revealed that BM25 achieved significantly lower scores for Polish than for English, which can be attributed to the high inflection and intricate morphological structure of Polish. Finally, we trained various re-ranking models to enhance BM25 retrieval and compared their performance to identify their distinctive characteristics. To ensure accurate model comparisons, it is necessary to scrutinise individual results rather than average across the entire benchmark. Thus, we thoroughly analysed the outcomes of IR models on each individual data subset encompassed by the BEIR benchmark. The benchmark data is available at URL {\bf
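To illustrate why inflection hurts a lexical retriever such as BM25, the following is a minimal sketch of Okapi BM25 scoring, not the evaluation setup used in BEIR-PL. It assumes the common Lucene-style IDF and the usual defaults k1=1.2 and b=0.75; the toy Polish corpus is hypothetical, and `crude_stem` is a hypothetical prefix-truncation stand-in for proper Polish lemmatization, not a method from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.

    Assumes the Lucene-style IDF, log(1 + (N - df + 0.5) / (df + 0.5)),
    which stays positive even when a term occurs in every document.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # exact match only: other surface forms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def tokenize(text):
    # Lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()


def crude_stem(token, length=3):
    # Hypothetical stand-in for lemmatization: keep only a fixed-length prefix.
    return token[:length]


# Hypothetical toy corpus: every document mentions a cat ("kot"),
# but in a different grammatical case.
corpus = [
    "Mam kota",             # accusative: kota
    "Kot śpi na kanapie",   # nominative: kot
    "Rozmawiałem z kotem",  # instrumental: kotem
]
query = "kot"

plain_docs = [tokenize(d) for d in corpus]
plain = bm25_scores(tokenize(query), plain_docs)

stemmed_docs = [[crude_stem(t) for t in d] for d in plain_docs]
stemmed = bm25_scores([crude_stem(t) for t in tokenize(query)], stemmed_docs)

print([round(s, 3) for s in plain])
print([round(s, 3) for s in stemmed])
```

Without normalization only the nominative form matches, so two relevant documents score zero; after truncation all three score above zero. A real system would use a proper Polish lemmatizer or stemmer, since prefix truncation also conflates unrelated words.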

Assessing generalization capability of text ranking models in Polish

PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods

Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG

BEIR-PL: Zero Shot Information Retrieval Benchmark for the Polish Language

Evaluation of Sentence Representations in Polish

A Comparative Study of Text Retrieval Models on DaReCzech

Towards Optimizing a Retrieval Augmented Generation using Large Language Model on Academic Data

Towards Robust Ranker for Text Retrieval

Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation

Improving Domain-Specific Retrieval by NLI Fine-Tuning

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

Evaluating Generative Ad Hoc Information Retrieval

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

Policy-Gradient Training of Language Models for Ranking

Passage Retrieval of Polish Texts Using OKAPI BM25 and an Ensemble of Cross Encoders

KLEJ: Comprehensive Benchmark for Polish Language Understanding

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization

Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU

Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker