Abstract: In common law jurisdictions, legal practitioners rely on precedents to construct arguments, in line with the doctrine of \emph{stare decisis}. As the number of cases grows over the years, prior case retrieval (PCR) has garnered significant attention. Besides lacking real-world scale, existing PCR datasets do not simulate a realistic setting, because their queries use complete case documents and only mask references to prior cases. The query is thereby exposed to legal reasoning that is not yet available when constructing an argument for an undecided case, as well as to spurious patterns left behind by citation masks, potentially short-circuiting a comprehensive understanding of case facts and legal principles. To address these limitations, we introduce a PCR dataset based on judgements from the European Court of Human Rights (ECtHR), which explicitly separate facts from arguments and exhibit precedential practices, allowing us to build a dataset that fosters systems' comprehensive understanding. We benchmark different lexical and dense retrieval approaches with various negative sampling strategies, adapting them to deal with long text sequences using hierarchical variants. We find that difficulty-based negative sampling strategies are not effective for the PCR task, highlighting the need for investigation into domain-specific difficulty criteria. Furthermore, we observe that the performance of the dense models degrades over time, which calls for further research into temporal adaptation of retrieval models. Additionally, we use the PCR task to assess the influence of two views, Halsbury's and Goodhart's, on practice in the ECtHR jurisdiction.

Citation Data of Czech Apex Courts

The Czech Court Decisions Corpus (CzCDC): Availability as the First Step

JUSTICE: A Benchmark Dataset for Supreme Court's Judgment Prediction

Predicting citations in Dutch case law with natural language processing

CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing

Citation-Based Summarization of Landmark Judgments

Judgement Citation Retrieval using Contextual Similarity

The Cambridge Law Corpus: A Dataset for Legal AI Research

Czech Dataset for Cross-lingual Subjectivity Classification

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

ECtHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights

A group of people who can participate in administrative court proceedings regarding access to public information. The gloss approval to the judgment of the Supreme Administrative Court of 4 November 2016, I OSK 1372/15

Combining topic modelling and citation network analysis to study case law from the European Court on Human Rights on the right to respect for private and family life

CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction

A Dataset and Strong Baselines for Classification of Czech News Texts

A Comparative Study of Text Retrieval Models on DaReCzech

Personal Data Protection in the Decision-Making of the CJEU Before and After the Lisbon Treaty

CSRCZ: A Dataset About Corporate Social Responsibility in Czech Republic

LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset

An explainable approach to detect case law on housing and eviction issues within the HUDOC database

Towards Open Data for the Citation Content Analysis