Abstract: Legal case retrieval has received increasing attention in recent years. However, compared to ad-hoc retrieval tasks, it poses unique challenges. First, case documents are lengthy and contain complex legal structures, so most existing dense retrieval models struggle to encode an entire document and capture its inherent structural information. Most existing methods simply truncate the document to meet the input length limit of pre-trained language models (PLMs), which leads to information loss. Second, the definition of relevance in the legal domain differs from that in the general domain, and previous semantic-based or lexical-based methods fail to provide a comprehensive understanding of the relevance of legal cases. In this paper, we propose a Structured Legal case Retrieval (SLR) framework, which incorporates internal and external structural information to address these two challenges. Specifically, to avoid truncating long legal documents, the internal structural information, namely the organization pattern of legal documents, is used to split a case document into segments. By dividing the document-level semantic matching task into segment-level subtasks, SLR can process each segment with a method suited to its characteristics. In this way, the key elements of a case document are highlighted without losing other content. In addition, to better capture relevance in the legal domain, we investigate the connections between criminal charges appearing in a large-scale case corpus to generate a charge-wise relation graph. The similarity between criminal charges can then be pre-computed as the external structural information to enhance the recognition of relevant cases. Finally, a learning-to-rank algorithm integrates the features collected from internal and external structures to output the final retrieval results.
Experimental results on public legal case retrieval benchmarks demonstrate the superior effectiveness of SLR over existing state-of-the-art baselines, including traditional bag-of-words and neural-based methods. Furthermore, we conduct a case study to visualize how the proposed model focuses on key elements and improves retrieval performance.
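
The pipeline described above (segment-level matching, pre-computed charge similarity, and a ranker that merges both) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the method as implemented in the paper: the segment names in `SEGMENTS`, the use of Jaccard token overlap in place of the segment-level matching models, the co-occurrence-based Jaccard score in place of the charge-wise relation graph, and the fixed linear weights in place of a trained learning-to-rank model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical segment names; real case documents define their own layout.
SEGMENTS = ("facts", "reasoning", "decision")


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def charge_similarity(corpus_charges):
    """Pre-compute charge-to-charge similarity from corpus co-occurrence.

    Assumption: two charges are similar when they tend to appear in the
    same cases (Jaccard over the sets of case indices mentioning each).
    """
    cases_with = defaultdict(set)
    for idx, charges in enumerate(corpus_charges):
        for charge in charges:
            cases_with[charge].add(idx)
    sim = {(c, c): 1.0 for c in cases_with}
    for c1, c2 in combinations(sorted(cases_with), 2):
        score = jaccard(cases_with[c1], cases_with[c2])
        sim[(c1, c2)] = sim[(c2, c1)] = score
    return sim


def features(query, candidate, sim):
    """One lexical-overlap feature per segment plus one charge feature."""
    feats = [
        jaccard(query[s].lower().split(), candidate[s].lower().split())
        for s in SEGMENTS
    ]
    # Best charge match between the two cases; 0.0 for unseen pairs.
    best = max(
        (sim.get((qc, cc), 0.0)
         for qc in query["charges"] for cc in candidate["charges"]),
        default=0.0,
    )
    return feats + [best]


def rank(query, candidates, sim, weights):
    """Score candidates with a fixed linear model (stand-in for a learned ranker)."""
    scored = [
        (sum(w * f for w, f in zip(weights, features(query, c, sim))), c["id"])
        for c in candidates
    ]
    return [cid for _, cid in sorted(scored, reverse=True)]


# Toy data, invented purely for this example.
corpus_charges = [{"theft", "robbery"}, {"theft", "robbery"}, {"theft"}, {"fraud"}]
sim = charge_similarity(corpus_charges)

query = {
    "facts": "the defendant took a wallet",
    "reasoning": "taking property without consent",
    "decision": "guilty",
    "charges": {"theft"},
}
case_a = {
    "id": "A",
    "facts": "the defendant took a phone by force",
    "reasoning": "taking property by force",
    "decision": "guilty",
    "charges": {"robbery"},
}
case_b = {
    "id": "B",
    "facts": "the defendant forged a contract",
    "reasoning": "deception for profit",
    "decision": "guilty",
    "charges": {"fraud"},
}

# Hand-picked weights: three segment features, then the charge feature.
ranking = rank(query, [case_b, case_a], sim, weights=[1.0, 1.0, 1.0, 2.0])
```

In this toy setup, case A ranks above case B because its charge co-occurs with the query's charge in the corpus, even though both candidates share the same decision text. A real system would replace the fixed weights with a ranker trained on relevance labels.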

Incorporating Structural Information into Legal Case Retrieval
