Abstract: The application of artificial intelligence in the legal domain has received significant attention from legal professionals and AI researchers in recent years. Intelligent judging systems have made remarkable progress thanks to advances in natural language processing, particularly deep learning. Similar case matching has considerable potential in this domain: matching and analyzing similar cases helps legal professionals reach more reasonable judgments and supports fairness, consistency, and accuracy in the application of law. Existing methods do not fully exploit both representation-based and interaction-based text matching during feature extraction. This paper presents an approach that uses ensemble learning over multiple models to improve the prediction of legal case similarity. The method comprises two sub-networks: a similarity representation sub-network and a binary classification judgment sub-network. The similarity representation sub-network is trained with contrastive learning so that the learned features encode semantic similarity: dissimilar samples are pushed apart and similar samples are pulled closer together. The binary classification judgment sub-network takes sample pairs as joint input, which allows features of the two texts to interact during extraction. Because the two sub-networks are trained with different information processing and different optimization losses, ensemble learning can capitalize on the strengths of both models and significantly improve the accuracy of legal case similarity prediction. Our method achieves an accuracy of 74.53% on the test set of the public dataset CAIL2019-SCM, outperforming existing methods.
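The two-branch idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoders, the exact contrastive loss, or the rule used to combine the two sub-networks, so everything below is an assumption made for illustration: the input is taken to be a triplet (anchor case A, candidates B and C), the contrastive objective is a cosine-based triplet margin loss, and the ensemble is a weighted average of the two branches' probabilities. All function names and parameters (`margin`, `temperature`, `weight`) are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Assumed contrastive objective: the loss is zero once the anchor is
    closer to the similar case than to the dissimilar one by `margin`."""
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


def representation_score(anchor, cand_b, cand_c, temperature=0.1):
    """Representation branch (assumed form): probability that B is more
    similar to the anchor than C, from the gap in cosine similarity."""
    gap = cosine_similarity(anchor, cand_b) - cosine_similarity(anchor, cand_c)
    return 1.0 / (1.0 + math.exp(-gap / temperature))


def ensemble_predict(p_repr, p_cls, weight=0.5):
    """Assumed ensemble rule: weighted average of the representation-branch
    probability and the classification-branch probability that B wins."""
    p = weight * p_repr + (1.0 - weight) * p_cls
    return ("B" if p >= 0.5 else "C"), p


if __name__ == "__main__":
    # Toy embeddings standing in for encoded case documents.
    a, b, c = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    p_repr = representation_score(a, b, c)
    # p_cls would come from the interaction-based classifier; fixed here.
    label, p = ensemble_predict(p_repr, p_cls=0.3)
    print(label, round(p, 3))
```

In this sketch the classification branch is represented only by its output probability, since the abstract describes it as a model over the joined text pair whose internals are not given here.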

Ensemble Methods for Word Embedding Model Based on Judicial Text

Word Embedding Based Document Similarity for the Inferring of Penalty

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Similarity Analysis of Law Documents Based on Word2vec

Legal Document Similarity Matching Based on Ensemble Learning

DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

Visual Exploration and Comparison of Word Embeddings

Bridging the Gap between Different Vocabularies for LLM Ensemble

An Evaluation Dataset for Legal Word Embedding: A Case Study On Chinese Codex

Establish Evidence Chain Model on Chinese Criminal Judgment Documents Using Text Similarity Measure

An Empirical Study of Linear Dimensionality Reduction for Judicial Predictive Models

Improving statistical word alignment with ensemble methods

A Probabilistic Model for Learning Multi-Prototype Word Embeddings

Joint Learning of Character and Word Embeddings

AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation

Evaluating Word Embedding Models: Methods and Experimental Results

Topical Word Embeddings

A knowledge-enriched ensemble method for word embedding and multi-sense embedding

Investigating Language Universal and Specific Properties in Word Embeddings

Learning Word Embeddings from Intrinsic and Extrinsic Views

Learning legal text representations via disentangling elements