Abstract: Archives Portal Europe (APE, www.archivesportaleurope.net) is the portal of European archives, an aggregator that brings together in a single access point the catalogues and digitised archival material of archives in and about Europe. It currently hosts material from more than 30 countries and from a variety of archival institutions (such as State archives, city archives, university and parish archives, private institutions, and more). It is maintained by the Archives Portal Europe Foundation, an international consortium of State archives and other archival institutions that aims to connect the archival material of individual institutions in one digital repository, in order to allow universal access to the archival heritage of Europe and to promote new forms of archival research beyond national or local boundaries. One of the research tools made available by Archives Portal Europe is search by topic; however, topics are currently maintained manually by archivists, and the vast amount of archival material ingested into the portal makes it impossible to maintain a comprehensive body of topics that describes the whole of the APE repository. Archives are traditionally organised not by subject content, but around the entity (person, organization, body) that created and/or collected the documents in the course of its activities. While this is an undisputed pillar of archival management, the availability of online digital repositories requires new tools for digital archival research, particularly when different archival traditions from different countries and different types of institutions are merged into a single research portal. Topic detection becomes a fundamental tool to guide archival research and to make archives accessible to potentially world-wide users, in a situation where national and linguistic barriers blur or are redefined.
This paper presents the preliminary results and the plan for future iterations of an AI tool for automated topic detection in a multilingual environment, where human-created taxonomies act as the basis on which the algorithms aggregate relevant material around a specific topic. The development is based on supervised machine learning, combining human input in different languages with the use of Wikipedia pages to model the relevant vocabulary and entities.
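As an illustration only, the idea of using reference pages to model a topic's vocabulary can be sketched as follows. This is not the supervised model described above: it swaps in a much simpler technique, plain bag-of-words cosine similarity, and it works in one language only. The topic names, the short strings standing in for Wikipedia page text, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the text of Wikipedia pages describing each topic.
# In practice these would be full pages, one set per language.
TOPIC_PAGES = {
    "maritime trade": "ship port harbour cargo merchant trade sea navigation ship",
    "education": "school university student teacher curriculum education school",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Represent a text as a bag of words (token -> count)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# One vocabulary vector per topic, built from its reference page.
TOPIC_VECTORS = {topic: vectorize(page) for topic, page in TOPIC_PAGES.items()}


def rank_topics(description):
    """Rank topics by vocabulary overlap with an archival description."""
    doc = vectorize(description)
    scores = {topic: cosine(doc, vec) for topic, vec in TOPIC_VECTORS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_topics("Register of cargo unloaded by merchant ships at the port")
best_topic = ranked[0][0]
```

A supervised approach would replace the similarity score with a classifier trained on descriptions that archivists have already tagged, and a multilingual setting would need per-language vocabularies or a shared representation; neither is attempted in this sketch.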

Legal document retrieval across languages: topic hierarchies based on synsets

Large-scale semantic exploration of scientific literature using topic-based hashing algorithms

Unveiling Themes in Judicial Proceedings: A Cross-Country Study Using Topic Modeling on Legal Documents from India and the UK

Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction

Cross-Lingual Document Clustering Based on Similarity Space Model

Understand Legal Documents with Contextualized Large Language Models

Discovering multilingual concepts from unaligned web documents by exploring associated images

Learning from syntax generalizations for automatic semantic annotation

A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents

Multilingual Evaluation of Semantic Textual Relatedness

Attentive Deep Neural Networks for Legal Document Retrieval

Sentence Embeddings and High-speed Similarity Search for Fast Computer Assisted Annotation of Legal Documents

Legal information retrieval for understanding statutory terms

Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

What's in a ? Cross-Lingual Topic Detection & Information Retrieval in Archives Portal Europe

Discovering significant topics from legal decisions with selective inference

Multi-granular Legal Topic Classification on Greek Legislation

Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents

Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs

Constructing Pseudo Documents With Semantic Similarity For Short Text Topic Discovery

The early days of contemporary philosophy of science: novel insights from machine translation and topic-modeling of non-parallel multilingual corpora