FlexER: Flexible Entity Resolution for Multiple Intents

Bar Genossar, Roee Shraga, Avigdor Gal
DOI: https://doi.org/10.48550/arXiv.2209.07569
2022-10-27
Abstract: Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task, assuming the existence of a single interpretation of a real-world entity and focusing only on finding matched records, separating corresponding from non-corresponding ones, with respect to this single interpretation. However, in real-world scenarios, where entity resolution is part of a more general data project, downstream applications may have varying interpretations of real-world entities relating, for example, to various user needs. In what follows, we introduce the problem of multiple intents entity resolution (MIER), an extension to the universal (single intent) entity resolution task. As a solution, we propose FlexER, utilizing contemporary solutions to universal entity resolution tasks to solve multiple intents entity resolution. FlexER addresses the problem as a multi-label classification problem. It combines intent-based representations of tuple pairs using a multiplex graph representation that serves as an input to a graph neural network (GNN). FlexER learns intent representations and improves the outcome to multiple resolution problems. A large-scale empirical evaluation introduces a new benchmark and, using also two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state-of-the-art for a universal entity resolution.
Computation and Language, Databases
What problem does this paper attempt to address?
This paper addresses how entity resolution should handle multiple intents in data cleaning and integration. Traditional entity resolution methods assume a single interpretation of each entity and focus on identifying matching record pairs, that is, records that refer to the same real-world entity. In practice, however, different downstream applications may interpret the same real-world entity differently. For example, different users may view the same product according to different needs. The paper therefore introduces the Multiple Intents Entity Resolution (MIER) problem, which calls for an entity resolution solution that adapts to several interpretations at once. Specifically, the paper points out:

1. **Limitations of traditional entity resolution**: Existing entity resolution methods assume the existence of a single entity set and mapping, which is not flexible enough for applications requiring personalized analysis or services.
2. **Requirements for multiple-intent entity resolution**: In scenarios such as online shopping and recommendation systems, users' needs vary, and entities need to be resolved according to different intents (such as brand or category).
3. **Proposed problem**: The paper defines the multiple-intent entity resolution problem as generating multiple clean views of a dataset, each view corresponding to a different entity resolution intent.

To solve this problem, the paper proposes FlexER, which builds on contemporary universal (single-intent) entity resolution techniques. FlexER frames MIER as a multi-label classification problem, combines intent-based representations of tuple pairs in a multiplex graph, and feeds that graph to a graph neural network (GNN) to learn the relationships between intents, thereby improving resolution across intents. A large-scale empirical evaluation shows that FlexER solves the MIER problem effectively and also outperforms existing methods on the standard universal entity resolution task.
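
The multi-label framing can be illustrated with a small sketch. This is not FlexER's implementation: the records, field names, intent names, and hand-written rules below are hypothetical, and the rules stand in for the learned matchers and GNN the paper describes. The sketch only shows that each record pair receives one binary label per intent, and how the same pair could be linked across intent layers in a multiplex-style graph (edges within a layer are omitted).

```python
from itertools import combinations

# Hypothetical toy records; field names are illustrative only.
records = {
    "r1": {"brand": "Acme", "category": "shoes", "model": "Runner 2"},
    "r2": {"brand": "Acme", "category": "shoes", "model": "Runner 2"},
    "r3": {"brand": "Acme", "category": "shirts", "model": "Tee"},
}

# Each intent is a separate notion of "same entity". Rules are an assumption
# made for illustration; the paper learns these decisions instead.
intents = {
    "same_product": lambda a, b: a["brand"] == b["brand"] and a["model"] == b["model"],
    "same_brand": lambda a, b: a["brand"] == b["brand"],
    "same_category": lambda a, b: a["category"] == b["category"],
}


def multi_label(records, intents):
    """Return {pair: {intent: 0 or 1}}, one binary label per intent."""
    labels = {}
    for x, y in combinations(sorted(records), 2):
        labels[(x, y)] = {
            name: int(rule(records[x], records[y]))
            for name, rule in intents.items()
        }
    return labels


def inter_layer_edges(pairs, intent_names):
    """Link the node of each pair in one intent layer to its node in every other layer."""
    return [
        ((pair, i), (pair, j))
        for pair in pairs
        for i, j in combinations(intent_names, 2)
    ]


labels = multi_label(records, intents)
edges = inter_layer_edges(list(labels), list(intents))

for pair, per_intent in labels.items():
    print(pair, per_intent)
```

In this toy data, `r1` and `r3` share a brand but are not the same product, so a single universal match/non-match label could not serve both intents; the per-intent labels make that difference explicit.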