What is it about?
Entity resolution (ER) is the task of grouping together records that correspond to a single entity, such as a person or a product, in the absence of a unique identifier. For example, assume there is a record of a John Smith from Queensland, Australia, in a health database, and another record of a John Smith from Melbourne, Australia, in an immigration database. If there is no unique personal identifier common to both databases (such as a Tax File Number, which is unique to a single individual), we need to rely on ER techniques to determine whether these two records refer to one person (a true match) who may have moved from Queensland to Melbourne, or to two different people (a true non-match) who share the same name. A major challenge in ER is the lack of ground truth data in the form of known true matches and non-matches. This makes it hard to assess how well ER techniques perform in real-world applications. In this paper, we propose novel methods to estimate the quality of ER techniques in the absence of ground truth data, which we therefore refer to as unsupervised evaluation techniques for ER.
Why is it important?
Many organisations that apply ER for operational purposes struggle to determine how confident they can be in the accuracy of the resolved entities, because ground truth data is lacking or unavailable. Our proposed quality estimation approaches help in these real-world scenarios by providing an estimate of how well a given ER technique performs.
Perspectives
Very little work has explored how to assess ER approaches in the absence of ground truth data, even though such unsupervised evaluation techniques are needed in real-world ER tasks. I therefore hope our work will encourage ER enthusiasts and researchers to put more thought and innovation into solving this important problem.
Charini Nanayakkara
Australian National University
Read the Original
This page is a summary of: Unsupervised Evaluation of Entity Resolution, Journal of Data and Information Quality, March 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3721985.