SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets

Araújo, Samur; Thanh Tran, D.; de Vries, Arjen; Schwabe, D.

doi:10.1109/TKDE.2014.2365779

S. Araújo (Samur), D. Thanh Tran, A.P. de Vries (Arjen) and D. Schwabe

2014-10-01

IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 5, p. 1397–1440

State-of-the-art instance matching approaches do not perform well when used for matching instances across heterogeneous datasets. This shortcoming derives from their core operation, direct matching, which compares instances in the source dataset directly with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. To resolve this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. For this refinement, only data in the target is used, i.e., no direct comparison between source and target is involved. Based on extensive experiments using public benchmarks, we show that our approach greatly improves the quality of state-of-the-art systems, especially on difficult matching tasks.
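
The refinement step described in the abstract can be illustrated with a small sketch. This is not the SERIMI implementation: the function names, the toy data, the feature strings, the `coverage` score, and the 0.5 threshold are all assumptions made for illustration, and the paper's actual similarity measure is not stated on this page. The sketch only shows the general idea that a candidate is kept when its target-side description resembles the candidates retrieved for the other instances of the class of interest, without ever comparing source data with target data.

```python
from typing import Dict, FrozenSet, List

Features = FrozenSet[str]


def coverage(feats: Features, pooled: Features) -> float:
    """Fraction of a candidate's features found in a pooled feature set.

    Assumption: a simple stand-in score, not the measure used in the paper.
    """
    if not feats:
        return 0.0
    return len(feats & pooled) / len(feats)


def class_based_filter(
    candidates: Dict[str, Dict[str, Features]],
    threshold: float = 0.5,  # hypothetical cut-off chosen for the toy data
) -> Dict[str, List[str]]:
    """Keep candidates that resemble the candidates of the other instances.

    `candidates` maps each source instance of the class of interest to its
    candidate matches; each candidate carries features taken from the
    target dataset only.
    """
    kept: Dict[str, List[str]] = {}
    for src, cands in candidates.items():
        # Pool the target-side features of every other instance's candidates.
        pooled_sets = [
            frozenset().union(*others.values())
            for other_src, others in candidates.items()
            if other_src != src and others
        ]
        kept[src] = []
        for cand_id, feats in cands.items():
            if pooled_sets:
                score = sum(coverage(feats, p) for p in pooled_sets) / len(pooled_sets)
            else:
                score = 0.0
            if score >= threshold:
                kept[src].append(cand_id)
    return kept


# Toy data (hypothetical): the class of interest is "drugs".
example_candidates = {
    "aspirin": {
        "t:Aspirin_drug": frozenset({"type:Drug", "has:formula"}),
        "t:Aspirin_band": frozenset({"type:Band", "has:genre"}),
    },
    "ibuprofen": {
        "t:Ibuprofen": frozenset({"type:Drug", "has:formula"}),
    },
    "paracetamol": {
        "t:Paracetamol": frozenset({"type:Drug", "has:formula"}),
        "t:Paracetamol_song": frozenset({"type:Song", "has:genre"}),
    },
}

refined = class_based_filter(example_candidates)
print(refined)
```

On this toy input the band and the song are filtered out, because their target-side features rarely occur among the candidates of the other drugs, while the three drug candidates are kept.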

Additional Metadata
Theme	Information (theme 2)
Stakeholder	Unspecified
Publisher	I.E.E.E. Computer Society Press
Persistent URL	doi.org/10.1109/TKDE.2014.2365779
Journal	IEEE Transactions on Knowledge and Data Engineering
Organisation	Human-Centered Data Analytics
Citation	Araújo, S., Thanh Tran, D., de Vries, A., & Schwabe, D. (2014). SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets. IEEE Transactions on Knowledge and Data Engineering, 27(5), 1397–1440. https://doi.org/10.1109/TKDE.2014.2365779

Free Full Text (Final Version, 663kb)
