Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62002262, 61672142, 61602103, 62072086 and 62072084, and the National Key Research and Development Project of China under Grant No. 2018YFB1003404.
Abstract: Entity matching is a fundamental problem in data integration. It groups records according to the underlying real-world entities they refer to. There is a growing trend toward entity matching via deep learning techniques. We design mixed hierarchical deep neural networks (MHN) for entity matching, exploiting semantics from different abstraction levels in the internal hierarchy of a record. A family of attention mechanisms is used at different stages of entity matching: self-attention focuses on internal dependencies, inter-attention targets alignments, and multi-perspective weight attention is devoted to importance discrimination. In particular, hybrid soft token alignment is proposed to address corrupted data. Attribute order is considered for the first time in deep entity matching. Then, to reduce the need for labeled training data, we propose an adversarial domain adaptation approach (DA-MHN) that transfers matching knowledge between different entity matching tasks by maximizing classifier discrepancy. Finally, we conduct comprehensive experimental evaluations on 10 datasets (seven for MHN and three for DA-MHN), which illustrate the superiority of our two proposed approaches. MHN clearly outperforms previous studies in accuracy, and each component of MHN is also tested. DA-MHN greatly surpasses existing studies in transferability.
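To make the idea of inter-attention as soft token alignment more concrete, the following is a minimal sketch under stated assumptions, not the MHN architecture itself. It assumes tokens are already embedded as small vectors, uses plain dot-product scores, and the names `soft_align`, `record_a`, and `record_b` are hypothetical. For each token of one record, it computes a softmax distribution over the tokens of the other record and the resulting weighted summary vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_align(tokens_a, tokens_b):
    """For each token vector in tokens_a, return (weights, summary).

    weights: attention distribution over the tokens of tokens_b
    summary: weighted sum of tokens_b vectors, i.e. the soft-aligned
             counterpart of the token from tokens_a
    Assumption: dot-product scoring; the paper's scoring may differ.
    """
    dim = len(tokens_b[0])
    aligned = []
    for u in tokens_a:
        weights = softmax([dot(u, v) for v in tokens_b])
        summary = [
            sum(w * v[i] for w, v in zip(weights, tokens_b))
            for i in range(dim)
        ]
        aligned.append((weights, summary))
    return aligned


# Toy 2-dimensional "embeddings" (made up for illustration only).
record_a = [[1.0, 0.0], [0.0, 1.0]]
record_b = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

alignment = soft_align(record_a, record_b)
for weights, summary in alignment:
    print([round(w, 3) for w in weights], [round(x, 3) for x in summary])
```

Because the weights are soft rather than a hard one-to-one match, a misspelled or misplaced token can still receive partial credit from its closest counterparts, which is the general intuition behind using soft alignment for corrupted data.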