
Entity matching of outlier detection based on non-primary attributes (基于非主属性离群点检测的实体匹配)
Cited by: 1
Abstract: To address the varied descriptions of the same entity across different sources on the Internet, an entity matching method based on outlier detection over non-primary attributes is proposed. Non-primary attribute values are used to resolve the ambiguity caused by differing primary attribute values; they also rule out non-matching entities quickly, which greatly improves matching efficiency. To some extent, the method overcomes the drawback that outlier matching under traditional singular value decomposition cannot be applied to large-scale data. It proceeds in three stages. First, a rule-based method coarsely filters the data to reduce the number of entity pairs. Second, an outlier detection model screens the remaining pairs to produce a preliminary set of entity pairs. Third, this set is sampled, and machine learning is used to select a suitable matcher and train it to obtain the matched pairs. Experimental results show that the method improves accuracy and recall, which verifies its effectiveness.
Authors: CAO Wei-dong (曹卫东); WANG Guang-sen (王广森); WANG Huai-chao (王怀超) (College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University core journal, 2019, No. 8, pp. 2247-2252 (6 pages)
Funding: Civil Aviation Science and Technology Major Special Fund projects (MHRD20150107, MHRD20160109); Fundamental Research Funds for the Central Universities (3122014C017)
Keywords: entity matching; non-primary attribute; outlier detection; rough filter; matcher
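The first two stages described in the abstract (rule-based coarse filtering, then outlier-based screening on non-primary attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the rules or the outlier detection model, so the exact-agreement rule, the z-score cutoff, the attribute names `city` and `year`, and the toy records below are all assumptions made for illustration. The matcher selection and training stage is not covered.

```python
from difflib import SequenceMatcher
from itertools import product
from statistics import mean, pstdev

# Hypothetical attribute roles: "name" is the primary attribute,
# "city" and "year" are the non-primary attributes.
NON_PRIMARY = ("city", "year")

# Fictional toy records from two sources (illustration only).
LEFT = [
    {"id": "a1", "name": "Northwind Traders", "city": "Tianjin", "year": "2001"},
    {"id": "a2", "name": "Contoso Ltd", "city": "Beijing", "year": "1998"},
    {"id": "a3", "name": "Fabrikam", "city": "Shanghai", "year": "2005"},
    {"id": "a4", "name": "Adventure Works", "city": "Chengdu", "year": "2010"},
]
RIGHT = [
    {"id": "b1", "name": "Northwind", "city": "Tianjin", "year": "2001"},
    {"id": "b2", "name": "Contoso Limited", "city": "Beijing", "year": "1998"},
    {"id": "b3", "name": "Fabrikam Inc.", "city": "Shanghai", "year": "2005"},
    {"id": "b4", "name": "AdventureWorks Cycles", "city": "Chengdu", "year": "2010"},
    {"id": "b5", "name": "Contoso Records", "city": "Guangzhou", "year": "1998"},
]


def similarity(x, y):
    """String similarity in [0, 1]; a stand-in for any attribute metric."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def rough_filter(left, right):
    """Assumed coarse rule: keep pairs that agree exactly on some non-primary value."""
    return [
        (a, b)
        for a, b in product(left, right)
        if any(a[k] == b[k] for k in NON_PRIMARY)
    ]


def non_primary_score(a, b):
    """Mean similarity over the non-primary attributes only."""
    return mean([similarity(a[k], b[k]) for k in NON_PRIMARY])


def drop_low_outliers(pairs, z_cut=1.5):
    """Assumed outlier rule: discard pairs scoring more than z_cut
    population standard deviations below the mean score."""
    if not pairs:
        return []
    scores = [non_primary_score(a, b) for a, b in pairs]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # all scores identical, so nothing stands out
        return list(pairs)
    return [p for p, s in zip(pairs, scores) if (s - mu) / sigma >= -z_cut]


def candidate_pairs(left, right):
    """Run both stages and return the surviving (left id, right id) pairs."""
    kept = drop_low_outliers(rough_filter(left, right))
    return [(a["id"], b["id"]) for a, b in kept]


if __name__ == "__main__":
    print(candidate_pairs(LEFT, RIGHT))
```

On this toy data the coarse rule keeps five of the twenty possible pairs, including (a2, b5), which shares a year but not a city; the z-score step then drops that pair because its non-primary score sits well below the others. A real dataset would need a more robust detector and rules tuned to its attributes.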