
Redundancy Elimination in Multi-signature Based Parallel Entity Resolution

Abstract: The multi-signature method can improve the accuracy of entity resolution. However, it introduces redundant computation in a parallel processing framework. In this paper, a multi-signature based parallel entity resolution method called multi-sig-er is proposed. The method was implemented in a MapReduce-based framework, which first tagged each input object with multiple signatures and used these signatures to generate key-value pairs, then shuffled the pairs to the reduce tasks responsible for similarity computation. To improve performance, two strategies were adopted: one prunes the candidate pairs produced by the blocking technique, and the other eliminates redundancy according to the transitive property. Both strategies reduce the number of similarity computations without affecting the resolution accuracy. Experimental results on real-world datasets show that the method is better suited to large datasets than to small ones, and to complex similarity computation than to simple similarity matching.
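The pipeline described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the signature function (three-letter token prefixes), the Jaccard similarity, the 0.5 threshold, and the "smallest shared signature" pruning rule are all assumptions chosen for illustration, and the shared union-find structure stands in for whatever mechanism a real MapReduce job would use to exploit transitivity across reduce tasks.

```python
from collections import defaultdict
from itertools import combinations


def signatures(record):
    """Hypothetical signature function: 3-letter prefix of every token."""
    return {tok[:3].lower() for tok in record.split()}


def jaccard(a, b):
    """Token-level Jaccard similarity (stand-in for the similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


class UnionFind:
    """Tracks which records are already known to be the same entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve(records, threshold=0.5):
    # "Map" + "shuffle": emit (signature, record id) and group by signature.
    sigs = {rid: signatures(rec) for rid, rec in records.items()}
    blocks = defaultdict(list)
    for rid, record_sigs in sigs.items():
        for s in record_sigs:
            blocks[s].append(rid)

    clusters = UnionFind()
    comparisons = 0
    # "Reduce": each block compares its candidate pairs.
    for sig in sorted(blocks):
        for a, b in combinations(sorted(blocks[sig]), 2):
            # Assumed pruning rule: a pair sharing several signatures is
            # compared only in the block of its smallest shared signature.
            if min(sigs[a] & sigs[b]) != sig:
                continue
            # Transitivity: skip pairs already placed in the same cluster.
            if clusters.find(a) == clusters.find(b):
                continue
            comparisons += 1
            if jaccard(records[a], records[b]) >= threshold:
                clusters.union(a, b)

    groups = defaultdict(set)
    for rid in records:
        groups[clusters.find(rid)].add(rid)
    return sorted(sorted(g) for g in groups.values()), comparisons


records = {
    1: "John Smith Boston",
    2: "John Smith Boston MA",
    3: "Jon Smith Boston",
    4: "Alice Wong Seattle",
}
result = resolve(records)
print(result)  # ([[1, 2, 3], [4]], 2)
```

On this toy input the three blocks containing records 1 to 3 hold seven candidate pairs in total, but only two similarity computations are carried out: the pruning rule removes the repeats across blocks, and the transitive check removes the pair (2, 3) once 1-2 and 1-3 have matched.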
Source: Journal of Donghua University (English Edition) (东华大学学报(英文版)), indexed in EI and CAS; 2017, Issue 4, pp. 556-562 (7 pages)
Funding: National Natural Science Foundation of China (No. 61402100); the Fundamental Research Funds for the Central Universities of China (No. 17D111205)
Keywords: entity resolution; MapReduce; blocking technique; redundancy elimination

