Abstract
This paper proposes a method for detecting approximately duplicate records based on a genetic neural network. The method exploits the non-linear mapping capability of neural networks and the global optimization capability of genetic algorithms, combining learning-based and evolutionary ideas and applying them to duplicate record detection. This avoids the attribute-weight computation required by traditional methods. The genetic neural network itself is also improved. Experimental results show that the method handles approximately duplicate record detection on large data volumes effectively, with both good detection accuracy and good time efficiency.
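The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: a genetic algorithm searching the weights of a small neural network that scores record pairs. Everything in it is an assumption made for illustration, not the authors' method: the function names, the use of `difflib.SequenceMatcher` for per-attribute similarity, the network size, the GA settings, and the toy records. The paper's own improvements to the genetic neural network are not reproduced here.

```python
import math
import random
from difflib import SequenceMatcher

def pair_features(rec_a, rec_b):
    """Per-attribute string similarity in [0, 1] for two records (tuples of str)."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in zip(rec_a, rec_b)]

def n_weights(n_in, n_hidden):
    # Per hidden unit: n_in weights + 1 bias; then n_hidden output weights + 1 output bias.
    return n_hidden * (n_in + 1) + n_hidden + 1

def predict(w, x, n_hidden):
    """One-hidden-layer network; returns a duplicate score in [0, 1]."""
    n_in = len(x)
    out = w[-1]  # output bias
    for j in range(n_hidden):
        base = j * (n_in + 1)
        z = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        out += w[n_hidden * (n_in + 1) + j] * math.tanh(z)
    return 0.5 * (1.0 + math.tanh(out / 2.0))  # sigmoid, written to avoid overflow

def fitness(w, samples, n_hidden):
    """Negative mean squared error over labelled pairs (higher is better)."""
    return -sum((predict(w, x, n_hidden) - y) ** 2 for x, y in samples) / len(samples)

def evolve(samples, n_hidden=3, pop_size=30, generations=40, seed=0):
    """Evolve network weights; the network learns how to combine the attributes,
    so no attribute weights are set by hand."""
    rng = random.Random(seed)
    size = n_weights(len(samples[0][0]), n_hidden)
    pop = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(w, samples, n_hidden), w) for w in pop),
                        key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])
        nxt = [scored[0][1][:]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Uniform crossover followed by sparse Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0, 0.3) if rng.random() < 0.1 else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=lambda w: fitness(w, samples, n_hidden))
    history.append(fitness(best, samples, n_hidden))
    return best, history

# Toy labelled pairs (hypothetical data): 1 = duplicate, 0 = distinct.
pairs = [
    (("John Smith", "12 Main St"), ("Jon Smith", "12 Main Street"), 1),
    (("Mary Jones", "5 Oak Ave"), ("Mary Jones", "5 Oak Avenue"), 1),
    (("John Smith", "12 Main St"), ("Alice Brown", "99 Pine Rd"), 0),
    (("Mary Jones", "5 Oak Ave"), ("Robert King", "7 Elm Blvd"), 0),
]
samples = [(pair_features(a, b), y) for a, b, y in pairs]
best, history = evolve(samples)
score = predict(best, pair_features(("Jon Smith", "12 Main St"),
                                    ("John Smith", "12 Main St")), n_hidden=3)
```

Because the best individual is carried over unchanged each generation, the values in `history` never decrease. A real evaluation would need a much larger labelled set and a held-out test split, neither of which this sketch attempts.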
Source
《计算机测量与控制》 (Computer Measurement & Control)
CSCD
PKU Core Journals (北大核心)
2011, Issue 5, pp. 1021-1023 (3 pages)
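Funding
Key Project of the Henan Province Science and Technology Program (102102210191)
Natural Science Research Funding Program of the Education Department of Henan Province (2009A520013)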
Keywords
approximately duplicate record detection
genetic algorithms
neural network
data cleaning