Abstract
To address the excessive subjectivity of attribute-weight assignment in approximately duplicate record identification based on string matching, this paper proposes a method that obtains attribute weights through fuzzy integrated estimation. Multiple users rate, on a graded scale, the factors that make up each attribute's importance. A fuzzy mapping then converts these ratings into weights that reflect attribute importance, and approximately duplicate records are identified on that basis. Theoretical analysis and experiments indicate that the method obtains attribute weights objectively and therefore achieves relatively high precision in identifying approximately duplicate records.
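The abstract does not give the paper's formulas, so the following is only a generic sketch of how graded user ratings could be turned into attribute weights and then used in a weighted record comparison. Everything specific here is an assumption made for illustration: the three-grade scale and its numeric scores, the factor names and factor weights, the weighted-average composition operator, and the use of Python's `difflib.SequenceMatcher` as a stand-in for the string-matching similarity.

```python
from difflib import SequenceMatcher

# Assumed grade scale and numeric scores (not taken from the paper).
GRADES = ("high", "medium", "low")
GRADE_SCORES = {"high": 1.0, "medium": 0.6, "low": 0.2}


def membership_row(votes):
    """Fraction of users who chose each grade for one factor."""
    return [votes.count(g) / len(votes) for g in GRADES]


def attribute_score(factor_votes, factor_weights):
    """Compose factor weights with the membership matrix (weighted average),
    then collapse the resulting grade distribution into one score."""
    b = [0.0] * len(GRADES)
    for factor, votes in factor_votes.items():
        for j, r in enumerate(membership_row(votes)):
            b[j] += factor_weights[factor] * r
    return sum(bj * GRADE_SCORES[g] for bj, g in zip(b, GRADES))


def attribute_weights(evaluations, factor_weights):
    """Normalize per-attribute scores so the weights sum to 1."""
    scores = {a: attribute_score(fv, factor_weights)
              for a, fv in evaluations.items()}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


def record_similarity(rec_a, rec_b, weights):
    """Weighted sum of per-attribute string similarities."""
    return sum(w * SequenceMatcher(None, rec_a[a], rec_b[a]).ratio()
               for a, w in weights.items())


# Hypothetical ratings from three users for two attributes.
factor_weights = {"discrimination": 0.6, "completeness": 0.4}
evaluations = {
    "name": {"discrimination": ["high", "high", "medium"],
             "completeness": ["high", "medium", "medium"]},
    "city": {"discrimination": ["low", "medium", "low"],
             "completeness": ["medium", "medium", "high"]},
}
weights = attribute_weights(evaluations, factor_weights)

rec1 = {"name": "John Smith", "city": "Boston"}
rec2 = {"name": "Jon Smith", "city": "Boston MA"}
similarity = record_similarity(rec1, rec2, weights)
```

In a sketch like this, two records would be flagged as approximate duplicates when `similarity` exceeds a chosen threshold. That threshold is another parameter the abstract does not specify.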
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2010, No. 13, pp. 51-53 (3 pages)
Funding
Scientific Research Fund of the Hunan Provincial Education Department (09C339)
Hunan Provincial Science and Technology Plan Project (2008CK3083)
Keywords
fuzzy integrated estimation
approximately duplicate records
attribute weight
similarity