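A Weighted Algorithm for Similarity Comparison of Inconsistent Data in Data Integration (cited by: 1)

Published English title: A Weight Algorithm for Similarity Comparison of Inconsistency Data in Integrating Data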

Abstract: Reducing inconsistency is the key to improving data quality during data integration. In this paper, we first present a weighted similarity-coefficient algorithm that is superior to traditional algorithms when the source data have multiple characteristic items, all of which have to be taken into account, especially in complex information integration. Second, we apply it to an experiment on integrating telecommunication customer data; the data-clustering results show that it is highly feasible and precise.
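This record does not reproduce the paper's formula, so the following is only a minimal sketch of what a weighted similarity coefficient over several characteristic items might look like. It assumes that each item is compared with a normalized Levenshtein distance (the edit distance of reference [2]) and that the per-item scores are combined as a weighted average. The function names, the record fields, and the weights are all hypothetical illustrations, not the authors' method.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for fully different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def weighted_similarity(rec_a: Dict[str, str],
                        rec_b: Dict[str, str],
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-field similarities (assumed combination rule)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * field_similarity(rec_a.get(f, ""), rec_b.get(f, ""))
                for f, w in weights.items())
    return score / total


# Hypothetical customer records from two sources; the name is weighted twice as much.
a = {"name": "Zhang Wei", "phone": "13800000000"}
b = {"name": "Zhang Wei", "phone": "13800000001"}
print(weighted_similarity(a, b, {"name": 2.0, "phone": 1.0}))
```

Records whose score exceeds a chosen threshold could then be grouped into one cluster. Both the threshold and the weights would need to be tuned on the data at hand.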

Authors: Zhang Yanqiu (张艳秋), Xu Liutong (徐六通), Wang Bai (王柏)

Affiliation: School of Computer Science and Technology, Beijing University of Posts and Telecommunications

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2003, No. 8, pp. 92-92, F004 (2 pages)

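Keywords: data integration; data source; data mining; data storage; inconsistency; data similarity; weighted algorithm

Keywords (as published in English): Data integration, Similarity coefficient, Weight integration, Cluster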

Classification: TP274.2 [Automation and Computer Technology: Detection Technology and Automation Devices]

Citation network

References (6)

[1] Lujan-Mora S, Palomar M. Reducing Inconsistency in Integrating Data from Different Sources. IDEAS, 2001. 209-218
[2] Levenshtein V I. Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 1966, 10: 707-710
[3] Hirschberg D S. Serial Computations of Levenshtein Distances. In: A. Apostolico, Z. Galil, eds. Pattern Matching Algorithms. Oxford University Press, 1997
[4] Lujan-Mora S. An Algorithm for Computing the Invariant Distance from Word Position. Internet. http://www.dlsi.ua.es/~slujan/files/idwp.ps, June 2000
[5] Lujan-Mora S, Palomar M. Clustering of Similar Values, in Spanish, for the Improvement of Search Systems. In: M. C. Monard and J. S. Sichman, eds. International Joint Conf. IBERAMIA-SBIA 2000 Open Discussion Track Proceedings, Atibaia, Sao Paulo (Brazil), ICMC/USP, 2002. 217-226
[6] French J C, Powell A L, Schulman. Applications of Approximate Word Matching in Information Retrieval. In: F. Golshani and K. Makki, eds. Proc. of the Sixth Intl. Conf. on Information and Knowledge Management (CIKM 1997), Las Vegas (USA), ACM Press, 1997. 9-15

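Co-cited literature (5)

1. Joyce Bischoff, Ted Alexander; translated by Cheng Dong, Wei Liyuan. Data Warehouse Technology [M]. Beijing: Publishing House of Electronics Industry, 1998.6.
2. Baeza-Yates R, Perleberg C. Fast and practical approximate pattern matching [J]. Information Processing Letters, 1996, 59: 21-27.
3. Liu Mingji, Wang Xiufeng, Huang Yalou. Data preprocessing in data mining [J]. Computer Science, 2000, 27(4): 54-57. Cited by: 126
4. She Chunhong. Incremental duplicate record identification based on a priority queue [J]. Journal of Computer Applications, 2003, 23(9): 61-63. Cited by: 7
5. Li Hua, Yi Baolin, Gui Hao. An abbreviation discovery algorithm based on dynamic programming [J]. Engineering Journal of Wuhan University, 2004, 37(1): 128-131. Cited by: 2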

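Citing literature (1)

1. Shen Ruifang, Guo Lifu, Shi Xijie. Research on data preprocessing models and algorithms in data mining [J]. Computer Systems & Applications, 2005, 14(7): 44-46. Cited by: 20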


