Research on Clustering Algorithm of Incomplete Data (不完整数据聚类算法研究)
Abstract: Service-oriented e-government provides society and citizens with information such as weather, statistics, and road traffic, either one-way or interactively. During data collection, this service information inevitably suffers from a variety of data quality problems, one of which is incompleteness. Incomplete data seriously affects subsequent statistics and data mining. Taking incomplete data as the object of study, and based on an analysis of the problems in current clustering algorithms for incomplete data, this paper proposes a KNN-based AP (affinity propagation) clustering algorithm for incomplete data. The algorithm first defines similarity measures for continuous numerical data and for categorical data. It then applies the AP clustering algorithm to the complete records in the dataset. Finally, using the KNN (k-nearest neighbors) idea, it extends the responsibility and availability matrices obtained on the complete records to the whole dataset and continues iterating until convergence. Experiments compare the clustering accuracy of the proposed algorithm with that of other incomplete-data clustering algorithms and verify its effectiveness.
Authors: LENG Yong-lin (冷泳林); SUN Xiao-hong (孙晓红) (School of Information Science and Technology, Bohai University, Jinzhou 121000, China; School of Management, Bohai University, Jinzhou 121000, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 8, pp. 61-65, 72 (6 pages)
Funding: Liaoning Provincial Social Science Fund (L14AGL002, L13AGL002); Liaoning Provincial Education Project (LQ2017004).
Keywords: e-government; incomplete data; clustering; similarity; KNN; data discretization
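
The abstract names a similarity measure for mixed numeric and categorical data and a KNN step that carries results from the complete records over to the incomplete ones. The sketch below illustrates only that general idea, under stated assumptions: the partial-distance similarity, the `None` marker for missing values, the [0, 1] scaling of numeric attributes, and the names `mixed_similarity` and `knn_assign` are all hypothetical and not taken from the paper. The AP clustering step itself is not reproduced; `labels` is a stand-in for its output on the complete records. The paper extends the responsibility and availability matrices and keeps iterating, whereas this sketch substitutes a plain majority vote among nearest neighbors.

```python
from collections import Counter


def mixed_similarity(a, b, is_categorical):
    """Similarity in [-1, 0] over attributes observed in both records.

    Assumptions of this sketch: numeric attributes are pre-scaled to
    [0, 1], and None marks a missing value.
    """
    total, used = 0.0, 0
    for x, y, cat in zip(a, b, is_categorical):
        if x is None or y is None:
            continue  # skip attributes missing in either record
        # Categorical: 0/1 mismatch. Numeric: squared difference.
        total += float(x != y) if cat else (x - y) ** 2
        used += 1
    if used == 0:
        return -1.0  # nothing comparable: treat as maximally dissimilar
    return -total / used  # average so records with gaps stay comparable


def knn_assign(record, complete, labels, is_categorical, k=3):
    """Label a record by majority vote of its k most similar complete records."""
    ranked = sorted(
        range(len(complete)),
        key=lambda i: mixed_similarity(record, complete[i], is_categorical),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): (scaled numeric reading, categorical region).
is_categorical = [False, True]
complete = [(0.10, "north"), (0.15, "north"), (0.20, "north"),
            (0.80, "south"), (0.85, "south"), (0.90, "south")]
labels = [0, 0, 0, 1, 1, 1]  # stand-in for the AP result on complete records

incomplete = [(0.12, None), (None, "south")]
assigned = [knn_assign(r, complete, labels, is_categorical) for r in incomplete]
print(assigned)
```

Averaging over the attributes that both records actually share keeps a record with one missing field on the same scale as a fully observed one, which is why the sum is divided by `used` rather than by the total number of attributes.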