Abstract
Service-oriented e-government provides society and citizens with information such as weather, statistics, and road traffic, either one-way or interactively. During data collection, this service information inevitably suffers from various data quality problems, one of which is data incompleteness, and incomplete data seriously affect subsequent statistics and data mining. Taking incomplete data as the object of study, and based on an analysis of the problems with existing incomplete-data clustering algorithms, we propose a KNN-based AP (affinity propagation) clustering algorithm for incomplete data. The algorithm first defines similarity measures for continuous numerical and categorical data, then applies the AP clustering algorithm to the complete records in the dataset, and finally uses the KNN idea to extend the responsibility and availability matrices obtained on the complete data to the whole dataset, continuing the iteration until convergence. Experiments compare the proposed algorithm with other incomplete-data clustering algorithms in terms of clustering accuracy and verify its effectiveness.
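The abstract names the three steps but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Gower-style similarity (range-normalized difference for numeric attributes, simple matching for categorical ones, computed over the attributes observed in both records), the standard affinity propagation message updates, and a hypothetical seeding rule in which each incomplete record starts from the mean responsibility and availability rows of its k most similar complete records. All function names, the `preference` and `damping` values, and the toy records are illustrative.

```python
# Illustrative sketch only: the similarity formula and the KNN seeding rule
# are assumptions, since the abstract does not specify them.

def similarity(x, y, ranges):
    """Negative Gower-style distance over attributes observed in both records.
    None marks a missing value; ranges[j] is the numeric range of attribute j,
    or None when attribute j is categorical."""
    total, used = 0.0, 0
    for a, b, rng in zip(x, y, ranges):
        if a is None or b is None:
            continue  # skip attributes missing in either record
        total += (0.0 if a == b else 1.0) if rng is None else abs(a - b) / rng
        used += 1
    return -total / used if used else -1.0  # no shared attribute: least similar

def similarity_matrix(data, ranges, preference):
    n = len(data)
    return [[preference if i == k else similarity(data[i], data[k], ranges)
             for k in range(n)] for i in range(n)]

def zeros(n):
    return [[0.0] * n for _ in range(n)]

def ap_iterate(S, R, A, iters=100, damping=0.7):
    """Standard affinity propagation message passing; updates R and A in place.
    Requires at least two records."""
    n = len(S)
    for _ in range(iters):
        for i in range(n):  # responsibility update
            v = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else v[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):  # availability update
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]
            for i in range(n):
                new = others if i == k else min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return R, A

def labels_from(S, R, A):
    """Map every record to the index of its exemplar."""
    n = len(S)
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

def knn_extend(data, ranges, complete, M_c, k=2):
    """Embed an m x m message matrix of the complete records into an n x n one.
    Rows of incomplete records are seeded with the mean row of their k most
    similar complete records (an assumed initialisation scheme)."""
    n = len(data)
    M = zeros(n)
    for a, i in enumerate(complete):
        for b, j in enumerate(complete):
            M[i][j] = M_c[a][b]
    for i in set(range(n)) - set(complete):
        nn = sorted(range(len(complete)),
                    key=lambda a: similarity(data[i], data[complete[a]], ranges),
                    reverse=True)[:k]
        for b, j in enumerate(complete):
            M[i][j] = sum(M_c[a][b] for a in nn) / len(nn)
    return M

def cluster_incomplete(data, ranges, preference=-0.5, k=2):
    complete = [i for i, row in enumerate(data) if None not in row]
    sub = [data[i] for i in complete]
    # Step 2: run AP on the complete records only.
    R_c, A_c = ap_iterate(similarity_matrix(sub, ranges, preference),
                          zeros(len(sub)), zeros(len(sub)))
    # Step 3: extend both matrices to all records, then keep iterating.
    S = similarity_matrix(data, ranges, preference)
    R = knn_extend(data, ranges, complete, R_c, k)
    A = knn_extend(data, ranges, complete, A_c, k)
    ap_iterate(S, R, A)
    return labels_from(S, R, A)

# Toy records: (numeric reading, categorical tag); None marks a missing value.
data = [(1.0, "a"), (1.2, "a"), (0.9, "a"),
        (9.0, "b"), (9.3, "b"), (8.8, "b"),
        (None, "a"), (9.1, None)]
ranges = [10.0, None]  # attribute 0 is numeric with range 10, attribute 1 is categorical
labels = cluster_incomplete(data, ranges)
print(labels)  # each entry is the index of the record chosen as that record's exemplar
```

Warm-starting the full run from the complete-data messages is meant to mirror the "extend, then continue iterating" step. The `preference` value controls how many exemplars emerge and would need tuning on real data.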
Authors
冷泳林
孙晓红
LENG Yong-lin; SUN Xiao-hong (School of Information Science and Technology, Bohai University, Jinzhou 121000, China; School of Management, Bohai University, Jinzhou 121000, China)
Source
Computer Technology and Development (《计算机技术与发展》)
2020, Issue 8, pp. 61-65, 72 (6 pages)
Funding
Liaoning Provincial Social Science Foundation (L14AGL002, L13AGL002)
Liaoning Provincial Education Project (LQ2017004)
Keywords
E-government
incomplete data
clustering
similarity
KNN
data discretization