
UID-DBSCAN Clustering Algorithm of Multi-dimensional Uncertain Data Based on Interval Number (cited by 3)

Abstract: Research on clustering methods for uncertain data has attracted growing attention. Among existing methods, the UIDK-means and U-PAM algorithms inherit the weaknesses of partition-based clustering: they cannot identify clusters of arbitrary shape, and they are sensitive to noise. The FDBSCAN algorithm assumes that the probability distribution function or probability density function of the uncertain data is known in advance, yet this information is often hard to obtain in practice. To address these shortcomings, this paper proposes UID-DBSCAN, a clustering algorithm for multi-dimensional uncertain data based on interval numbers. The algorithm represents uncertain data as interval numbers combined with statistical information about the data, and it measures the similarity between uncertain data objects with an interval-number distance function of low computational complexity. It introduces, for the first time, the concepts of interval-number density, density-reachability, and density-connectivity, and uses them to expand clusters. It also selects its density parameters adaptively from the statistical characteristics of the data set, so that clustering runs automatically. Experimental results show that UID-DBSCAN identifies noise effectively, handles clusters of arbitrary shape, and achieves high clustering accuracy at low computational complexity.
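The abstract names the components of UID-DBSCAN but not their exact definitions, so the Python sketch below only illustrates how such a pipeline could fit together; it is not the authors' implementation. Every concrete choice is an assumption, not taken from the paper: each uncertain object is given as repeated measurements; each dimension becomes the interval [mean - k_sigma*sd, mean + k_sigma*sd]; the interval distance averages the squared differences of the lower and upper endpoints; and Eps is the mean distance from each object to its (MinPts - 1)-th nearest neighbour. The names `to_interval`, `interval_distance`, `adaptive_eps`, and `uid_dbscan` are hypothetical. Here the "interval density" of an object is the number of interval objects within Eps, and clusters grow from core objects through density-reachable neighbours, as in standard DBSCAN.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple

# An uncertain object as a multi-dimensional interval: (lower bounds, upper bounds).
Interval = Tuple[List[float], List[float]]


def to_interval(samples: Sequence[Sequence[float]], k_sigma: float = 1.0) -> Interval:
    """ASSUMPTION: per dimension, [mean - k_sigma*sd, mean + k_sigma*sd]
    computed from repeated measurements of one object."""
    lows, highs = [], []
    for d in range(len(samples[0])):
        vals = [s[d] for s in samples]
        m = mean(vals)
        sd = stdev(vals) if len(vals) > 1 else 0.0  # stdev needs >= 2 samples
        lows.append(m - k_sigma * sd)
        highs.append(m + k_sigma * sd)
    return lows, highs


def interval_distance(a: Interval, b: Interval) -> float:
    """ASSUMPTION: endpoint-based interval distance, O(d) per pair."""
    (la, ha), (lb, hb) = a, b
    return math.sqrt(sum(((l1 - l2) ** 2 + (h1 - h2) ** 2) / 2
                         for l1, h1, l2, h2 in zip(la, ha, lb, hb)))


def adaptive_eps(dist: List[List[float]], min_pts: int) -> float:
    """HYPOTHETICAL rule: mean distance to the (min_pts - 1)-th nearest
    neighbour, so an object at or below the mean has >= min_pts points
    (itself included) within Eps. Requires len(dist) >= min_pts."""
    # sorted(row)[0] is the zero self-distance.
    return mean(sorted(row)[min_pts - 1] for row in dist)


def uid_dbscan(objects: List[Interval], min_pts: int,
               eps: Optional[float] = None) -> Tuple[list, float]:
    """DBSCAN over interval objects; returns (labels, eps), label -1 = noise."""
    n = len(objects)
    dist = [[interval_distance(objects[i], objects[j]) for j in range(n)]
            for i in range(n)]
    if eps is None:
        eps = adaptive_eps(dist, min_pts)
    # Interval density of object i = size of its Eps-neighbourhood (self included).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n  # None = unvisited, -1 = noise, else cluster id
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border object
            continue
        labels[i] = cluster
        queue = list(neighbors[i])  # expand through density-reachable objects
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core object: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels, eps


if __name__ == "__main__":
    # Two small groups of uncertain 2-D objects plus one isolated object.
    centers = [(0, 0), (1, 0), (0, 1), (1, 1),
               (10, 10), (11, 10), (10, 11), (11, 11), (50, -50)]
    objs = [to_interval([(x - 0.1, y - 0.1), (x + 0.1, y + 0.1)])
            for x, y in centers]
    labels, eps = uid_dbscan(objs, min_pts=3)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The full distance matrix keeps the sketch short but costs O(n^2 * d) time and memory, so it does not reflect the low computational complexity the paper reports; a spatial index would be the usual next step. The mean k-distance rule is also sensitive to outliers: in the example, the single isolated object raises Eps from about 1 to about 8.7.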
Source: Computer Science (《计算机科学》), CSCD, Peking University core journal, 2017, Issue B11, pp. 442-447 (6 pages)
Funding: Supported by the Ministry of Water Resources Special Fund for Scientific Research in the Public Interest (201401044)
Keywords: uncertain data; interval number; clustering algorithm; DBSCAN