
Dynamic Clustering Algorithm in Uncertain Tree Database (cited by: 4)
Abstract Existing tree clustering algorithms cannot adapt to dynamic changes and uncertainty in the data. To address this, the paper studies the clustering of uncertain data and proposes a dynamic clustering algorithm for uncertain tree databases, which resolves the failure to cluster caused by dynamic changes in the data. First, the concepts of transformed tree set, similar group, and tree class set are introduced to describe a clustering model for an uncertain tree database. Second, to measure the similarity between subtrees more accurately, and because subtrees have both node-semantic features and structural properties, a semantic similarity measure and a structural similarity measure are proposed; the two are weighted and summed to give the final similarity. Third, a dynamic clustering process is designed in which the clustering threshold is obtained adaptively, which largely reduces the inaccuracy that manual intervention introduces into the clustering result. Subtrees with similar structure are gathered in the same similar group, while the similarity between subtrees in different groups is minimized. For each similar group, a formula is defined to extract a representative subtree, which serves as a tree class; the tree classes together form the tree class set. Finally, experiments on simulated data and in a real environment show that the algorithm is effective and feasible, that the clustering results are fairly accurate, and that it runs efficiently.
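Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2013, No. 6, pp. 1339-1343 (5 pages)
Funding Supported by the Scientific Research Project of the Education Department of Hunan Province (12CD291, 11C1051) and the Jishou University Research Program (11JD051)
Keywords data mining; ordered tree; frequent subtree; similarity; uncertain tree; clustering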
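
The pipeline outlined in the abstract (a weighted sum of semantic and structural similarity, an adaptively chosen threshold, grouping, and one representative per group) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: trees are plain dicts of parent label to child labels, both similarity scores are Jaccard overlaps, the threshold is the mean pairwise similarity, and the representative is the member most similar to its group. Node probabilities (the "uncertain" part) and incremental updates (the "dynamic" part) are left out.

```python
from itertools import combinations

# A tree is a dict mapping each parent label to the list of its child labels.
# This representation and every formula below are illustrative assumptions,
# not the definitions used in the paper.

def labels(tree):
    """Set of all node labels appearing in the tree."""
    found = set(tree)
    for children in tree.values():
        found.update(children)
    return found

def edges(tree):
    """Set of (parent, child) label pairs."""
    return {(p, c) for p, children in tree.items() for c in children}

def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity(t1, t2, alpha=0.5):
    """Weighted sum of a semantic (label) score and a structural (edge) score."""
    semantic = jaccard(labels(t1), labels(t2))
    structural = jaccard(edges(t1), edges(t2))
    return alpha * semantic + (1 - alpha) * structural

def adaptive_threshold(trees, alpha=0.5):
    """Mean pairwise similarity, used as a threshold derived from the data."""
    pairs = list(combinations(trees, 2))
    if not pairs:
        return 1.0
    return sum(similarity(a, b, alpha) for a, b in pairs) / len(pairs)

def cluster(trees, alpha=0.5):
    """Put each tree in the first group it matches on average, else open a new group."""
    threshold = adaptive_threshold(trees, alpha)
    groups = []
    for tree in trees:
        for group in groups:
            avg = sum(similarity(tree, m, alpha) for m in group) / len(group)
            if avg >= threshold:
                group.append(tree)
                break
        else:
            groups.append([tree])
    return groups

def representative(group, alpha=0.5):
    """Member with the highest total similarity to its own group."""
    return max(group, key=lambda t: sum(similarity(t, m, alpha) for m in group))
```

The weight `alpha` is a hypothetical parameter standing in for the "certain proportion" of weight the abstract mentions; setting it closer to 1 favors label overlap, closer to 0 favors shared parent-child edges.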