
Clustering Research on XML Document
Abstract: With the rapid development of the Internet, XML has become the most widely used language for data exchange and storage on the Internet, and extracting valuable information from large collections of XML documents is currently an active research topic. This paper proposes an improved similarity calculation method based on the SET/BAG model. The method converts each node of an XML document into an object made up of the object name, its parent object, its attribute set, and the object's weight relative to its parent. This representation captures the structural information of an XML document fairly completely, and adjusting the weights of repeated nodes reduces their influence on the similarity calculation. Experiments on a real dataset and on a synthetic dataset show that the improved SET/BAG-based similarity method produces good clustering results.
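The abstract does not give the paper's formulas, so the following is only a minimal sketch of the idea under stated assumptions, not the author's method. It turns each XML node into an object keyed by (object name, parent object name, attribute names) and compares two documents with a bag-style weighted Jaccard similarity (sum of minimum weights divided by sum of maximum weights). The weighting rule is an assumption: a node's weight relative to its parent is taken as 1/k, where k is the number of same-named siblings, so a tag repeated k times contributes a total weight of 1. The function names `node_objects` and `weighted_jaccard`, the use of `""` as the root's parent, and comparing attributes by name only are all hypothetical choices.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import Dict, Tuple

# Hypothetical object key: (object name, parent object name, attribute names).
ObjKey = Tuple[str, str, frozenset]


def node_objects(xml_text: str) -> Dict[ObjKey, float]:
    """Convert each XML node into an object and accumulate its weight.

    Assumed weighting rule (not taken from the paper): a node's weight
    relative to its parent is 1/k, where k is the number of same-named
    siblings under that parent, so k repeated nodes together weigh 1.
    """
    root = ET.fromstring(xml_text)
    weights: Dict[ObjKey, float] = defaultdict(float)
    # The root has no parent object; "" stands in for it (assumption).
    weights[(root.tag, "", frozenset(root.attrib))] += 1.0
    for parent in root.iter():
        # Count same-named children so repeated nodes can be down-weighted.
        counts = Counter(child.tag for child in parent)
        for child in parent:
            key = (child.tag, parent.tag, frozenset(child.attrib))
            weights[key] += 1.0 / counts[child.tag]
    return dict(weights)


def weighted_jaccard(a: Dict[ObjKey, float], b: Dict[ObjKey, float]) -> float:
    """Bag-style similarity: sum of minimum weights over sum of maximum weights."""
    keys = a.keys() | b.keys()
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return shared / total if total else 1.0


if __name__ == "__main__":
    book1 = node_objects("<book><title/><author/><author/></book>")
    book2 = node_objects("<book><title/><author/></book>")
    book3 = node_objects("<book><title/><publisher/></book>")
    movie = node_objects("<movie><title/><director/></movie>")
    print(weighted_jaccard(book1, book2))  # 1.0: the repeated <author> counts once
    print(weighted_jaccard(book1, book3))  # 0.5
    print(weighted_jaccard(book1, movie))  # 0.0: <title> has a different parent
```

Without the down-weighting, `book1` would carry an `<author>` weight of 2 against 1 in `book2`, and the similarity would drop to 0.75 even though the two documents share the same structure. The pairwise similarities (or 1 minus the similarity, used as a distance) can be fed to any clustering algorithm that accepts a precomputed similarity or distance matrix; the abstract does not say which algorithm the paper used.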
Author: Yin Luxiu (尹路修)
Source: Journal of Natural Science of Hunan Normal University (《湖南师范大学自然科学学报》), 2015, Issue 5, pp. 91-94 (4 pages). Indexed in CAS; Peking University core journal.
Keywords: XML; document clustering; similarity computation