
GML document structural clustering algorithm based on frequent subtree patterns
Abstract: This paper presents GCFS (GML Clustering based on Frequent Subtree patterns), an algorithm for clustering GML documents by structure. Unlike other related algorithms, GCFS first mines the maximal and closed frequent induced subtrees from the GML document collection and selects subtree patterns from them as clustering features. Each feature is weighted according to the size of its subtree pattern, the similarity between two GML documents is defined by the cosine function, and the K-Means algorithm clusters the documents on these features. Experimental results show that GCFS is effective and efficient, and that it outperforms other GML clustering algorithms of the same kind.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2011, No. 1, pp. 144-146, 149 (4 pages)
Funding: National Natural Science Foundation of China, No. 40871176
Keywords: Geography Markup Language (GML); clustering by structure; maximal frequent induced subtrees; closed frequent induced subtrees
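The pipeline summarized in the abstract (size-weighted subtree features, cosine similarity, K-Means) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact weighting function, the feature-selection rule, or the K-Means initialization, so the sketch assumes that each weight equals the node count of the subtree pattern, that the frequent subtrees have already been mined, and that the first `k` documents seed the centroids. The pattern names and all function names are hypothetical.

```python
import math

def build_vectors(doc_patterns, pattern_sizes):
    """Turn each document (a set of pattern ids) into a weighted feature vector.

    Assumption: the weight of a pattern is its size (node count).
    """
    patterns = sorted(pattern_sizes)  # fixed feature order
    return [
        [float(pattern_sizes[p]) if p in doc else 0.0 for p in patterns]
        for doc in doc_patterns
    ]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans_cosine(vectors, k, max_iter=100):
    """K-Means that assigns each vector to the most cosine-similar centroid.

    Assumption: centroids are seeded with the first k vectors (deterministic).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical mined patterns: id -> number of nodes in the subtree.
pattern_sizes = {
    "road>segment": 2,
    "road>segment>point": 3,
    "building>outline": 2,
    "building>outline>ring>pos": 4,
}
# Hypothetical documents, each described by the patterns it contains.
docs = [
    {"road>segment", "road>segment>point"},
    {"building>outline", "building>outline>ring>pos"},
    {"road>segment>point"},
    {"building>outline>ring>pos", "road>segment"},
]

vectors = build_vectors(docs, pattern_sizes)
labels = kmeans_cosine(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy input the two road-dominated documents end up in one cluster and the two building-dominated documents in the other. Weighting by pattern size means that sharing one large subtree counts for more than sharing one small subtree, which is the intuition behind the size-based weights mentioned in the abstract.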

References (14)

  • 1. Guillaume D, Murtagh F. Clustering of XML documents[J]. Computer Physics Communications, 2000, 127(2/3): 215-227.
  • 2. Doucet A, Ahonen-Myka H. Naive clustering of a large XML document collection[C]//Proceedings of the 1st Annual Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), Germany, 2002: 81-88.
  • 3. Nierman A, Jagadish H V. Evaluating structural similarity in XML documents[C]//Proceedings of the 5th International Workshop on the Web and Databases (WebDB), Madison, 2002: 61-66.
  • 4. Zhang K, Shasha D. Simple fast algorithms for the editing distance between trees and related problems[J]. SIAM Journal on Computing, 1989, 18(6): 1245-1262.
  • 5. Wang L, Cheung D W, Mamoulis N, et al. An efficient and scalable algorithm for clustering XML documents by structure[J]. IEEE TKDE, 2004, 16(1): 82-96.
  • 6. Leung H P, Chung K F L, Chan S C F. On the use of hierarchical information in sequential mining-based XML document similarity computation[J]. Knowledge and Information Systems, 2005, 7(4): 476-498.
  • 7. Leung H P, Chung K F L, Chan S C F. XML document clustering using common XPath[C]//2005 International Workshop on Challenges in Web Information Retrieval and Integration (WIRI 2005), Tokyo, 2005: 91-96.
  • 8. Nayak R, Xu S. XCLS: A fast and effective clustering algorithm for heterogeneous XML documents[C]//The 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Singapore, 2006.
  • 9. Chehreghani M H, Rahgozar M, Lucas C, et al. Clustering rooted ordered trees[C]//Proceedings of the 2007 International Conference on Computational Intelligence and Data Mining (CIDM 2007), 2007: 450-455.
  • 10. Dalamagas T, Cheng S T, Winkel K J, et al. Clustering XML documents using structural summaries[C]//Proceedings of the EDBT Workshop on Clustering Information over the Web, 2004: 547-556.

