Clustering XML Documents by Layer Information
Abstract: This paper proposes CXLI, a layer-sensitive method for clustering collections of XML documents. First, the concept of a structure table is introduced to eliminate repeated and nested structures in XML documents. Second, constraints on the basic editing operations for XML documents are established that take layer information into account. Third, a layer-aware measure of the similarity between XML documents is given. Finally, the XML document collection is clustered with an agglomerative hierarchical clustering method. Experiments on the ACM SIGMOD data set and on synthetically generated data sets show that CXLI achieves better precision at roughly the same computation time.
Source: Journal of Jilin University (Engineering and Technology Edition); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2014, No. 1, pp. 124-128 (5 pages).
Funding: Jilin Province Science and Technology Development Plan Project (20090704); Natural Science Foundation of Jilin Province (201115020).
Keywords: artificial intelligence; data mining; extensible markup language (XML); similarity measure; clustering; layer
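
The pipeline outlined in the abstract (structural summary, layer-aware similarity, agglomerative clustering) can be illustrated with a minimal Python sketch. It is not the CXLI algorithm: the record gives no definitions for the structure table, the editing-operation constraints, or the similarity measure. As stand-in assumptions, the sketch summarizes each document as a set of (depth, tag path) pairs, which collapses repeated siblings; it replaces the edit-distance-based measure with a weighted Jaccard score in which a hypothetical `layer_weight` halves the weight at each deeper layer; and it uses average linkage for the clustering step. All function names are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def layered_paths(xml_text):
    """Summarize a document as a set of (depth, tag path) pairs.

    Using a set collapses repeated sibling structures. This is only a
    stand-in for the paper's structure table, whose definition is not
    given in this record.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, 0, root.tag)]
    while stack:
        node, depth, path = stack.pop()
        paths.add((depth, path))
        for child in node:
            stack.append((child, depth + 1, path + "/" + child.tag))
    return paths


def layer_weight(depth):
    # Assumption: nodes in shallower layers matter more; halve per layer.
    return 1.0 / (2 ** depth)


def similarity(a, b):
    """Layer-weighted Jaccard similarity in [0, 1] (assumed measure)."""
    shared = sum(layer_weight(d) for d, _ in a & b)
    total = sum(layer_weight(d) for d, _ in a | b)
    return shared / total if total else 1.0


def agglomerative(items, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        def linkage(pair):
            x, y = pair
            sims = [similarity(items[i], items[j])
                    for i in clusters[x] for j in clusters[y]]
            return sum(sims) / len(sims)

        # Merge the two most similar clusters; x < y, so deleting y is safe.
        x, y = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<article><journal><name/></journal><title/></article>",
    "<article><journal><name/></journal><pages/></article>",
]
summaries = [layered_paths(d) for d in docs]
clusters = agglomerative(summaries, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy collection the two `book` documents differ only in one first-layer element, so they score 0.8 and merge first; documents with different root elements share no paths and score 0, which keeps the `book` and `article` groups apart.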