Simulation Research on XML Documents Similarity (XML文档相似性的仿真研究)
Cited by: 1
Abstract: Computing the similarity between XML documents is a difficult problem in XML document classification. This paper first proposes a model for computing XML document similarity and then uses XMLGenerator to run simulated tests. The method is structure-based: it applies sequential pattern mining to find the maximal common paths between two XML document trees, and then measures similarity as the ratio of the number of nodes on the maximal common paths to the number of nodes on all paths extracted from the document trees. The paper also proposes a new approach to minimizing XML documents and takes both the semantic similarity and the structural similarity of document nodes into account, which further improves the accuracy of the computed similarity. The experiments indicate that the method has good application prospects.
Source: Computer Simulation (《计算机仿真》), CSCD, 2005, No. 12, pp. 300-302, 310 (4 pages)
Keywords: Extensible markup language (XML); Information retrieval; Data mining; Sequential pattern mining
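
The ratio described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, a plain longest-common-subsequence comparison of root-to-leaf tag paths stands in for sequential pattern mining, the ratio is taken symmetrically over both documents as an assumption, and neither the document minimization step nor the semantic similarity of node names is modelled.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf path of an XML document as a tuple of tag names."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.append(current)  # leaf reached: record the full path
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def lcs_length(a, b):
    """Length of the longest common subsequence of two tag sequences (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def path_similarity(xml_a, xml_b):
    """Nodes on the best-matching common paths divided by nodes on all paths.

    Assumption: each path is matched against its best counterpart in the
    other document, and both directions are summed so the score is symmetric.
    """
    paths_a = root_to_leaf_paths(xml_a)
    paths_b = root_to_leaf_paths(xml_b)

    def matched_nodes(src, dst):
        return sum(max(lcs_length(p, q) for q in dst) for p in src)

    total = sum(len(p) for p in paths_a) + sum(len(p) for p in paths_b)
    common = matched_nodes(paths_a, paths_b) + matched_nodes(paths_b, paths_a)
    return common / total


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author/></book>"
print(path_similarity(doc_a, doc_b))
```

With this definition, identical documents score 1.0 and documents that share no tag names score 0.0. A variant closer to the abstract would treat synonymous tags as equal before comparing paths.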

References (5)

  • 1 Andrew Nierman, H V Jagadish. Evaluating Structural Similarity in XML Documents[C]. Proceedings of the Fifth International Workshop on the Web and Databases, 2002. 61-66.
  • 2 Sergio Flesca, Giuseppe Manco, Elio Masciari, Luigi Pontieri, Andrea Pugliese. Detecting Structural Similarities between XML Documents[C]. Proceedings of WebDB 2002.
  • 3 Jung-Won Lee, Kiho Lee, Won Kim. Preparations for Semantics-Based XML Mining[C]. Proceedings of IEEE International Conference on Data Mining (ICDM), 2001. 345-352.
  • 4 Rakesh Agrawal, Ramakrishnan Srikant. Mining Sequential Patterns[C]. Proceedings of Eleventh International Conference on Data Engineering, 1995. 3-14.
  • 5 Jayant Madhavan, Philip A Bernstein, Erhard Rahm. Generic Schema Matching with Cupid[C]. Proceedings of the 27th VLDB Conference, 2001. 49-58.
