Simulation Research on XML Documents Similarity

Cited by: 1

Abstract: Computing the similarity between XML documents is a difficult problem in XML document classification. This paper first proposes a model for computing XML document similarity and then uses XMLGenerator to run simulated tests. It describes a structure-based method that applies sequential pattern mining to find the maximal common paths between two XML document trees. Similarity is then measured as the ratio of the number of nodes on the maximal common paths to the number of nodes on all paths extracted from the document trees. The paper also proposes a new approach to minimizing XML documents and takes both the semantic similarity and the structural similarity of document nodes into account, which further improves the precision of the computed similarity. Experiments indicate that the method has good application prospects.
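The ratio described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a "path" is a root-to-leaf sequence of element tags, it replaces sequential pattern mining with a plain longest-common-subsequence match between paths, and it compares tags by exact string equality rather than semantic similarity. The function names `leaf_paths`, `lcs_length`, and `path_similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Return every root-to-leaf tag path of an XML document as a tuple."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def lcs_length(a, b):
    """Length of the longest common subsequence of two tag paths.

    Assumption: this stands in for the maximal common path that the
    paper obtains through sequential pattern mining.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def path_similarity(xml_a, xml_b):
    """Ratio of nodes on best-matching common paths to nodes on all paths."""
    paths_a = leaf_paths(xml_a)
    paths_b = leaf_paths(xml_b)

    def matched(src, dst):
        # For each path in src, count the nodes it shares with its best match in dst.
        return sum(max(lcs_length(p, q) for q in dst) for p in src)

    common = matched(paths_a, paths_b) + matched(paths_b, paths_a)
    total = sum(len(p) for p in paths_a) + sum(len(p) for p in paths_b)
    return common / total


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><name/><email/></author></book>"
print(path_similarity(doc_a, doc_b))
```

Under these assumptions, identical documents score 1.0 and documents with no tags in common score 0.0. A fuller treatment would weight node matches by a semantic similarity score instead of exact tag equality, as the abstract suggests.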

Authors: 陆翠明 (Lu Cuiming), 李芳 (Li Fang), Athena I Vakali

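Affiliations: Department of Computer Science, Shanghai Jiao Tong University; Department of Informatics, Aristotle University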

Source: Computer Simulation (《计算机仿真》), CSCD, 2005, No. 12, pp. 300-302, 310 (4 pages)

Keywords: Extensible markup language (XML); Information retrieval; Data mining; Sequential pattern mining

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

