Tensor-based approach to XML similarity calculation (cited by: 2)

Abstract: Extensible Markup Language (XML) documents carry both structural and semantic information. Compared with plain text, XML offers more precise description and richer forms of expression, but as a result traditional natural language processing (NLP) and data mining (DM) techniques cannot be applied to it directly. Because the content and structure of XML are not independent (content influences structure, and structure acts on content), a tensor-based method for XML feature dimension reduction and general similarity calculation is proposed. XML documents are represented as tensors, and their dimensionality is reduced with a method based on maximization of mutual information (MMI). A general similarity measure that fuses XML structure and content, computed from a non-linear perspective, is then used to capture the intrinsic relationship between structure and content and the way they act together, improving the performance of general similarity calculation for XML. Experimental results and analysis verify the effectiveness of the proposed method.
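Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 9, pp. 1711-1714 (4 pages)
Funding: National Natural Science Foundation of China (61370144)
Keywords: Extensible Markup Language (XML); general similarity; tensor analysis; feature dimension reduction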
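
The abstract describes representing XML documents as tensors so that structure and content are compared jointly rather than separately. The sketch below is only an illustration of that general idea, not the paper's algorithm: it assumes a sparse second-order tensor per document indexed by (element path, term), and it uses plain cosine similarity as a stand-in for the paper's similarity measure. The MMI dimension-reduction step is omitted, and the names `xml_to_sparse_tensor` and `cosine_similarity` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def xml_to_sparse_tensor(xml_text):
    """Map an XML string to a sparse (path, term) -> count table.

    Assumption for illustration: one tensor mode holds element paths
    (structure) and the other holds lowercased terms (content).
    Tail text after child elements is ignored for brevity.
    """
    root = ET.fromstring(xml_text)
    entries = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        # Text directly inside this element is indexed under its path.
        for term in (node.text or "").lower().split():
            entries[(path, term)] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return entries


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tensors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# doc_b and doc_c contain the same tags and the same words;
# they differ only in which element each word appears under.
doc_a = "<paper><title>tensor similarity</title><body>xml tensor</body></paper>"
doc_b = "<paper><title>tensor similarity</title><body>xml data</body></paper>"
doc_c = "<paper><title>xml data</title><body>tensor similarity</body></paper>"

t_a, t_b, t_c = (xml_to_sparse_tensor(d) for d in (doc_a, doc_b, doc_c))
sim_ab = cosine_similarity(t_a, t_b)
sim_ac = cosine_similarity(t_a, t_c)
print(sim_ab, sim_ac)
```

In this toy setup `doc_b` and `doc_c` would be indistinguishable to a bag-of-words or tag-only comparison, yet `doc_a` scores higher against `doc_b` than against `doc_c`, because the joint (path, term) index keeps track of where each term occurs.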

共引文献6

同被引文献11

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部