Tensor-based approach to XML similarity calculation (基于张量的XML相似度计算方法)

Cited by: 2

Abstract: Extensible Markup Language (XML) documents carry both structural and semantic information. Compared with plain text, XML offers precise description and rich forms of expression, but for the same reason traditional natural language processing (NLP) and data mining (DM) techniques cannot be applied to it directly. Because XML content and structure are not independent (content influences structure, and structure acts on content), the paper proposes a tensor-based method for XML feature dimension reduction and general similarity calculation. XML documents are represented as tensors, and their dimensionality is reduced with a method based on maximization of mutual information (MMI). A general similarity measure that fuses structure and content, computed from a non-linear perspective, is then used to capture the intrinsic relationship between the two and the way they act together, which improves the performance of general XML similarity calculation. Experimental results and their analysis confirm the effectiveness of the proposed method.
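The abstract does not give the tensor construction or the similarity formula, so the following is only a minimal sketch of the general idea that structure and content should be measured jointly. It assumes, purely for illustration, that each document is described by counts over (element path, content term) pairs, which can be read as one slice of a documents × paths × terms tensor, and it swaps in plain cosine similarity in place of the paper's measure. The MMI-based reduction step is not shown. The function names `path_term_counts` and `cosine_similarity` and the sample documents are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_term_counts(xml_text):
    """Count (structural path, content term) pairs for one XML document.

    Assumption for illustration: the sparse counts stand in for one
    document's slice of a documents x paths x terms tensor.
    """
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        # Text directly inside this element is tied to the element's path.
        for term in (node.text or "").lower().split():
            counts[(path, term)] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (not the paper's measure)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = "<paper><title>tensor analysis</title><body>xml similarity</body></paper>"
doc_b = "<paper><title>tensor analysis</title><body>feature reduction</body></paper>"
doc_c = "<paper><title>xml similarity</title><body>tensor analysis</body></paper>"

# doc_a and doc_b share the same terms under the same path (the title).
sim_ab = cosine_similarity(path_term_counts(doc_a), path_term_counts(doc_b))
# doc_a and doc_c use the same tags and the same words, but in swapped positions.
sim_ac = cosine_similarity(path_term_counts(doc_a), path_term_counts(doc_c))
```

In this toy setup `doc_a` and `doc_c` have identical tag sets and identical vocabularies, so a structure-only or content-only comparison would rate them as identical, while the joint representation separates them because no term appears under the same path in both.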

Authors: Piao Yong, Jiang He, Wang Xiukun

Affiliation: School of Software, Dalian University of Technology

Source: Control and Decision (控制与决策), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 9, pp. 1711-1714 (4 pages)

Funding: National Natural Science Foundation of China (61370144)

Keywords: Extensible Markup Language (XML); general similarity; tensor analysis; feature reduction

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
