A Similarity Calculation Method for XML Documents Combining Structure and Content (cited by: 4)

Combining Structure and Content Similarity Measures for XML Documents
Abstract: This paper proposes a method for calculating XML document similarity that takes both the content and the structure of the documents into account. Content similarity and structural similarity are computed separately with different methods, and the two are then combined under different weights to obtain an overall document similarity. Experimental results on real data sets show that combining structure and content information improves the accuracy of XML document similarity calculation.
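Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2016, Issue 4, pp. 69-72, 76 (5 pages in total)
Funding: National Natural Science Foundation of China (61170306)
Keywords: content similarity; structure similarity; XML similarity; vector space model (VSM); path frequency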
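
The weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption for illustration: content similarity is taken as cosine similarity over raw term frequencies (a plain vector space model without IDF weighting), structural similarity as cosine similarity over root-to-element tag path frequencies, and the hypothetical parameter `alpha` is the weight given to content. The function names are likewise invented for this example and are not the authors' implementation.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_vector(root: ET.Element) -> Counter:
    """Term frequencies over all text in the document (assumed VSM, no IDF)."""
    text = " ".join(root.itertext()).lower()
    return Counter(re.findall(r"\w+", text))


def path_vector(root: ET.Element) -> Counter:
    """Frequency of each root-to-element tag path, e.g. 'book/title'."""
    paths = Counter()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def combined_similarity(xml_a: str, xml_b: str, alpha: float = 0.5) -> float:
    """Weighted sum: alpha * content similarity + (1 - alpha) * structure similarity."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    content_sim = cosine(content_vector(a), content_vector(b))
    structure_sim = cosine(path_vector(a), path_vector(b))
    return alpha * content_sim + (1 - alpha) * structure_sim


if __name__ == "__main__":
    doc_a = "<book><title>xml similarity</title></book>"
    doc_b = "<book><title>graph clustering</title></book>"
    # Same tag paths, no shared terms: the score comes from structure only.
    print(combined_similarity(doc_a, doc_b, alpha=0.3))
```

Under these assumptions, two documents with identical tag paths but no shared terms score `1 - alpha`, which shows how the weight shifts the measure between content and structure.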

References (8)

  • [1] Algergawy A, Mesiti M, Nayak R, et al. XML data clustering: An overview[J]. ACM Computing Surveys (CSUR), 2011, 43(4): 25.
  • [2] Brzezinski D, Piernik M. Adaptive XML stream classification using partial tree-edit distance[M]//Foundations of Intelligent Systems. Berlin: Springer International Publishing, 2014: 10-19.
  • [3] Hossain M S, Angryk R A. Gdclust: A graph-based document clustering technique[C]//Data Mining Workshops, 2007. Omaha: IEEE, 2007: 417-422.
  • [4] Salton G, Wong A, Yang C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
  • [5] Salton G, Buckley C. Term-weighting approaches in automatic text retrieval[J]. Information Processing & Management, 1988, 24(5): 513-523.
  • [6] Yang J, Cheung W K, Chen X. Learning element similarity matrix for semi-structured document analysis[J]. Knowledge and Information Systems, 2009, 19(1): 53-78.
  • [7] Yoon J P, Raghavan V, Chakilam V. BitCube: A three-dimensional bitmap indexing for XML documents[C]//Scientific and Statistical Database Management, 2001. Fairfax: IEEE, 2001: 158-167.
  • [8] Kurt A, Tozal E. Classification of XSLT-generated web documents with support vector machines[M]//Knowledge Discovery from XML Documents. Berlin, Heidelberg: Springer, 2006: 33-42.
