
Calculating Similarity of XML Documents by the Weighted pq-gram Algorithm (用带权重的pq-gram算法计算XML文档相似度)
Cited by: 1
Abstract: Clustering is an important means of managing XML documents efficiently, and computing the similarity between XML documents is the key step in clustering. The pq-gram algorithm is an effective way to compute this similarity, but it ignores the ordering of the nodes in an XML document. Building on the pq-gram algorithm and on the structural characteristics of XML documents, the weighted pq-gram algorithm first assigns a weight to each node, then derives a weight for each pq-gram from the weights of its nodes, and finally applies these weights in the similarity calculation. Experimental results show that the weighted pq-gram algorithm better captures the contribution of each node to the similarity calculation and improves the precision of XML document similarity calculation.
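The three steps named in the abstract (weight the nodes, weight the pq-grams, use the weights in the similarity) can be illustrated with a minimal Python sketch. The abstract does not state the paper's weighting formulas, so everything about the weights here is an assumption made for illustration: the hypothetical `node_weight` function decays with depth as 1/(depth + 1), each pq-gram simply inherits the weight of its anchor node, and the similarity is a weighted form of the usual bag-overlap ratio. The profile construction follows the standard pq-gram definition (a stem of p ancestors-or-self labels plus a base of q consecutive child labels, padded with a dummy label). The function names `weighted_profile` and `weighted_similarity` are likewise illustrative and not taken from the paper.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NULL = "*"  # dummy label used to pad stems and bases


def weighted_profile(root, p=2, q=3, node_weight=lambda depth: 1.0 / (depth + 1)):
    """Map each pq-gram (a tuple of p + q labels) to its accumulated weight.

    ASSUMPTION: node_weight and the rule "a pq-gram takes the weight of its
    anchor node" are illustrative choices, not the paper's actual scheme.
    """
    profile = Counter()

    def visit(node, stem, depth):
        stem = stem[1:] + (node.tag,)   # shift the anchor label into the stem
        w = node_weight(depth)          # hypothetical weight of the anchor node
        base = (NULL,) * q
        children = list(node)
        if not children:                # a leaf contributes one all-dummy base
            profile[stem + base] += w
            return
        for child in children:          # slide the base window over the children
            base = base[1:] + (child.tag,)
            profile[stem + base] += w
            visit(child, stem, depth + 1)
        for _ in range(q - 1):          # trailing windows padded with dummies
            base = base[1:] + (NULL,)
            profile[stem + base] += w

    visit(root, (NULL,) * p, 0)
    return profile


def weighted_similarity(doc_a, doc_b, p=2, q=3):
    """Weighted overlap of two pq-gram profiles, in the range [0, 1]."""
    pa = weighted_profile(ET.fromstring(doc_a), p, q)
    pb = weighted_profile(ET.fromstring(doc_b), p, q)
    shared = sum(min(pa[g], pb[g]) for g in pa.keys() & pb.keys())
    total = sum(pa.values()) + sum(pb.values())
    return 2.0 * shared / total


if __name__ == "__main__":
    x = "<book><title/><author/><year/></book>"
    y = "<book><title/><author/><price/></book>"
    print(weighted_similarity(x, y))
```

Passing `node_weight=lambda depth: 1.0` reduces the sketch to the unweighted pq-gram overlap, which makes it easy to compare how a given weighting changes the score for the same pair of documents.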
Source: Computer and Modernization (《计算机与现代化》), 2015, No. 3, pp. 20-25 (6 pages)
Funding: National Natural Science Foundation of China (61202350)
Keywords: XML documents; similarity calculation; pq-gram; weight
