Abstract
Clustering is an important means of managing XML documents efficiently, and computing the similarity between XML documents is a key step in clustering. The pq-gram algorithm is an effective method for computing XML document similarity, but it ignores the ordered nature of XML document nodes. Building on pq-gram, the weighted pq-gram algorithm exploits the structural characteristics of XML documents: it first assigns a weight to each node, then assigns a weight to each pq-gram based on the node weights, and finally applies these weights in the similarity computation. Experimental results show that the weighted pq-gram algorithm better captures the contribution of each node to XML document similarity and improves the precision of the similarity computation.
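The three steps named in the abstract (weight the nodes, weight the pq-grams, apply the weights to the similarity) can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formulas, so the scheme here is an assumption made purely for illustration: a node's weight is `depth_decay ** depth`, and a pq-gram inherits the weight of its anchor node. The names `weighted_profile`, `weighted_similarity`, and `depth_decay` are hypothetical and do not come from the paper. With `depth_decay=1.0` the sketch reduces to the ordinary unweighted pq-gram bag overlap.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NULL = "*"  # placeholder label for the padding nodes of the extended tree


def weighted_profile(root, p=2, q=3, depth_decay=0.5):
    """Map each pq-gram (tuple of p + q labels) to its accumulated weight."""
    profile = Counter()

    def visit(node, ancestors, depth):
        # Stem: the anchor node and its p - 1 nearest ancestors, null-padded.
        stem = (ancestors + [node.tag])[-p:]
        stem = [NULL] * (p - len(stem)) + stem
        # ASSUMPTION (not from the paper): node weight decays with depth,
        # and a pq-gram takes the weight of its anchor node.
        weight = depth_decay ** depth
        children = list(node)
        if not children:
            # A leaf anchor contributes one pq-gram with an all-null base.
            profile[tuple(stem + [NULL] * q)] += weight
            return
        # Base: slide a window of q over the null-padded child labels.
        labels = [NULL] * (q - 1) + [c.tag for c in children] + [NULL] * (q - 1)
        for i in range(len(labels) - q + 1):
            profile[tuple(stem + labels[i:i + q])] += weight
        for child in children:
            visit(child, ancestors + [node.tag], depth + 1)

    visit(root, [], 0)
    return profile


def weighted_similarity(root_a, root_b, p=2, q=3, depth_decay=0.5):
    """Weighted bag overlap of two pq-gram profiles, a value in [0, 1]."""
    pa = weighted_profile(root_a, p, q, depth_decay)
    pb = weighted_profile(root_b, p, q, depth_decay)
    shared = sum(min(w, pb[g]) for g, w in pa.items())
    return 2 * shared / (sum(pa.values()) + sum(pb.values()))


doc_a = ET.fromstring("<a><b/><c/></a>")
doc_b = ET.fromstring("<a><b/><d/></a>")
print(weighted_similarity(doc_a, doc_b))
```

Under this assumed scheme, pq-grams anchored near the root count for more than those anchored at deep leaves, so a difference high in the tree lowers the similarity more than the same difference lower down. Any other node-weighting rule could be substituted at the line marked as an assumption.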
Source
《计算机与现代化》
2015, Issue 3, pp. 20-25 (6 pages)
Computer and Modernization
Funding
Supported by the National Natural Science Foundation of China (61202350)