
Clustering XML documents based on feature order preference
Abstract: Clustering XML documents plays an important role in many data application domains. The proposed algorithm performs feature selection on XML documents and represents each document as an n-dimensional feature vector. It then applies the CFP (Clustering with Feature order Preference) algorithm, which assigns a weight to each feature according to the feature order preference and updates these weights in every clustering iteration. Experimental results show that combining the feature-preference weights of CFP with the level weights used when vectorizing XML documents offsets the shortcomings of XML document vectorization and improves clustering precision.
Source: Computer Engineering and Applications (《计算机工程与应用》), 2016, No. 12, pp. 64-68 (5 pages). Indexed in CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (No. 61202350)
Keywords: clustering Extensible Markup Language (XML) documents; level weight; feature order preference
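The pipeline summarized in the abstract (select features, vectorize each document with level weights, then cluster with per-feature weights that respect a preference order and are updated every iteration) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The root-to-node tag-path features, the document-frequency cutoff `min_df`, the level weight `base ** -(depth - 1)`, the exponential dispersion-based weight update, and the pool-adjacent-violators projection that enforces the preference order are all illustrative choices. Every function name (`select_features`, `level_weighted_vector`, `preference_weighted_kmeans`, etc.) is hypothetical. In particular, step 3 of the loop is a stand-in and not the CFP weight-update rule.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def iter_paths(root):
    """Yield (root-to-node tag path, depth) for every element; the root has depth 1."""
    stack = [(root, root.tag, 1)]
    while stack:
        node, path, depth = stack.pop()
        yield path, depth
        for child in node:
            stack.append((child, f"{path}/{child.tag}", depth + 1))


def select_features(docs, min_df=1):
    """Hypothetical feature selection: keep tag paths occurring in >= min_df documents."""
    df = Counter()
    for text in docs:
        df.update({path for path, _ in iter_paths(ET.fromstring(text))})
    return sorted(path for path, count in df.items() if count >= min_df)


def level_weighted_vector(text, features, base=2.0):
    """Vectorize one document; a node at depth d adds base ** -(d - 1) (assumed level weight)."""
    counts = Counter()
    for path, depth in iter_paths(ET.fromstring(text)):
        counts[path] += base ** -(depth - 1)
    return [counts.get(f, 0.0) for f in features]


def project_non_increasing(values):
    """Least-squares projection onto v[0] >= v[1] >= ... via pool-adjacent-violators."""
    blocks = []  # each block is [mean, size]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return [m for m, n in blocks for _ in range(n)]


def weighted_sq_dist(x, y, w):
    """Feature-weighted squared Euclidean distance."""
    return sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w))


def preference_weighted_kmeans(X, k, order, iters=20):
    """k-means whose feature weights are re-estimated each iteration and then
    forced to respect `order` (feature indices, most preferred first)."""
    m = len(X[0])
    centers = [list(x) for x in X[:k]]  # deterministic init: first k documents
    w = [1.0 / m] * m
    labels = [0] * len(X)
    for _ in range(iters):
        # 1) Assign every document to its nearest center under the weighted distance.
        labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w)) for x in X]
        # 2) Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3) Re-estimate weights: low within-cluster dispersion -> high weight
        #    (entropy-weighted k-means style; an assumption, not the CFP rule).
        disp = [sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(X, labels)) for j in range(m)]
        gamma = sum(disp) / m + 1e-12
        raw = [math.exp(-d / gamma) for d in disp]
        total = sum(raw)
        ranked = project_non_increasing([raw[j] / total for j in order])
        # 4) Write the order-consistent weights back; pooling keeps their sum at 1.
        for rank, j in enumerate(order):
            w[j] = ranked[rank]
    return labels, w
```

For example, preferring shallower paths with `order = sorted(range(len(features)), key=lambda j: features[j].count("/"))` ties the preference order to the same depth information the level weights use, which is one plausible way to combine the two. On the four toy documents `<a><b/><b/></a>`, `<a><b/></a>`, `<a><c><d/></c></a>` and `<a><c><d/><d/></c></a>`, this sketch places the first two documents in one cluster and the last two in the other.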
