Research on XML Document Classification Based on Fuzzy Path Matching
Abstract: XML is an important standard for information representation and data exchange on the Internet, and document classification is an important way to extract useful information from massive volumes of data. This paper proposes an XML document classification method based on fuzzy path matching. First, information that has no effect on classification is removed. Next, a hybrid XML document similarity measure is applied, with each XML document represented as a set of paths. To improve efficiency, paths that occur repeatedly within a document are deleted before fuzzy matching, and the Hungarian algorithm is used to compute the similarity between documents. Finally, an improved k-nearest-neighbor algorithm classifies the documents. Experiments on an automatically generated document set and a real document set show that classification accuracy can reach 100% on both.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 10, pp. 113-115, 126 (4 pages)
Funding: Yunnan Provincial Department of Education Fund Project (2011Y010)
Keywords: XML; classification; similarity; path; semantics
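
The pipeline outlined in the abstract (path sets with duplicates removed, fuzzy path matching, Hungarian assignment, nearest-neighbor voting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not give the path similarity measure, the normalization, or the improvement to k-nearest-neighbor. The sketch therefore assumes that text and attributes are the information removed, scores paths with `difflib.SequenceMatcher`, normalizes by the larger path set, and uses plain majority-vote kNN. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from difflib import SequenceMatcher


def path_set(xml_text):
    """Distinct root-to-leaf tag paths; text and attributes are ignored (assumption)."""
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)  # a set drops paths that recur in the document
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), ())
    return paths


def path_similarity(p, q):
    """Fuzzy score in [0, 1] between two tag paths (illustrative measure only)."""
    return SequenceMatcher(None, p, q).ratio()


def hungarian(cost):
    """Minimum-cost assignment for a matrix with rows <= cols; returns (row, col) pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]


def doc_similarity(a, b):
    """Best one-to-one path matching, normalized by the larger path set (assumption)."""
    rows, cols = sorted((list(a), list(b)), key=len)
    if not rows:
        return 0.0
    cost = [[1.0 - path_similarity(p, q) for q in cols] for p in rows]
    return sum(1.0 - cost[i][j] for i, j in hungarian(cost)) / len(cols)


def knn_classify(xml_text, labelled, k=3):
    """Plain majority-vote kNN over (xml_text, label) pairs; not the paper's improved variant."""
    target = path_set(xml_text)
    ranked = sorted(labelled,
                    key=lambda item: doc_similarity(target, path_set(item[0])),
                    reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
```

Calling `knn_classify(new_doc, training_pairs, k=3)` with a list of `(xml_text, label)` pairs returns the majority label among the three most similar training documents. Casting similarity as an assignment problem means each path in the smaller document is matched to at most one path in the larger one, so a single common path cannot be counted repeatedly.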