
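Automatic Classification Method for XML Files Based on Content and Hierarchical Structure (cited by 4)

Published English title: Method of classification based on content and hierarchical structure for XML file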
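Abstract: This paper proposes a classification method for XML files that builds on their inherent hierarchical structure, and compares its experimental results with those of an improved VSM (vector space model) method. Unlike earlier XML classification methods, this approach places greater emphasis on the structural information specific to XML files. First, the TF-IDF method is applied to the non-structural (textual) content of each XML file to produce a general feature set. Next, each level of the XML hierarchy is assigned a weight according to its importance, producing a hierarchical feature set. A knowledge feature set is then generated from domain knowledge. The three feature sets are combined to classify the XML files. An experimental system was designed to demonstrate that the method is effective and feasible, and the results show that it is considerably more accurate than the improved VSM method.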
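The abstract describes three feature sets (general, hierarchical, knowledge) that are combined for classification, but does not give the weighting formulas or the final classifier. The following is a minimal sketch under stated assumptions, not the author's method: the level weight `1/depth`, the `DOMAIN_KEYWORDS` table, the `g:`/`h:`/`k:` feature prefixes, the smoothed IDF, and the nearest-centroid cosine classifier are all hypothetical choices made for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z]+")

# Hypothetical domain knowledge: category -> indicative terms (not from the paper).
DOMAIN_KEYWORDS = {
    "sports": {"match", "team", "score"},
    "finance": {"stock", "market", "bank"},
}

def tokens(text):
    return TOKEN.findall((text or "").lower())

def general_terms(root):
    # General feature source: all text content, element structure ignored.
    return Counter(tokens(" ".join(root.itertext())))

def hierarchical_terms(root, level_weight=lambda depth: 1.0 / depth):
    # Hierarchical features: each element's own text is weighted by its level.
    # Assumption: shallower levels matter more (weight 1/depth); tails are ignored.
    weights = defaultdict(float)
    def walk(elem, depth):
        for t in tokens(elem.text):
            weights[t] += level_weight(depth)
        for child in elem:
            walk(child, depth + 1)
    walk(root, 1)
    return weights

def knowledge_terms(general):
    # Knowledge features: how often each category's domain terms occur.
    return {cat: float(sum(general[w] for w in words))
            for cat, words in DOMAIN_KEYWORDS.items()}

def idf(term, df, n):
    # Smoothed IDF so terms unseen during training do not divide by zero.
    return math.log((1 + n) / (1 + df.get(term, 0))) + 1.0

def features(root, df, n):
    # Combine the three feature sets into one sparse vector.
    g = general_terms(root)
    vec = {"g:" + t: tf * idf(t, df, n) for t, tf in g.items()}
    vec.update({"h:" + t: w for t, w in hierarchical_terms(root).items()})
    vec.update({"k:" + c: w for c, w in knowledge_terms(g).items()})
    return vec

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(labeled):
    # labeled: list of (xml_string, label); builds one centroid per label.
    roots = [(ET.fromstring(x), y) for x, y in labeled]
    n = len(roots)
    df = Counter(t for r, _ in roots for t in set(general_terms(r)))
    sums, counts = defaultdict(lambda: defaultdict(float)), Counter()
    for r, y in roots:
        for k, v in features(r, df, n).items():
            sums[y][k] += v
        counts[y] += 1
    centroids = {y: {k: v / counts[y] for k, v in s.items()} for y, s in sums.items()}
    return df, n, centroids

def classify(xml, model):
    df, n, centroids = model
    vec = features(ET.fromstring(xml), df, n)
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))

if __name__ == "__main__":
    model = train([
        ("<doc><title>team match</title><body>the team won the match with a high score</body></doc>", "sports"),
        ("<doc><title>stock market</title><body>the bank said the stock market fell</body></doc>", "finance"),
    ])
    print(classify("<doc><title>match report</title><body>our team score</body></doc>", model))
```

Keeping the three feature sets in separate namespaces of one vector lets each contribute to the similarity independently; in a fuller version each group would likely get its own tunable weight, which the abstract implies but does not specify.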
Author: Tang Kai (唐凯)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2007, No. 3, pp. 168-172 and 193 (6 pages). Indexed in CSCD and the Peking University core journal list.
Keywords: feature word; automatic text classification; hierarchical structure

