Engineering Text Information Classification Based on TFIDF and Classification Tree (times cited: 3)
Abstract: Traditional classification algorithms cannot handle multilevel engineering information classification. We therefore propose a multilayer engineering information classification method based on TFIDF (term frequency-inverse document frequency) and a classification tree. The algorithm builds a multilevel classification tree for each piece of engineering information and constructs a TFIDF matrix at each level, which reduces redundant computation. The classification result is then decided from the similarities stored in the tree nodes. Compared with traditional single-level classification algorithms, the tree-based decision method supports multi-level class division and assignment of an item to multiple classes. It needs only 59% of the computation time of single-level classification and achieves a recall of 95.1% and an accuracy of 97.4%, showing good flexibility and robustness. Experimental results confirm the effectiveness of the algorithm.
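The abstract only outlines the method, so the following Python sketch shows one plausible reading of it rather than the authors' implementation. It assumes a hypothetical category tree (`Node`, `classify`, and the sample building/road categories are invented for illustration), builds a separate TFIDF matrix from only the sibling classes at each level, stores the cosine similarity between the query and each child in that child node, and descends into every child whose similarity clears a threshold. The smoothed IDF formula, cosine similarity, and the threshold of 0.2 are all assumptions, not details taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One class in a hypothetical category tree; `keywords` stands in for its training text."""
    name: str
    keywords: list[str]
    children: list[Node] = field(default_factory=list)
    similarity: float = 0.0  # similarity to the most recent query, stored in the node


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF over one level's documents (assumed form): log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms unseen at this level are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: c / total * idf[t] for t, c in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(node: Node, tokens: list[str], threshold: float = 0.2,
             path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every label path whose nodes all clear `threshold`.

    Several paths may be returned (multi-class membership), and a path stops
    at an inner node when none of its children match (multi-level output).
    """
    if not node.children:
        return [path]
    # The TF-IDF matrix for this level uses only the sibling classes below `node`.
    idf = fit_idf([child.keywords for child in node.children])
    query = tfidf(tokens, idf)
    results: list[tuple[str, ...]] = []
    for child in node.children:
        child.similarity = cosine(query, tfidf(child.keywords, idf))
        if child.similarity >= threshold:
            results.extend(classify(child, tokens, threshold, path + (child.name,)))
    return results or ([path] if path else [])


# Hypothetical two-level tree used only for illustration.
root = Node("root", [], [
    Node("building", ["building", "residential", "floor", "concrete"], [
        Node("residential", ["residential", "apartment", "housing"]),
        Node("office", ["office", "commercial", "tower"]),
    ]),
    Node("road", ["road", "highway", "asphalt", "bridge"], [
        Node("highway", ["highway", "expressway", "lane"]),
        Node("bridge", ["bridge", "span", "pier"]),
    ]),
])

print(classify(root, ["residential", "apartment", "concrete", "floor"]))
# prints [('building', 'residential')]
```

With the same tree, a query such as `["bridge", "highway"]` clears the threshold for both `highway` and `bridge` under `road`, so it is assigned to two classes, while `["road", "asphalt"]` stops at `("road",)` because no second-level class matches. Building each IDF table from sibling classes only keeps every level's matrix as small as that node's child count, which is one way to read the abstract's claim about avoiding redundant computation; the abstract does not say whether the paper does exactly this.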
Source: Computer Applications and Software, CSCD, Peking University Core Journal, 2014, No. 6, pp. 174-176, 191 (4 pages)
Keywords: information classification; TFIDF (term frequency-inverse document frequency); classification tree