
Candidate category search algorithm in deep level classification

Cited by: 1
Abstract: To address the low classification accuracy and slow processing speed of deep hierarchical classification, a candidate category search algorithm for the text to be classified is proposed. First, a two-stage approach of search followed by classification is introduced. Drawing on implicit domain knowledge, such as the structure of the category hierarchy tree and the relationships between categories, the algorithm analyzes category-level weights and dynamically updates feature terms, building a more discriminative feature term set for each node of the category hierarchy tree. Next, depth-first search combined with a threshold-based pruning strategy narrows the search range and finds the best candidate categories for the text to be classified. Finally, the classical K Nearest Neighbor (KNN) and Support Vector Machine (SVM) classification algorithms are applied to the candidate categories for classification testing and comparative analysis. Experimental results show that the overall classification performance of the proposed algorithm is better than that of traditional classification algorithms, and that its average F1 value is about 6% higher than that of a heuristic search algorithm based on a greedy strategy. The algorithm significantly improves the classification accuracy of deep text classification.
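The search stage described in the abstract can be illustrated with a minimal sketch of depth-first search with threshold pruning over a category tree. This is not the authors' implementation: the `CategoryNode` structure, the `score` function (weighted feature overlap), the threshold of 0.3, and the toy tree are all assumptions made for illustration, and the paper's hierarchy weight analysis and dynamic feature updating are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CategoryNode:
    """Hypothetical tree node: a category with weighted feature terms."""
    name: str
    features: Dict[str, float] = field(default_factory=dict)
    children: List["CategoryNode"] = field(default_factory=list)


def score(doc_terms: Set[str], node: CategoryNode) -> float:
    """Assumed relevance score: share of the node's feature weight found in the document."""
    total = sum(node.features.values())
    if total == 0:
        return 0.0
    matched = sum(w for term, w in node.features.items() if term in doc_terms)
    return matched / total


def search_candidates(root: CategoryNode, doc_terms: Set[str],
                      threshold: float = 0.3, top_k: int = 3) -> List[Tuple[str, float]]:
    """Depth-first search that skips any subtree whose root scores below the threshold."""
    candidates: List[Tuple[str, float]] = []

    def dfs(node: CategoryNode) -> None:
        for child in node.children:
            s = score(doc_terms, child)
            if s < threshold:
                continue  # prune: do not descend into this subtree
            if child.children:
                dfs(child)
            else:
                candidates.append((child.name, s))  # leaf category becomes a candidate

    dfs(root)
    candidates.sort(key=lambda item: -item[1])
    return candidates[:top_k]


# Toy hierarchy (invented for this example).
root = CategoryNode("root", children=[
    CategoryNode("Sports", {"football": 1.0, "match": 1.0}, [
        CategoryNode("Football", {"football": 2.0, "goal": 1.0}),
        CategoryNode("Tennis", {"racket": 2.0, "serve": 1.0}),
    ]),
    CategoryNode("Finance", {"stock": 1.0, "bank": 1.0}, [
        CategoryNode("Banking", {"bank": 2.0, "loan": 1.0}),
    ]),
])

doc_terms = {"football", "match", "goal"}
result = search_candidates(root, doc_terms)
print(result)
```

In this toy run the Finance subtree and the Tennis leaf fall below the threshold and are never expanded, so only Football remains as a candidate. In the two-stage scheme, a KNN or SVM classifier would then be applied only to the surviving candidate categories.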
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2017, No. 3, pp. 635-639, 672 (6 pages)
Funding: National Natural Science Foundation of China (61662043)
Keywords: deep text classification; class hierarchy; tree-structured class hierarchy; depth first search; candidate category
