
Hierarchical Classification Approach of Hierarchical Feature Selection and Error Control (cited by 2)
Abstract: The Chinese news subject classification specification contains a large number of categories. Applying it directly to news classification leads to large training models and long training times; in particular, the whole model must be retrained whenever some categories change. Since the categories in the specification are organized hierarchically, hierarchical classification offers a solution. This paper studies hierarchical classification of Chinese news and improves it in two respects: 1) layered feature computation, which gives a news item a different feature-vector representation at each layer of the hierarchy; and 2) error control, which addresses the problem that a misclassification at an upper layer would otherwise prevent a news item from ever reaching the correct class. Experiments show that hierarchical classification improves accuracy over flat classification by about 4%, that hierarchical classification with repeated feature-weight computation improves accuracy over plain hierarchical classification by about 3%, and that error control likewise improves on plain hierarchical classification by about 3%.
Source: Computer Science (《计算机科学》, CSCD, Peking University Core Journal), 2010, No. 10, pp. 165-168, 180 (5 pages)
Funding: National 973 Program (No. 2007CB310803)
Keywords: hierarchical classification; support vector machine; Chinese news subject classification specification; feature computation; error control

References (13)

  • 1Vapnik V N. The Nature of Statistical Learning Theory[M]. New York: Springer, 2000: 1-300.
  • 2Joachims T. SVMlight: An implementation of Support Vector Machines (SVMs) in C[EB/OL]. http://svmlight. joachims. org/.
  • 3Sun Aixin, Lim Ee-Peng. Hierarchical text classification and evaluation[C]//Proceedings of the 2001 International Conference on Data Mining. 2001: 521-528.
  • 4Ruiz M E, Srinivasan P. Hierarchical neural networks for text categorization[C]//Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99). 1999: 281-282.
  • 5Dumais S, Chen H. Hierarchical classification of Web content [C]//Proceedings of the 23rd ACM Int. Conf. on Research and Development in Information Retrieval. 2000:256-263.
  • 6Dekel O, Keshet J, Singer Y. Large margin hierarchical classification[C]//Proceedings of the 21st ICML. 2004:27-34.
  • 7Cai Lijuan. Hierarchical Document Categorization with Support Vector Machines[C]//Proceedings of CIKM'04. 2004: 78-86.
  • 8Cesa-Bianchi N. Hierarchical Classification: Combining Bayes with SVM[C]//Proceedings of the 23rd ICML. 2006: 177-184.
  • 9Cheng C, Tang J, Fu A Wai-chee, et al. Hierarchical Classification of Documents with Error Control[C]//PAKDD. 2001: 433-443.
  • 10Susan G. Training a Hierarchical Classifier Using Interdocument Relationships[J]. Journal of the American Society for Information Science and Technology, 2009, 60(1): 47-58.

Co-cited References (22)

  • 1Silla C N, Freitas A A. A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 2011, 22: 31-72.
  • 2Koller D, Sahami M. Hierarchically classifying documents using very few words//Proceedings of the 14th International Conference on Machine Learning (ICML-1997). San Francisco: Morgan Kaufmann, 1997:170-178.
  • 3Babbar R, Partalas I, Gaussier E, et al. On flat versus hierarchical classification in large-scale taxonomies// Burges C J C, Bottou L, Welling M, et al. Advances in Neural Information Processing Systems (NIPS-2013). Lake Tahoe: NIPS Foundation, 2013:1824-1832.
  • 4Tseng H, Chang P, Andrew G, et al. A conditional random field word segmenter // Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing. Jeju Island, 2005:168-171.
  • 5Chang P C, Galley M, Manning C D. Optimizing Chinese word segmentation for machine translation performance//Proceedings of the Third Workshop on Statistical Machine Translation. Columbus: Association for Computational Linguistics, 2008: 224-232.
  • 6McCallum A, Nigam K. A comparison of event models for naive Bayes text classification//Proceedings of the AAAI-1998 Workshop on Learning for Text Categorization. Madison, 1998: 41-48.
  • 7Li Baoli, Lu Qin, Yu Shiwen. An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing, 2004, 3(4): 215-226.
  • 8Chang C C, Lin C J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): Article 27.
  • 9Fan R E, Chang K W, Hsieh C J, et al. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 2008, 9:1871-1874.
  • 10Joachims T. Text categorization with support vector machines: Learning with many relevant features[C]//Proceedings of Machine Learning: ECML-98, 10th European Conference on Machine Learning. Berlin, Germany: Springer, 1998: 137-142.

Citing Articles (2)

Secondary Citing Articles (18)
