
Hierarchical Classification Approach of Hierarchical Feature Selection and Error Control (cited by 2)
Abstract: The Chinese news subject classification specification contains a large number of categories. Applying it directly to news classification leads to large training models and long training times; in particular, the whole model must be retrained whenever some categories change. Since the categories in the specification are organized hierarchically, hierarchical classification offers a solution. This paper studies hierarchical classification of Chinese news and improves it in two respects: 1) layered feature computation, which gives a news item a different feature-vector representation at each layer of the hierarchy; and 2) error control, which addresses the problem that a misclassification at an upper layer would otherwise prevent a news item from ever reaching the correct class. Experiments show that hierarchical classification improves accuracy over flat classification by about 4%, that hierarchical classification with repeated feature-weight computation improves accuracy over plain hierarchical classification by about 3%, and that error control likewise improves on plain hierarchical classification by about 3%.
Source: Computer Science (《计算机科学》, CSCD, Peking University Core Journal), 2010, No. 10, pp. 165-168, 180 (5 pages)
Funding: National 973 Program (No. 2007CB310803)
Keywords: hierarchical classification; support vector machine; Chinese news subject classification specification; feature computation; error control

References (13)

  • 1Vapnik V N. The Nature of Statistical Learning Theory[M]. New York: Springer, 2000: 1-300.
  • 2Joachims T. SVMlight: An implementation of Support Vector Machines (SVMs) in C[EB/OL]. http://svmlight. joachims. org/.
  • 3Sun Aixin, Lim Ee-Peng. Hierarchical text classification and evaluation[C]//Proceedings of the 2001 International Conference on Data Mining. 2001: 521-528.
  • 4Ruiz M E, Srinivasan P. Hierarchical neural networks for text categorization[C]//Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99). 1999: 281-282.
  • 5Dumais S, Chen H. Hierarchical classification of Web content [C]//Proceedings of the 23rd ACM Int. Conf. on Research and Development in Information Retrieval. 2000:256-263.
  • 6Dekel O, Keshet J, Singer Y. Large margin hierarchical classification[C]//Proceedings of the 21st ICML. 2004:27-34.
  • 7Cai Lijuan. Hierarchical Document Categorization with Support Vector Machines[C]//Proceedings of CIKM'04. 2004: 78-86.
  • 8Cesa-Bianchi N. Hierarchical Classification: Combining Bayes with SVM[C]//Proceedings of the 23rd ICML. 2006: 177-184.
  • 9Cheng C, Tang J, Fu A Wai-chee, et al. Hierarchical Classification of Documents with Error Control[C]//PAKDD. 2001: 433-443.
  • 10Susan G. Training a Hierarchical Classifier Using Interdocument Relationships[J]. Journal of the American Society for Information Science and Technology, 2009, 60(1): 47-58.

Co-cited References (22)

  • 1Silla C N, Freitas A A. A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 2011, 22: 31-72.
  • 2Koller D, Sahami M. Hierarchically classifying documents using very few words//Proceedings of the 14th International Conference on Machine Learning (ICML-1997). San Francisco: Morgan Kaufmann, 1997:170-178.
  • 3Babbar R, Partalas I, Gaussier E, et al. On flat versus hierarchical classification in large-scale taxonomies// Burges C J C, Bottou L, Welling M, et al. Advances in Neural Information Processing Systems (NIPS-2013). Lake Tahoe: NIPS Foundation, 2013:1824-1832.
  • 4Tseng H, Chang P, Andrew G, et al. A conditional random field word segmenter // Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing. Jeju Island, 2005:168-171.
  • 5Chang P C, Galley M, Manning C D. Optimizing Chinese word segmentation for machine translation performance//Proceedings of the Third Workshop on Statistical Machine Translation. Columbus: Association for Computational Linguistics, 2008: 224-232.
  • 6McCallum A, Nigam K. A comparison of event models for naive Bayes text classification//Proceedings of the AAAI-1998 Workshop on Learning for Text Categorization. Madison, 1998: 41-48.
  • 7Li Baoli, Lu Qin, Yu Shiwen. An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing, 2004, 3(4): 215-226.
  • 8Chang C C, Lin C J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): Article 27.
  • 9Fan R E, Chang K W, Hsieh C J, et al. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 2008, 9:1871-1874.
  • 10Joachims T. Text categorization with support vector machines: Learning with many relevant features[C]//Proceedings of Machine Learning: ECML-98, 10th European Conference on Machine Learning. Berlin, Germany: Springer, 1998: 137-142.

Citing Articles (2)

Secondary Citing Articles (18)
