Automatic Classification Based on News Titles for Chinese News Web Pages (基于标题的中文新闻网页自动分类)

Cited by: 7
Abstract: Drawing on the tf-idf weighting idea, this paper uses news titles as the basis for automatically classifying Chinese news Web pages. It constructs title-based classification methods built on a correlation degree between a news title and each category, which determines the appropriate category for each news Web page, and designs several experiments to evaluate these methods in terms of top-one, top-two, and top-three scores. The experimental results show that classifying Chinese news Web pages by title greatly shortens processing time, saves storage space, and achieves fairly high accuracy; the improved category-weighted tf-idf scheme gives the best classification results.
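Authors: 钱爱兵 (Qian Aibing), 江岚 (Jiang Lan)
Source: New Technology of Library and Information Service (《现代图书情报技术》; indexed in CSSCI and the Peking University Core Journals list), 2008, No. 10, pp. 59-68 (10 pages)
Keywords: term frequency/inverse document frequency (tf-idf); news title; Chinese news Web pages; automatic classification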
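
This record does not reproduce the paper's weighting formulas, so the following is only a minimal sketch of the general idea in the abstract: weight title terms per category with a tf-idf-style score, sum the weights of a new title's terms to get a correlation score for each category, and return the top-ranked categories. The function names, the toy training data, and the specific weighting (relative term frequency within a category times a smoothed inverse category frequency) are assumptions for illustration, not the authors' method. Titles are assumed to be segmented into words already.

```python
import math
from collections import Counter, defaultdict


def train_category_weights(labeled_titles):
    """Build per-category term weights from (tokens, category) pairs.

    Assumed weighting (not taken from the paper): the term's relative
    frequency within the category, times log(1 + C / cf), where C is the
    number of categories and cf is how many categories contain the term.
    """
    term_counts = defaultdict(Counter)
    for tokens, category in labeled_titles:
        term_counts[category].update(tokens)

    n_categories = len(term_counts)
    category_freq = Counter()
    for counts in term_counts.values():
        category_freq.update(counts.keys())

    weights = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        weights[category] = {
            term: (count / total) * math.log(1 + n_categories / category_freq[term])
            for term, count in counts.items()
        }
    return weights


def rank_categories(title_tokens, weights, top_k=3):
    """Score a segmented title against every category and return the top_k.

    The score is the sum of the category's weights for the title's terms;
    ties are broken alphabetically by category name.
    """
    scores = {
        category: sum(term_weights.get(token, 0.0) for token in title_tokens)
        for category, term_weights in weights.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical, pre-segmented training titles with made-up category labels.
training = [
    (["股市", "上涨"], "finance"),
    (["央行", "降息"], "finance"),
    (["球队", "夺冠"], "sports"),
    (["联赛", "球队", "获胜"], "sports"),
    (["新", "手机", "发布"], "tech"),
]

weights = train_category_weights(training)
print(rank_categories(["球队", "获胜"], weights, top_k=3))
```

Returning a ranked list rather than a single label matches the top-one, top-two, and top-three evaluation mentioned in the abstract: a prediction counts as a top-k hit when the true category appears among the first k entries.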