Relational Text Classification Algorithm Based on iTopicModel
Abstract: Traditional text classification methods do not take the links among documents sufficiently into account. To address this problem, this paper proposes TC-iTM, a relational text classification algorithm based on iTopicModel. TC-iTM uses the probabilities with which labeled documents (documents whose categories are known) are assigned to each topic to determine the category that each topic represents. It then classifies unlabeled documents using both the probabilities with which they are assigned to each topic and their text content. Experimental results show that TC-iTM outperforms traditional text classification methods when the links among documents in a document network strongly influence the documents' categories.
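The two steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' TC-iTM implementation: the topic-membership matrix `theta` is assumed to come from an already fitted iTopicModel (its EM fitting is not shown), P(category | topic) is assumed to be estimated by normalized summation over labeled documents, and the text-information term is approximated by a hypothetical linear mix with an arbitrary weight `alpha` and probabilities from any separate text-only classifier.

```python
from collections import defaultdict


def topic_category_map(theta, labels, categories):
    """Step 1: estimate P(category | topic) from labeled documents.

    theta[d][k]: probability that document d belongs to topic k
                 (assumed to come from a fitted iTopicModel).
    labels: dict mapping labeled document index -> category.
    """
    num_topics = len(theta[0])
    # Accumulate the topic mass contributed by each category's labeled documents.
    mass = [defaultdict(float) for _ in range(num_topics)]
    for d, c in labels.items():
        for k in range(num_topics):
            mass[k][c] += theta[d][k]
    # Normalize per topic; a topic with no labeled mass gets a uniform distribution.
    result = []
    for k in range(num_topics):
        total = sum(mass[k].values())
        if total > 0:
            result.append({c: mass[k][c] / total for c in categories})
        else:
            result.append({c: 1.0 / len(categories) for c in categories})
    return result


def classify(theta_d, topic_cat, categories, text_probs=None, alpha=0.5):
    """Step 2: score categories for one unlabeled document.

    theta_d: topic probabilities of the document.
    text_probs: optional dict category -> probability from a text-only
                classifier (hypothetical combination; alpha is an assumed weight).
    """
    # Topic-based score: sum over topics of P(topic | doc) * P(category | topic).
    scores = {
        c: sum(theta_d[k] * topic_cat[k][c] for k in range(len(theta_d)))
        for c in categories
    }
    if text_probs is not None:
        # Hypothetical linear mix of topic-based and text-based evidence.
        scores = {
            c: alpha * scores[c] + (1 - alpha) * text_probs.get(c, 0.0)
            for c in categories
        }
    return max(scores, key=scores.get), scores
```

For example, with two topics and two labeled documents (one per category), an unlabeled document whose topic probabilities lean toward the topic dominated by category A's labeled document is assigned category A unless the text-based probabilities outweigh that evidence.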
Source: Computer Engineering (《计算机工程》), 2011, No. 21, pp. 124-125, 130 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (Grant No. 60970083)
Keywords: text classification; document network; topic model; EM algorithm