
Short Text Classification Based on Chi-square Feature and BTM

Cited by: 1
Abstract: Traditional text classification methods perform poorly on short texts because their features are sparse and their meaning depends on context. To address this, a short text classification method based on Chi-square features and BTM is proposed. Chi-square features are first extracted from the short texts. The texts are then modeled with BTM to obtain the corresponding document-topic probability features. Finally, the two kinds of features are fused and the short texts are classified with the SVM classification algorithm. Experimental results show that, compared with conventional classification methods, the proposed method achieves a higher Macro-F1 value and performs well on short text classification.
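Authors: Li Zhenxing (李振兴), Wang Song (王松)
Source: Journal of Lanzhou Jiaotong University (《兰州交通大学学报》), CAS, 2016, No. 1, pp. 36-41 (6 pages)
Funding: Science and Technology Research and Development Program of China Railway Corporation (2014X008-F)
Keywords: short text classification; Chi-square feature; topic model; BTM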
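The feature pipeline summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how terms are weighted, how BTM inference is run, or how the SVM is configured. The sketch below assumes binary term features, selects terms by their maximum Chi-square score over classes, and fuses them with a document-topic vector by simple concatenation. The function names (`chi_square`, `select_terms`, `fuse`), the toy corpus, and the hand-written topic probabilities are all hypothetical; in practice the topic vector would come from a trained BTM and the fused vector would be passed to an SVM.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class that lack the term
    n00: docs outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, k):
    """Keep the k terms with the highest chi-square score over any class."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # term -> number of docs containing it
    df_class = Counter()  # (term, class) -> number of docs containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[(term, label)] += 1

    scores = {}
    for term in df:
        best = 0.0
        for cls, size in class_sizes.items():
            n11 = df_class[(term, cls)]
            n10 = df[term] - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def fuse(tokens, vocab, topic_probs):
    """Concatenate binary term features with document-topic probabilities."""
    present = set(tokens)
    term_part = [1.0 if term in present else 0.0 for term in vocab]
    return term_part + list(topic_probs)


# Toy corpus (hypothetical data, already tokenized).
docs = [
    ["train", "delay", "station"],
    ["train", "ticket", "price"],
    ["goal", "match", "team"],
    ["team", "coach", "match"],
]
labels = ["rail", "rail", "sport", "sport"]

vocab = select_terms(docs, labels, k=3)
print(vocab)  # ['match', 'team', 'train']

# Assumption: [0.9, 0.1] stands in for BTM's document-topic distribution.
features = fuse(["train", "late"], vocab, [0.9, 0.1])
print(features)  # [0.0, 0.0, 1.0, 0.9, 0.1]
```

Concatenation is only one way to fuse the two feature sets. It keeps the Chi-square part sparse and interpretable, while the topic part adds a dense, low-dimensional signal that can help when a short text shares few selected terms with the training data.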

