Abstract
Traditional text classification methods perform poorly on short texts because their features are sparse and strongly context-dependent. To address this, a short-text classification method based on Chi-square features and BTM is proposed. First, Chi-square features are extracted from the short texts. The texts are then modeled with BTM to obtain the corresponding document-topic probability features. Finally, the two kinds of features are fused and the short texts are classified with an SVM. Experimental results show that, compared with conventional classification methods, the proposed method achieves a higher Macro-F1 value and performs well on short-text classification.
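The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes documents are represented as sets of terms, scores terms with the standard 2x2 Chi-square statistic, and fuses binary indicators of the selected terms with a document-topic probability vector by simple concatenation. The function names (`chi_square`, `select_terms`, `fuse`, `macro_f1`), the toy corpus, and the topic vector are hypothetical. BTM training and the SVM classifier are not shown; in practice the topic vector would come from a trained BTM model and the fused vectors would be passed to an SVM.

```python
def chi_square(docs, labels, term, cls):
    """Chi-square statistic of one (term, class) pair from a 2x2 table."""
    n = len(docs)
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cls)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cls)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == cls)
    d = n - a - b - c  # term absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, k):
    """Keep the k terms with the highest max-over-classes Chi-square score."""
    vocab = set().union(*docs)
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes)
             for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def fuse(doc, terms, topic_probs):
    """Concatenate binary term indicators with document-topic probabilities."""
    return [1.0 if t in doc else 0.0 for t in terms] + list(topic_probs)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        scores.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): each short text is a set of terms.
docs = [{"train", "delay"}, {"train", "ticket"},
        {"match", "goal"}, {"goal", "team"}]
labels = ["rail", "rail", "sport", "sport"]

terms = select_terms(docs, labels, k=2)
# The topic vector below is a placeholder for BTM output, not a real result.
features = fuse(docs[0], terms, topic_probs=[0.9, 0.1])
```

Concatenation is only one way to fuse the two feature sets; the abstract does not specify how the fusion is weighted or normalized.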
Source
《兰州交通大学学报》
CAS
2016, No. 1, pp. 36-41 (6 pages)
Journal of Lanzhou Jiaotong University
Funding
Science and Technology Research and Development Program of China Railway Corporation (2014X008-F)
Keywords
short text classification
Chi-square feature
topic model
BTM