Research on Text Classification Based on Word Embedding (cited by: 9)
Abstract: To address the low classification accuracy of traditional feature selection algorithms, an improved text feature selection algorithm based on word vectors is proposed. The study performs sentiment classification on microblog data. It hypothesizes that terms similar to a feature term with strong category-discriminating ability are themselves strongly discriminating. Word vectors trained with Word2vec are applied to the traditional feature selection process, and the feature set is expanded appropriately according to the similarity relations between word vectors. Experimental results show that the proposed feature selection algorithm improves classification accuracy to some extent over the original feature selection algorithm.
Authors: MA Li; LI Shasha (Xi'an University of Posts & Telecommunications, Xi'an 710061)
Affiliation: Xi'an University of Posts & Telecommunications
Source: Computer & Digital Engineering, 2019, No. 2, pp. 281-284, 303 (5 pages)
Keywords: word embedding; feature expansion; Word2vec; text classification
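
The feature-expansion idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which base feature selection metric, similarity threshold, or expansion size was used, so `expand_features`, `threshold`, `top_k`, and the hand-written `TOY_VECTORS` (standing in for vectors that a trained Word2vec model would supply) are all assumptions made for the example.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2vec output.
TOY_VECTORS = {
    "happy":  [0.90, 0.10, 0.00],
    "glad":   [0.85, 0.15, 0.05],
    "joyful": [0.80, 0.20, 0.10],
    "sad":    [-0.90, 0.10, 0.00],
    "gloomy": [-0.80, 0.20, 0.10],
    "phone":  [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_features(selected, vectors, threshold=0.9, top_k=2):
    """Append to `selected` the words most similar to each selected feature.

    `selected` is assumed to come from some conventional feature selection
    step; `threshold` and `top_k` are illustrative values, not taken from
    the paper.
    """
    expanded = list(selected)
    for feature in selected:
        if feature not in vectors:
            continue  # no vector available, nothing to expand
        # Score every word that is not already a feature.
        scored = [
            (cosine(vectors[feature], vec), word)
            for word, vec in vectors.items()
            if word not in expanded
        ]
        scored.sort(reverse=True)
        # Keep at most top_k neighbours that clear the similarity threshold.
        for score, word in scored[:top_k]:
            if score >= threshold:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_features(["happy", "sad"], TOY_VECTORS))
```

With the toy vectors above, the two seed features pick up their near neighbours while the unrelated word `phone` is left out. With a real model, the neighbour lookup would be replaced by the model's own similarity query.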