
Research on Weibo Short Text Classification Based on Word2vec (cited by: 51)
Abstract: With the rapid growth of information on Weibo and other social media, there is an urgent need to classify this information automatically, both to help users quickly find what they are looking for and to filter out spam. To address the curse of dimensionality and the lack of semantic features in traditional text classification models, this paper studies the classification of Weibo short texts based on the Word2vec model. Because Word2vec cannot distinguish how important each word in a text is, TF-IDF is introduced to weight the Word2vec word vectors, giving a weighted Word2vec classification model. Finally, the weighted Word2vec model and the TF-IDF model are combined by concatenating their features. Experimental results show that the combined model, with stop words removed, achieves higher classification accuracy than both the weighted Word2vec model without stop words and the traditional TF-IDF text classification model with or without stop words.
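The feature construction summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word vectors are hand-made toy values standing in for a trained Word2vec model, the documents are assumed to be already tokenized with stop words removed, the IDF uses one common smoothed formula, and the weighted vectors are averaged (the abstract does not say whether a sum or an average is used). All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Toy 3-dimensional "word vectors" (hypothetical values; a trained
# Word2vec model would supply these in practice).
WORD_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "match":    [0.8, 0.2, 0.1],
    "stock":    [0.0, 0.9, 0.2],
    "market":   [0.1, 0.8, 0.3],
    "today":    [0.3, 0.3, 0.3],
}

def idf_table(docs):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def tfidf_weights(doc, idf):
    """Map each word of one tokenized document to tf * idf."""
    counts = Counter(doc)
    return {w: (c / len(doc)) * idf.get(w, 0.0) for w, c in counts.items()}

def weighted_word2vec(doc, idf, vectors, dim):
    """TF-IDF-weighted average of the document's word vectors."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in tfidf_weights(doc, idf).items():
        vec = vectors.get(word)
        if vec is None:  # skip words without a vector
            continue
        total = [t + weight * v for t, v in zip(total, vec)]
        weight_sum += weight
    return [t / weight_sum for t in total] if weight_sum else total

def tfidf_vector(doc, idf, vocab):
    """Sparse-style TF-IDF vector laid out over a fixed vocabulary order."""
    weights = tfidf_weights(doc, idf)
    return [weights.get(w, 0.0) for w in vocab]

def combined_features(doc, idf, vectors, vocab, dim):
    """Concatenate the weighted Word2vec vector with the TF-IDF vector."""
    return weighted_word2vec(doc, idf, vectors, dim) + tfidf_vector(doc, idf, vocab)

# Two tiny tokenized "posts" (made-up data for illustration only).
docs = [
    ["football", "match", "today"],
    ["stock", "market", "today"],
]
idf = idf_table(docs)
vocab = sorted(idf)
features = [combined_features(d, idf, WORD_VECTORS, vocab, dim=3) for d in docs]
```

Each row of `features` has the 3 weighted-embedding dimensions followed by one TF-IDF dimension per vocabulary word. In a full pipeline these rows would be passed to a classifier such as the SVM named in the keywords; that step is omitted here.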
Source: Netinfo Security (《信息网络安全》), CSCD, 2017, No. 1, pp. 57-62 (6 pages)
Funding: Fund of the National Defense Key Laboratory of Secure Communication (国防保密通信重点实验室) [9140C110401140C11053]
Keywords: short text classification; Word2vec; TFIDF; support vector machine (SVM)
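  • Related literature: references 15; second-level references 150; co-citing documents 855; co-cited documents 409; citing documents 51; second-level citing documents 334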
