Research on Word2vec Chinese Text Classification Based on Information Entropy Weighting  Cited by: 1

Research on Chinese Text Classification Based on Word2vec
Abstract: To address the problems of text vector representation and word importance in Chinese text classification, this paper proposes a Word2vec-based Chinese text classification method. First, Word2vec is trained to generate text vectors. Then, based on the concept of information entropy, the importance of each word within a document is computed and used to weight the vectors. Finally, an SVM classifier is trained on the weighted word vectors. Experimental results show that the proposed method significantly improves precision, recall, and F-measure, and achieves good classification performance.
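The abstract does not give the exact entropy formula or say how word vectors are combined into a text vector, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes (1) a word's weight is 1 minus the normalized entropy of its distribution over classes, so words concentrated in few categories count more, and (2) the text vector is the entropy-weighted mean of its word vectors. Hand-made 2-dimensional vectors stand in for trained Word2vec embeddings, and the SVM step is left out. The function names `entropy_weights`, `doc_vector`, and `precision_recall_f1` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_weights(docs, labels):
    """Return an entropy-based importance weight for every word.

    ASSUMPTION (not stated in the paper): weight = 1 - H(class | word) / log(K),
    where K is the number of classes. A word spread evenly over all classes
    gets weight 0; a word that occurs in only one class gets weight 1.
    """
    num_classes = len(set(labels))
    max_entropy = math.log(num_classes) if num_classes > 1 else 1.0
    counts = defaultdict(Counter)  # word -> Counter({class: occurrences})
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    weights = {}
    for word, per_class in counts.items():
        total = sum(per_class.values())
        h = -sum((n / total) * math.log(n / total) for n in per_class.values())
        # Clamp at 0 to absorb tiny negative values from floating-point error.
        weights[word] = max(0.0, 1.0 - h / max_entropy)
    return weights


def doc_vector(tokens, word_vecs, weights):
    """Entropy-weighted mean of the vectors of the known words in a document.

    ASSUMPTION: the text vector is a weighted average of word vectors.
    Returns a zero vector if no token has both a vector and a weight.
    """
    dim = len(next(iter(word_vecs.values())))
    acc = [0.0] * dim
    total_w = 0.0
    for tok in tokens:
        if tok in word_vecs and tok in weights:
            w = weights[tok]
            acc = [a + w * x for a, x in zip(acc, word_vecs[tok])]
            total_w += w
    if total_w == 0.0:
        return acc
    return [a / total_w for a in acc]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, already word-segmented documents and labels (hypothetical data).
    docs = [["stock", "rise", "market"],
            ["stock", "fall", "market"],
            ["match", "goal", "market"]]
    labels = ["finance", "finance", "sports"]
    # Hand-made 2-d vectors standing in for trained Word2vec embeddings.
    word_vecs = {"stock": [1.0, 0.0], "rise": [0.0, 1.0], "fall": [0.0, -1.0],
                 "market": [1.0, 1.0], "match": [-1.0, 0.0], "goal": [-1.0, 1.0]}
    weights = entropy_weights(docs, labels)
    print(weights["market"])  # low, because "market" appears in both classes
    X = [doc_vector(d, word_vecs, weights) for d in docs]
    print(X)
```

In a real pipeline the toy `word_vecs` dictionary would be replaced by embeddings from a trained Word2vec model (for example gensim's `Word2Vec`, whose vectors are read via `model.wv[token]`), Chinese text would first be word-segmented, and the document vectors `X` would be passed to an SVM such as scikit-learn's `sklearn.svm.LinearSVC`. The class-entropy weight is one way to operationalize "word importance"; the paper may compute it differently.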
Author: WU Ping-ping (吴萍萍), School of Information and Electronic Engineering, Liming Vocational University, Quanzhou 362000, China
Source: Journal of Changchun Normal University (《长春师范大学学报》), 2020, No. 2, pp. 28-33 (6 pages)
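Funding: Liming Vocational University 2017 school-level research project "Research on Analysis and Prediction of Online Public Opinion Based on Text Classification" (LZ201711)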
Keywords: Word2vec; Chinese text; information entropy