
Unbalanced Text Classification Based on TF-IDF Feature Word Extraction (基于TF-IDF特征词提取的不平衡文本分类). Cited by: 1

Unbalanced text classification based on attention mechanism
Abstract: In text classification, an unbalanced distribution of text categories lowers classification accuracy. To address this problem, this paper proposes an unbalanced text classification method based on the attention mechanism. First, TF-IDF is used to extract feature words for each category. Next, the extracted word features are concatenated with the original text, and attention weights are assigned over the result. Finally, softmax is used for classification. Experiments on the open-source text dataset from Fudan University show that the proposed method is more stable than the comparison methods and improves accuracy.
Authors: CHEN Huan (陈欢); WANG Zhongzhen (王忠震) (School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 9, pp. 73-76 (4 pages)
Keywords: attention mechanism; unbalanced text classification; TF-IDF
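
The first two steps outlined in the abstract (per-category TF-IDF feature word extraction, then concatenation with the original text) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that all documents of a category are pooled into one "class document", that IDF is computed over categories, and that feature words are simply prepended to the token sequence. The function names `class_feature_words` and `splice`, the `top_k` parameter, and the toy corpus are all hypothetical. The `softmax` helper only shows the normalisation used for attention weights and class probabilities; the raw scores would come from a trained model, which is not reproduced here.

```python
import math
from collections import Counter


def class_feature_words(docs_by_class, top_k=2):
    """Return the top_k highest-scoring TF-IDF words for each class.

    Assumption: all documents of a class are pooled into one "class
    document", so IDF reflects how many classes a word occurs in.
    """
    # Term counts per class, pooled over that class's documents.
    counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    n_classes = len(counts)
    # Number of classes in which each word appears at least once.
    class_freq = Counter(tok for c in counts.values() for tok in c)

    features = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores = {
            tok: (n / total) * math.log(n_classes / class_freq[tok])
            for tok, n in c.items()
        }
        # Sort by score (descending), then alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features[label] = ranked[:top_k]
    return features


def splice(tokens, feature_words):
    """Concatenate the feature words with the original token sequence."""
    return list(feature_words) + list(tokens)


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy, made-up corpus: "sports" is the majority class, "law" the minority.
corpus = {
    "sports": [
        ["team", "wins", "match"],
        ["team", "loses", "match"],
        ["coach", "praises", "team"],
    ],
    "law": [["court", "hears", "case"]],
}
features = class_feature_words(corpus, top_k=2)
spliced = splice(corpus["law"][0], features["law"])
# Placeholder scores; in a real model these are learned, not hand-set.
weights = softmax([2.0, 1.0, 0.1])
```

In this sketch the minority class still receives its own `top_k` feature words regardless of how few documents it has, which is one plausible reading of why per-category extraction might help with imbalance; the paper itself should be consulted for the actual mechanism.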
