Abstract
Imbalanced class distributions in text classification reduce classification accuracy. To address this problem, this paper proposes an imbalanced text classification method based on the attention mechanism. First, TF-IDF is used to extract the feature words of each category. Next, the extracted word features are concatenated with the original text, and attention weights are assigned. Finally, softmax is used for classification. Experiments on the open-source text dataset of Fudan University show that the proposed method is more stable than the compared methods and achieves improved accuracy.
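The first two steps and the final softmax can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: each class is treated as a single "document" when computing TF-IDF, the feature words appended to a text are the class keywords that occur in it, and the function names (`class_tfidf_keywords`, `concat_features`, `softmax`), the toy corpus, and the class scores are hypothetical. The attention layer itself is omitted because the abstract does not specify its form.

```python
import math
from collections import Counter


def class_tfidf_keywords(docs_by_class, top_k=2):
    """Rank each class's words by TF-IDF, treating a class as one document (assumption)."""
    class_counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    n_classes = len(class_counts)
    # Number of classes in which each word occurs.
    df = Counter(tok for counts in class_counts.values() for tok in counts)
    keywords = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        scores = {
            tok: (n / total) * math.log(n_classes / df[tok])
            for tok, n in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[label] = ranked[:top_k]
    return keywords


def concat_features(doc, keywords):
    """Append to a tokenized text the class keywords that occur in it (assumed scheme)."""
    vocab = {tok for words in keywords.values() for tok in words}
    return doc + [tok for tok in doc if tok in vocab]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy, hypothetical corpus: tokenized documents grouped by class.
docs_by_class = {
    "sports": [["match", "team", "win"], ["team", "goal"]],
    "finance": [["stock", "market", "win"], ["stock", "price"]],
}

keywords = class_tfidf_keywords(docs_by_class)
augmented = concat_features(["team", "win", "goal"], keywords)
probs = softmax([2.0, 0.5])  # hypothetical class scores from some classifier

print(keywords)
print(augmented)
print(probs)
```

A word such as "win" that appears in every class gets a TF-IDF score of zero in this scheme, so it is never selected as a feature word. This is one plausible reading of why per-category feature words could help with under-represented classes.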
Authors
陈欢
王忠震
CHEN Huan; WANG Zhongzhen (School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China)
Source
《智能计算机与应用》
2020, Issue 9, pp. 73-76 (4 pages)
Intelligent Computer and Applications