Abstract
Multi-label text classification is an important part of multi-label classification. Traditional multi-label text classification algorithms often focus only on the information in the text itself, fail to capture deep semantic information, and do not consider the relationships between labels. To address these issues, a multi-label text classification model integrating BERT (Bidirectional Encoder Representations from Transformers), GAT (Graph Attention Network), and CorNet (Correlation Network) is proposed. First, the pre-trained BERT model is used to produce feature vectors for the text, and the generated feature vectors are used to build graph-structured data. Then, GAT assigns different weights to different nodes. Finally, Softmax-CorNet learns label correlations to enhance prediction and perform classification. The proposed model achieves accuracies of 93.3% and 83.2% on the TNEWS (Toutiao news) and KUAKE-QIC datasets respectively, and comparative experiments show that it effectively improves performance on multi-label text classification tasks.
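The two building blocks named in the abstract can be sketched as follows. This is a minimal NumPy illustration of a single-head GAT attention layer and a CorNet-style residual block over label logits; all weight names, shapes, and the down/up-projection width are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, A, W, a):
    """Single-head GAT layer.
    H: (N, F) node features, A: (N, N) adjacency with self-loops,
    W: (F, F') linear projection, a: (2*F',) attention vector."""
    Wh = H @ W                                    # project features: (N, F')
    d = Wh.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), split a into source/target halves
    src = Wh @ a[:d]                              # (N,)
    dst = Wh @ a[d:]                              # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])   # (N, N) raw attention scores
    e = np.where(A > 0, e, -1e9)                  # mask non-neighbours
    alpha = softmax(e, axis=1)                    # per-node attention weights
    return alpha @ Wh                             # weighted aggregation: (N, F')

def cornet_block(logits, W1, b1, W2, b2):
    """CorNet-style residual block: refine raw label logits by passing them
    through sigmoid -> dense -> ELU -> dense, then adding the residual,
    so correlated labels can reinforce each other."""
    z = 1.0 / (1.0 + np.exp(-logits))             # squash logits to (0, 1)
    h = z @ W1 + b1                               # down-project
    h = np.where(h > 0, h, np.exp(np.minimum(h, 0)) - 1.0)  # ELU
    return h @ W2 + b2 + logits                   # up-project + residual
```

In this sketch, node features for `gat_layer` would come from BERT sentence embeddings, and `cornet_block` would sit after the classifier head, consistent with the BERT → GAT → Softmax-CorNet pipeline the abstract describes.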
Authors
LIU Xinzhong, ZHAO Aoqing, XIE Wenwu, YANG Zhihe (College of Information Science and Engineering, Hunan Institute of Science and Technology, Yueyang, Hunan 414000, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2023, No. S02, pp. 18-21 (4 pages)
Funding
Supported by the Natural Science Foundation of Hunan Province (2023JJ50045, 2023JJ50046).
Keywords
multi-label text classification
pre-trained model
graph-structured data
label correlation
Bidirectional Encoder Representations from Transformers (BERT)
Graph Attention Network (GAT)
Correlation Network (CorNet)