Abstract
A dual-channel label semantic enhancement model is proposed to improve label semantic representation in multi-label text classification. The model comprises two key modules: a graph convolutional network (GCN) module based on label co-occurrence and a label semantic embedding module based on pre-training. The former uses a GCN to capture semantic associations among labels, enriching the semantic information of each label. The latter draws on prior knowledge in pre-trained models to strengthen the semantic representation of labels. Finally, an attention mechanism fuses and deeply encodes the label semantic information from the two channels. Multi-label text classification experiments on two public datasets, AAPD and RCV1-V2, show that, compared with mainstream baseline methods, the proposed model achieves significant improvements in precision, recall, and micro-F1.
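The dual-channel idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy label matrix `Y`, the random stand-in for pre-trained label embeddings, the single GCN layer with row-normalised adjacency, the use of the pre-trained embeddings as GCN input, and the hypothetical `query` vector used for attention. The "deep encoding" step mentioned in the abstract is omitted.

```python
import numpy as np


def cooccurrence_adjacency(Y):
    """Row-normalised label co-occurrence matrix with self-loops.

    Y is a binary (num_docs, num_labels) matrix of training labels.
    """
    counts = Y.T @ Y                   # counts[i, j] = docs tagged with both i and j
    np.fill_diagonal(counts, 0)        # drop each label's co-occurrence with itself
    A = counts + np.eye(Y.shape[1])    # add self-loops so every row sum is >= 1
    return A / A.sum(axis=1, keepdims=True)


def gcn_layer(A_hat, H, W):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_channels(graph_emb, pretrained_emb, query):
    """Per-label attention over the two channels.

    query is a hypothetical learned vector of shape (dim,).
    """
    channels = np.stack([graph_emb, pretrained_emb], axis=1)  # (labels, 2, dim)
    scores = channels @ query / np.sqrt(query.shape[0])       # (labels, 2)
    weights = softmax(scores, axis=1)                         # sums to 1 per label
    fused = (weights[..., None] * channels).sum(axis=1)       # (labels, dim)
    return fused, weights


rng = np.random.default_rng(0)

# Toy training labels: 4 documents, 4 labels (assumed data).
Y = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])
num_labels, dim = Y.shape[1], 8

# Random stand-in for label embeddings taken from a pre-trained language model.
pretrained_emb = rng.normal(size=(num_labels, dim))
W = rng.normal(size=(dim, dim)) * 0.1   # GCN weights (would be learned)
query = rng.normal(size=dim)            # attention query (would be learned)

A_hat = cooccurrence_adjacency(Y)
graph_emb = gcn_layer(A_hat, pretrained_emb, W)                   # channel 1
fused, weights = fuse_channels(graph_emb, pretrained_emb, query)  # channels 1 + 2

print(fused.shape, weights.sum(axis=1))
```

In this sketch each label receives its own pair of attention weights, so a label with few co-occurring neighbours can lean on the pre-trained channel while a densely connected label can lean on the graph channel. Whether the paper weights the channels per label or globally is not stated in the abstract.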
Authors
FENG Xinhao (冯心昊)
LÜ Xueqiang (吕学强)
MA Denghao (马登豪)
TENG Shangzhi (滕尚志)
TIAN Jingjing (田晶晶)
Affiliations: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 102206, China; China National Institute of Standardization, Beijing 100012, China
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition) [《北京信息科技大学学报(自然科学版)》]
2024, Issue 4, pp. 49-54 (6 pages)
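Funding
National Natural Science Foundation of China (62171043)
State Language Commission Project (ZDI145-10)
President's Fund of the China National Institute of Standardization (282022Y-9461)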
Keywords
multi-label text classification
label semantic embedding
pre-trained language model
graph convolutional network