Abstract
Current multi-label text classification faces two main problems: (1) existing methods focus on text representation learning and model the associations between labels insufficiently; (2) methods that do use label association information to improve multi-label classification rely too heavily on manually predefined external knowledge, which is costly to acquire and therefore limits practical application. To address these problems, this paper proposes Corrective-Net, a label association learning module for multi-label text classification. The module automatically learns label association information from the data without relying on external knowledge. It also uses this information to correct the initial predictions of the base classification module, so that the final prediction takes both semantic information and label association information into account and yields more accurate multi-label predictions. Extensive experiments on the AAPD and SO datasets demonstrate the generality and effectiveness of Corrective-Net. By analyzing the effect of label correction on the performance of each label, explicit label association information is obtained and visualized.
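The abstract does not specify how the correction is computed, so the following is only a minimal sketch of the general idea of correcting a base classifier's initial predictions with label association weights. The function `correct_predictions`, the association matrix `assoc`, the `strength` parameter, and all numeric values are hypothetical illustrations, not the architecture of Corrective-Net; in particular, the paper's module learns its associations from data, whereas the matrix here is set by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def correct_predictions(base_logits, assoc, strength=1.0):
    """Hypothetical correction step (an assumption, not the paper's method).

    Each label's logit is shifted by the initial probabilities of the
    other labels, weighted by the association matrix `assoc`, where
    assoc[i][j] is the influence of label j on label i.
    """
    initial = [sigmoid(z) for z in base_logits]
    corrected = []
    for i, z in enumerate(base_logits):
        # Sum the contributions of all other labels to label i.
        shift = sum(
            assoc[i][j] * initial[j]
            for j in range(len(initial))
            if j != i
        )
        corrected.append(sigmoid(z + strength * shift))
    return initial, corrected


# Toy example with three labels; all values are made up for illustration.
base_logits = [2.0, -0.2, -3.0]
assoc = [
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],  # label 1 becomes more likely when label 0 is predicted
    [0.0, 0.0, 0.0],
]
initial, corrected = correct_predictions(base_logits, assoc)
```

In this toy setting, label 1 starts just below a 0.5 decision threshold and is pushed above it because the strongly predicted label 0 is associated with it, while labels without associations keep their initial probabilities.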
Authors
肖新正
黄瑞章
陈艳平
秦永彬
宋玉梅
周裕林
XIAO Xin-zheng; HUANG Rui-zhang; CHEN Yan-ping; QIN Yong-bin; SONG Yu-mei; ZHOU Yu-lin (Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, Guiyang 550025; State Key Laboratory of Public Big Data, Guiyang 550025; College of Computer Science & Technology, Guizhou University, Guiyang 550025, China)
Source
《计算机工程与科学》
CSCD
PKU Core Journal
2024, Issue 6, pp. 1092-1100 (9 pages)
Computer Engineering & Science
Funding
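National Natural Science Foundation of China (62066007, 62066008)
Scientific Research Project of Higher Education Institutions, Guizhou Provincial Department of Education (Youth Project) (黔教技〔2022〕149号).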
Keywords
label association
label correction
multi-label
text classification
visualization