Hierarchical Multi-label Text Classification Algorithm Based on Parallel Convolutional Network Information Fusion (cited by: 3)
Abstract Natural language processing (NLP) is an important research direction in artificial intelligence and machine learning; it aims to use computer technology to analyze, understand, and process natural language. One of its main research areas is extracting information from text and automatically classifying and labeling that text according to a given label system or standard. Compared with single-label text classification, multi-label text classification allows one data item to belong to several labels, which makes it harder to extract features for multiple categories from the text. Hierarchical multi-label text classification is a special case: it assigns the information in a text to different category label systems, and these label systems have interdependent hierarchical relationships. How to use the hierarchical relationships within the label system to classify text more accurately into the corresponding labels therefore becomes the key to solving the problem. To this end, this paper proposes a hierarchical multi-label text classification algorithm based on parallel convolutional network information fusion. First, the algorithm uses the BERT model to produce word embeddings of the text; it then enhances the semantic features of the text with a self-attention mechanism and extracts features from the text data with different convolution kernels. By using a threshold-controlled tree structure to establish relationships between upper-level and lower-level nodes, it makes more effective use of the multi-faceted semantic information in the text for the hierarchical multi-label text classification task. Results on the public Kanshan-Cup dataset and the CI enterprise information dataset show that the algorithm outperforms mainstream baselines such as TextCNN, TextRNN, and FastText on three evaluation metrics (macro-precision, macro-recall, and micro-F1) and achieves good hierarchical multi-label text classification performance.
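The abstract does not spell out how the threshold-controlled tree structure works, so the following is only a minimal sketch of one plausible reading: a label is predicted when its score reaches a threshold and its parent label in the tree is also predicted. The function name `decode_hierarchy`, the toy taxonomy, the scores, and the threshold of 0.5 are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set


def decode_hierarchy(
    scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
    threshold: float = 0.5,
) -> Set[str]:
    """Keep a label only if its score reaches the threshold and its parent is kept.

    Assumption: this is an illustrative decoding rule, not the paper's method.
    """
    kept: Set[str] = set()

    def is_kept(label: str) -> bool:
        if label in kept:
            return True
        # A label with a low (or missing) score is rejected outright.
        if scores.get(label, 0.0) < threshold:
            return False
        # A child label is only valid when its parent label is also kept.
        up = parent.get(label)
        if up is not None and not is_kept(up):
            return False
        kept.add(label)
        return True

    for label in scores:
        is_kept(label)
    return kept


# Hypothetical two-level taxonomy and per-label scores, for illustration only.
parent = {"science": None, "physics": "science", "sports": None, "tennis": "sports"}
scores = {"science": 0.9, "physics": 0.7, "sports": 0.3, "tennis": 0.8}

predicted = decode_hierarchy(scores, parent, threshold=0.5)
print(sorted(predicted))  # ['physics', 'science']
```

In this toy case "tennis" scores above the threshold but is dropped because its parent "sports" does not, which is the kind of parent-child consistency a hierarchical label system requires.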
Authors YI Liu (易流); GENG Xinyu (耿新宇); BAI Jing (白静) (School of Computer Science, Southwest Petroleum University, Chengdu 610000, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journals, 2023, No. 9, pp. 278-286 (9 pages)
Funding Sichuan Science and Technology Program (2022NSFSC0555).
Keywords Hierarchical multi-label text classification; Pre-training model; Attention mechanism; Convolutional neural network; Tree structure