Classification model of nuclear power equipment quality text based on improved recurrent pooling network
Abstract Quality texts for nuclear power equipment describe quality defects and other problems that arise during the design, procurement, construction, and commissioning stages. Quality events occur at different frequencies in different stages, and the quality texts for the same equipment at different stages share keywords and similar phrasing. To address the resulting text classification problem, which combines imbalanced class sizes with coupled semantic descriptions, an improved recurrent pooling network classification model incorporating a regularized feedback focal loss function is proposed. First, BERT (Bidirectional Encoder Representation from Transformers) converts the quality texts into word vectors. Then, an improved three-layer recurrent pooling network structure is proposed; by adding intermediate layers and selecting appropriate weights, it enlarges the extraction space for parameter training and improves the ability to represent the semantic features of quality defects. Next, a regularized feedback focal loss function is proposed to train the model parameters: the regularization term makes the gradient of the loss function change more stably, and the feedback term iteratively adjusts the loss function based on the error between the true and predicted values, which addresses the gradient bias that imbalanced samples cause during training. Finally, a normalized exponential (softmax) function computes the stage to which each quality event belongs. On a real dataset from a nuclear power company and on a public dataset, the F1 score of the proposed model is 2 and 1 percentage points higher, respectively, than that of the Fast_Text network, indicating that the model achieves high accuracy in text classification tasks.
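The abstract does not give the form of the regularization and feedback terms, so they are not reproduced here. As a rough illustration of the baseline they build on, the sketch below shows the standard focal loss applied to softmax outputs over the four stages named in the abstract. The function names, the example logits, and the values `gamma=2.0` and `alpha=1.0` are assumptions chosen for illustration, not values taken from the paper.

```python
import math

# The four stages named in the abstract; the ordering is an assumption.
STAGES = ["design", "procurement", "construction", "commissioning"]

def softmax(logits):
    """Normalized exponential function; subtracts the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, label, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    This is the plain focal loss only. The paper's regularization and
    feedback terms are not specified in the abstract and are omitted.
    """
    p_t = softmax(logits)[label]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def predict_stage(logits):
    """Return the stage with the highest softmax probability."""
    probs = softmax(logits)
    return STAGES[probs.index(max(probs))]

# Hypothetical classifier outputs for two "design" texts (label 0).
easy = [4.0, 0.5, 0.2, 0.1]   # confidently correct
hard = [0.3, 0.2, 0.4, 0.6]   # misclassified as another stage

print(predict_stage(easy), focal_loss(easy, 0))
print(predict_stage(hard), focal_loss(hard, 0))
```

With `gamma=0.0` the expression reduces to ordinary cross-entropy. Larger values of `gamma` shrink the loss of well-classified samples such as `easy`, so hard samples, which are often from minority classes, contribute relatively more to the gradient.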
Authors LU Qianhui; ZHANG Yu; WANG Mengling; WU Tingwei; SHAN Yuzhong (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; China Nuclear Power Design Company Limited, Shenzhen, Guangdong 518000, China)
Source Journal of Computer Applications (indexed in CSCD and the Peking University Core Journals list), 2024, No. 7, pp. 2034-2040 (7 pages)
Funding National Key Research and Development Program of China (2020YFB1711700).
Keywords improved recurrent pooling network; focal loss; nuclear power equipment quality text; quality event classification; natural language processing
