Short text classification of civil aviation intelligent supervision based on character-word fusion
Abstract: Inspection records generated by civil aviation supervision are currently classified and analyzed manually, which is inefficient. To address this, a short text classification model is proposed that combines data augmentation with dual-channel feature extraction over fused character and word vectors. The model classifies supervision findings into issues related to people, equipment, facilities and environment, institutional procedures, and organizational responsibilities, among others. To mitigate class imbalance, a data augmentation algorithm transforms the original texts to generate new samples, so that the number of samples per category is more balanced. Character vectors and word vectors are concatenated character by character, yielding character vectors that carry word-level feature information. The fused vectors are fed separately into a text convolutional neural network (TextCNN) and a bi-directional long short-term memory (BiLSTM) model, which extract features from a local and a global perspective. Experiments on a dataset of civil aviation supervision inspection records show that the model achieves an accuracy of 0.9837 and an F1 score of 0.9836. Compared with several character embedding and word embedding models, accuracy improves by 0.4%. Compared with several commonly used single-channel models, accuracy improves by 3%, which supports the comprehensiveness and effectiveness of the features extracted by the dual-channel model.
Authors: WANG Xin (王欣); GAN Zurui (干镞锐); XU Yaxi (许雅玺); SHI Ke (史珂); ZHENG Tao (郑涛). Affiliations: School of Computer, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; School of Economics and Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China; Institute of Civil Aviation Supervisor Training, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China.
Source: China Safety Science Journal (《中国安全科学学报》), 2024, No. 2, pp. 37-44 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (U2033213); Fundamental Research Funds for the Central Universities (J2022-048, J2019-045).
Keywords: character-word vector fusion; civil aviation supervision; short text; text convolutional neural network (TextCNN); bi-directional long short-term memory (BiLSTM)
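Illustrative sketch: the abstract states that character vectors and word vectors are concatenated character by character, so that each character vector also carries the features of the word containing it. The record gives no further detail, so the following is only a minimal sketch of that idea under stated assumptions: the text is already segmented into words, the embeddings are random stand-ins for trained vectors, and the function names, dimensions, and sample sentence are all hypothetical rather than taken from the paper.

```python
import random

def make_embedding(vocab, dim, seed):
    """Toy lookup table mapping each token to a random vector.
    Assumption: a stand-in for trained character/word embeddings."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}

def fuse_char_word(words, char_emb, word_emb):
    """Return one fused vector per character:
    [character vector ; vector of the word that contains the character]."""
    fused = []
    for word in words:
        w_vec = word_emb[word]
        for ch in word:
            # List concatenation: char_dim + word_dim entries per character.
            fused.append(char_emb[ch] + w_vec)
    return fused

# Hypothetical pre-segmented inspection record (segmentation assumed done upstream).
words = ["灭火器", "压力", "不足"]
chars = sorted({ch for w in words for ch in w})

char_emb = make_embedding(chars, dim=4, seed=0)  # assumed character dimension
word_emb = make_embedding(words, dim=2, seed=1)  # assumed word dimension

fused = fuse_char_word(words, char_emb, word_emb)
print(len(fused), len(fused[0]))  # 7 characters, each of dimension 4 + 2 = 6
```

In a full pipeline, the resulting sequence of fused vectors would be the shared input to both channels (TextCNN and BiLSTM). Characters belonging to the same word share the same word-level tail, which is how word information reaches a character-level model in this sketch.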