
Intention mining for civil aviation radiotelephony communication based on BERT and generative adversarial
Abstract: Intent extraction from civil aviation radiotelephony (air-ground) communication is hampered by corpora that are difficult to obtain, unevenly distributed entities, insufficiently standardized entity definitions, and limited extraction accuracy. To address these problems, this paper proposes an ontology-fused method for mining intent information from radiotelephony communication, based on bidirectional encoder representations from transformers (BERT) and a generative adversarial network (GAN). Flight pool information is introduced to check and correct part of the extracted information, yielding structured information that an air traffic control (ATC) system can interpret. First, an improved GAN model generates radiotelephony text, which augments the data, balances the distribution of entity types, and expands the dataset. Then, intents are classified and annotated according to the ontology rules defined by the Single European Sky ATM Research (SESAR) project. Next, a pre-trained BERT model generates character-level vectors and resolves polysemy; a bidirectional long short-term memory (BiLSTM) network encodes the sequence in both directions to extract contextual semantic features; and these features are fed into a conditional random field (CRF) model for inference, which learns and enforces the dependencies between labels to obtain a globally optimal label sequence. Finally, the plausibility of the extracted intent information is checked and corrected using the edit distance (ED) algorithm. Comparative experiments show that the proposed method achieves a macro-averaged F1 score of 98.75% and outperforms other mainstream models at intent mining on a civil aviation radiotelephony communication dataset, laying a foundation for bringing radiotelephony communication into the digitization process.
Authors: MA Lan (马兰); MENG Shijun (孟诗君); WU Zhijun (吴志军). Affiliations: School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China; School of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China; School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China.
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2024, No. 2, pp. 740-750 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (62172418).
Keywords: civil aviation radiotelephony communication; information extraction; generative adversarial network (GAN); ontology; bidirectional encoder representations from transformers (BERT)
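
The abstract's final step, checking extracted intent information against the flight pool with edit distance, can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything here is an assumption made for illustration: the function names `edit_distance` and `correct_callsign`, the `max_distance` threshold, and the sample callsigns are all hypothetical. The sketch uses the standard Levenshtein distance and replaces an extracted callsign with the closest flight-pool entry if one is close enough.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_callsign(extracted: str,
                     flight_pool: Iterable[str],
                     max_distance: int = 2) -> Optional[str]:
    """Return the closest flight-pool callsign, or None if none is close enough.

    max_distance is a hypothetical threshold, not a value from the paper.
    """
    best, best_dist = None, max_distance + 1
    for candidate in flight_pool:
        dist = edit_distance(extracted, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


if __name__ == "__main__":
    # Hypothetical flight pool and a callsign with one misrecognized character.
    pool = ["CCA101", "CES2305", "CSN3101"]
    print(correct_callsign("CCA1O1", pool))
    print(correct_callsign("XYZ999", pool))
```

Returning `None` when no candidate falls within the threshold keeps an implausible extraction from being silently mapped to an unrelated flight. A real system would need to decide how to handle such cases and how to break ties between equally close candidates.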