Abstract
A classification algorithm for aerospace science and technology open source intelligence is proposed based on the BERT model. The bidirectional Transformer mechanism captures the relationships between sentences in aerospace science and technology open source intelligence. The multi-head self-attention mechanism attends to the large number of proper nouns in the text, and a Softmax classifier classifies the extracted features. The algorithm was compared with mainstream text classification models such as TextRCNN and DPCNN. On the test set, its accuracy is higher by 1.7% and 3.33%, respectively, which verifies the effectiveness of the algorithm for classifying aerospace science and technology open source intelligence.
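The pipeline summarized above has three stages: a Transformer encoder with multi-head self-attention, then a feature vector, then a Softmax classifier. A minimal pure-Python sketch of the last two stages may help. It is not the authors' implementation, and it makes several assumptions. The weights are random rather than learned, and the toy sizes are hypothetical (5 tokens, `d_model = 8`, 2 heads, 3 classes). Following standard BERT fine-tuning, the hidden state at position 0 (the `[CLS]` token) is assumed to be the feature passed to the classifier. All function names are illustrative.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def rand_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, num_heads, rng):
    """x: seq_len x d_model token vectors -> seq_len x d_model outputs.

    Each head projects x to queries, keys and values of size d_head and
    computes softmax(Q K^T / sqrt(d_head)) V. Every token attends to every
    other token in both directions. Head outputs are concatenated and mixed
    by an output projection W_o.
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        q = matmul(x, rand_matrix(d_model, d_head, rng))
        k = matmul(x, rand_matrix(d_model, d_head, rng))
        v = matmul(x, rand_matrix(d_model, d_head, rng))
        # Scaled dot-product scores between every pair of tokens.
        scores = [[sum(qi[t] * kj[t] for t in range(d_head)) / math.sqrt(d_head)
                   for kj in k] for qi in q]
        weights = [softmax(row) for row in scores]
        heads.append(matmul(weights, v))
    # Concatenate the heads along the feature axis, then apply W_o.
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    return matmul(concat, rand_matrix(d_model, d_model, rng))

def classify(features, num_classes, rng):
    """Softmax classifier: class probabilities = softmax(W h + b)."""
    w = rand_matrix(num_classes, len(features), rng)
    b = [0.0] * num_classes
    logits = [sum(wi * hi for wi, hi in zip(row, features)) + bj
              for row, bj in zip(w, b)]
    return softmax(logits)

# Usage with hypothetical toy sizes: 5 token vectors, d_model = 8.
rng = random.Random(0)
tokens = rand_matrix(5, 8, rng)            # position 0 plays the role of [CLS]
hidden = multi_head_self_attention(tokens, num_heads=2, rng=rng)
probs = classify(hidden[0], num_classes=3, rng=rng)
predicted = max(range(3), key=lambda c: probs[c])
```

A real BERT encoder stacks many such layers. Each layer adds residual connections, layer normalization and a feed-forward sublayer, and all weights are trained end to end with the classifier. The sketch only shows the data flow from token vectors through attention to class probabilities.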
Authors
KONG Fanpeng (孔凡芃), LIU Xuhong (刘旭红), LIU Xiulei (刘秀磊), LI Han (李晗)
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100192, China
Laboratory of Data Science and Information Studies, Beijing Information Science & Technology University, Beijing 100192, China
Opening Foundation of State Key Laboratory of Digital Publishing Technology, Peking University, Beijing 100101, China
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition), 2021, No. 1, pp. 28-33 (6 pages)
Funding
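National Key Research and Development Program of China (2018YFC0830202)
Beijing Natural Science Foundation (4204100)
Special Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.
Construction Project of an Innovative Research Platform for Edge Computing (2020KYNH105)
"Qinxin Talent" Cultivation Program of Beijing Information Science & Technology University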
Keywords
text classification
BERT model
aerospace science and technology
open source intelligence