Abstract
Open source intelligence texts on aerospace science and technology are long and contain many proper nouns, which degrades classification performance. To improve classification accuracy for such intelligence, a classification algorithm based on a fused BERT and XGBoost model is proposed. First, the deep structure of the BERT model extracts key features from the intelligence text. An XGBoost model then replaces BERT's final output layer and classifies the text using the features BERT extracted. To verify the algorithm's effectiveness, comparative experiments were designed against several mainstream language models, including TextRCNN and DPCNN. The results show that, in classifying aerospace science and technology open source intelligence, the algorithm's accuracy is 1.9% and 2.2% higher than that of TextRCNN and DPCNN, respectively, demonstrating its effectiveness on related classification tasks.
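The two-stage design described in the abstract (BERT as feature extractor, XGBoost in place of BERT's output layer) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which BERT checkpoint, which hidden representation, or which XGBoost hyperparameters were used, so the `bert-base-chinese` checkpoint, the use of the [CLS] vector as the feature, and every parameter value below are assumptions chosen for illustration. The function names `batched`, `extract_features`, and `train_xgboost` are hypothetical.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def extract_features(texts: Sequence[str],
                     model_name: str = "bert-base-chinese",  # assumed checkpoint
                     max_length: int = 512,
                     batch_size: int = 8):
    """Encode texts with a pretrained BERT and return one vector per text.

    Assumption: the [CLS] position of the last hidden layer serves as the
    "key features"; the paper may use a different representation.
    """
    # Heavy dependencies are imported lazily so `batched` works without them.
    import numpy as np
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    rows = []
    with torch.no_grad():
        for batch in batched(texts, batch_size):
            # Long texts are truncated to BERT's maximum input length.
            enc = tokenizer(batch, padding=True, truncation=True,
                            max_length=max_length, return_tensors="pt")
            hidden = model(**enc).last_hidden_state  # (batch, seq_len, hidden)
            rows.append(hidden[:, 0, :].cpu().numpy())  # [CLS] vectors
    return np.concatenate(rows, axis=0)


def train_xgboost(features, labels, num_rounds: int = 100):
    """Fit an XGBoost classifier on BERT features.

    Labels are expected to be integers 0..K-1. Hyperparameters are
    illustrative, not the values used in the paper.
    """
    from xgboost import XGBClassifier

    clf = XGBClassifier(n_estimators=num_rounds, max_depth=6, learning_rate=0.1)
    clf.fit(features, labels)
    return clf
```

Under these assumptions, training would look like `clf = train_xgboost(extract_features(train_texts), train_labels)`, and prediction like `clf.predict(extract_features(test_texts))`. In this sketch BERT stays frozen and only the XGBoost model is trained. Whether the paper fine-tunes BERT before extracting features is not stated in the abstract.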
Authors
刘秀磊
孔凡芃
谌彤童
刘旭红
LIU Xiulei; KONG Fanpeng; CHEN Tongtong; LIU Xuhong (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192, China; Laboratory of Data Science and Information Studies, Beijing Information Science and Technology University, Beijing 100192, China; Beijing Institute of Tracking and Communication Technology, Beijing 100192, China)
Source
《郑州大学学报(理学版)》
Peking University Core Journal
2021, Issue 3, pp. 15-22 (8 pages)
Journal of Zhengzhou University: Natural Science Edition
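Funding
National Key Research and Development Program of China (2018YFC0830202)
Beijing Natural Science Foundation (4204100)
General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM202111232003)
"Qin Xin Talents" Cultivation Program of Beijing Information Science and Technology University.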
Keywords
text classification
BERT model
XGBoost model
aerospace science and technology
open source intelligence