
Research on Classification of Aerospace Science and Technology Open Source Information Based on BERT and XGBoost (cited by: 8)
Abstract Open source intelligence texts on aerospace science and technology are long and contain many proper nouns, both of which degrade classification performance. To improve classification accuracy for such intelligence, a classification algorithm based on a fused BERT and XGBoost model is proposed. The deep structure of the BERT model first extracts the key features of the intelligence text; an XGBoost model then replaces BERT's final output layer and classifies the text from the features BERT extracted. To verify the algorithm, comparative experiments were designed against mainstream models such as TextRCNN and DPCNN. The results show that, on aerospace science and technology open source intelligence classification, the accuracy of the algorithm is 1.9% and 2.2% higher than that of TextRCNN and DPCNN, respectively, which demonstrates its effectiveness on this kind of classification task.
Authors LIU Xiulei; KONG Fanpeng; CHEN Tongtong; LIU Xuhong (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192, China; Laboratory of Data Science and Information Studies, Beijing Information Science and Technology University, Beijing 100192, China; Beijing Institute of Tracking and Communication Technology, Beijing 100192, China)
Source Journal of Zhengzhou University: Natural Science Edition (PKU Core Journal), 2021, No. 3, pp. 15-22 (8 pages)
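Funding National Key Research and Development Program of China (2018YFC0830202); Beijing Natural Science Foundation (4204100); General Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KM202111232003); "Qinxin Talent" Cultivation Program of Beijing Information Science and Technology University.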
Keywords text classification; BERT model; XGBoost model; aerospace science and technology; open source intelligence
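
The two-stage pipeline described in the abstract (BERT as feature extractor, XGBoost in place of BERT's output layer) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which BERT checkpoint, pooling strategy, or hyperparameters were used, so the `bert-base-chinese` checkpoint, the use of the `[CLS]` vector as the document feature, the 512-token truncation, and the XGBoost settings below are all assumptions, as are the helper names `bert_cls_features`, `fit_xgboost_head`, and `accuracy`.

```python
import numpy as np


def bert_cls_features(texts, model_name="bert-base-chinese",
                      max_length=512, batch_size=8):
    """Encode texts with a pretrained BERT and return one vector per text.

    Assumption: the [CLS] position of the last hidden layer is used as the
    document feature; the source does not specify the pooling strategy.
    """
    # Heavy dependencies are imported lazily so the other helpers can be
    # used without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    texts = list(texts)
    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(
                texts[start:start + batch_size],
                padding=True,
                truncation=True,      # long texts are cut at max_length tokens
                max_length=max_length,
                return_tensors="pt",
            )
            hidden = model(**batch).last_hidden_state  # (batch, seq, hidden)
            chunks.append(hidden[:, 0, :].cpu().numpy())  # [CLS] vectors
    return np.concatenate(chunks, axis=0)


def fit_xgboost_head(features, labels, **overrides):
    """Train an XGBoost classifier on BERT features instead of a softmax layer.

    Labels are expected to be integers 0..n_classes-1.
    The default hyperparameters are illustrative assumptions.
    """
    from xgboost import XGBClassifier

    params = {"n_estimators": 200, "max_depth": 6, "learning_rate": 0.1}
    params.update(overrides)
    clf = XGBClassifier(**params)
    clf.fit(features, labels)
    return clf


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With labeled training and test texts, one would call `bert_cls_features` on each split, pass the training features and integer labels to `fit_xgboost_head`, and score `clf.predict(test_features)` with `accuracy`. Keeping BERT frozen and training only the tree ensemble is one simple reading of "replacing the output layer"; the paper may instead fine-tune BERT first.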











