ICAL: A Threat Intelligence IOC Identification Method Combined with Active Learning (cited by: 1)
Abstract: Indicators of compromise (IOCs), as characteristic descriptions of cyber threats, are important evidence for identifying and defending against cyberattacks. Current IOC identification relies mainly on neural network models, whose performance depends on the amount of labeled data. However, the field lacks a widely recognized dataset, and IOCs can only be labeled manually by security experts, so labeling is costly and large labeled datasets are hard to obtain. To address this problem, we propose ICAL (IOC identification combined with active learning), a threat intelligence IOC identification method that incorporates active learning. The method first selects initial samples for manual labeling based on sample representativeness; it then pseudo-labels the clustered samples based on the cluster assumption; finally, it continues to label samples iteratively based on sample uncertainty until the termination condition is met. With CNNPLUS as the classification model, experiments are performed on a self-built threat intelligence dataset. The results show that, compared with traditional automatic IOC identification strategies, ICAL reaches a recognition accuracy of 94.2% and a recall of 94.1% while reducing the manual labeling workload by nearly 58%, which gives it high practical value.
Authors: LUO Qin (罗琴); YANG Gen (杨根); LIU Zhi (刘智); TANG Binhui (唐宾徽) (School of Computer Science, Southwest Petroleum University, Chengdu 610500; School of Cyberspace Security, Sichuan University, Chengdu 610044)
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2023, No. 1, pp. 108-115 (8 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
Funding: National Natural Science Foundation of China (61902328); Sichuan Province Key Research and Development Program (2022YFG0323).
Keywords: active learning; density clustering; indicators of compromise; threat intelligence
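
The three stages named in the abstract (representativeness-based initial selection, cluster-based pseudo-labeling, uncertainty-based iterative labeling) can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the paper's implementation: the record gives no algorithmic details, so the neighbour-count density measure, the `eps` radius, the labeling `budget`, the `stop_margin` termination rule, the synthetic 2-D data, and the nearest-centroid classifier (a stand-in for CNNPLUS) are all hypothetical choices made for illustration.

```python
import math
import random


def make_data(seed=0, n_per_class=60):
    """Synthetic 2-D points; label 1 stands in for 'IOC', 0 for 'non-IOC'."""
    rng = random.Random(seed)
    pts, labels = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            pts.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return pts, labels


def centroids(pts, known):
    """Toy classifier (NOT CNNPLUS): per-class mean of the labeled points."""
    out = {}
    for c in set(known.values()):
        members = [pts[i] for i, y in known.items() if y == c]
        out[c] = (sum(p[0] for p in members) / len(members),
                  sum(p[1] for p in members) / len(members))
    return out


def predict(p, cents):
    """Assign the class of the nearest centroid."""
    return min(cents, key=lambda c: math.dist(p, cents[c]))


def margin(p, cents):
    """Small margin = high uncertainty. With one known class, prefer far points."""
    d = sorted(math.dist(p, c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else -d[0]


def ical_sketch(pts, oracle, n_init=4, eps=1.0, budget=12, stop_margin=0.5):
    n = len(pts)
    # Stage 1: representativeness, assumed here to be the neighbour count within eps.
    density = [sum(1 for q in pts if math.dist(p, q) <= eps) for p in pts]
    manual, pseudo, covered = {}, {}, set()
    for i in sorted(range(n), key=lambda k: -density[k]):
        if len(manual) == n_init:
            break
        if i in covered:
            continue
        manual[i] = oracle(i)  # the expert labels the representative sample
        # Stage 2: cluster assumption, so uncovered neighbours inherit that label.
        near = [j for j in range(n) if math.dist(pts[i], pts[j]) <= eps]
        for j in near:
            if j != i and j not in covered:
                pseudo[j] = manual[i]
        covered.update(near)
    # Stage 3: query the least certain unlabeled sample until a stop rule fires.
    while len(manual) < budget:
        known = {**pseudo, **manual}
        cents = centroids(pts, known)
        pool = [i for i in range(n) if i not in known]
        if not pool:
            break
        i = min(pool, key=lambda k: margin(pts[k], cents))
        if margin(pts[i], cents) > stop_margin:
            break  # assumed termination rule: even the hardest sample is clear
        manual[i] = oracle(i)
    return manual, pseudo, centroids(pts, {**pseudo, **manual})


pts, truth = make_data()
manual, pseudo, cents = ical_sketch(pts, oracle=lambda i: truth[i])
acc = sum(predict(p, cents) == y for p, y in zip(pts, truth)) / len(pts)
print(f"expert labels: {len(manual)}, pseudo labels: {len(pseudo)}, accuracy: {acc:.2f}")
```

In this sketch the expert (the `oracle` callback) is consulted only for the representatives and for the samples the classifier is least sure about; every other label is either inherited from a nearby representative or left to the classifier, which is the mechanism by which an approach of this kind would cut manual labeling effort.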