Cybersecurity Named Entity Recognition Using Bidirectional Long Short-Term Memory with Conditional Random Fields (Cited by: 9)


Abstract: Network texts have become important carriers of cybersecurity information on the Internet. These texts cover the latest security events, such as vulnerability exploitations, attack discoveries, and advanced persistent threats. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general fields, and little research has focused on entity extraction in the security domain. To this end, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we have named XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and concatenates the X input with the bidirectional LSTM output. Via extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF achieves better cybersecurity entity extraction than state-of-the-art models.
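The data flow named in the abstract (input features concatenated with the bidirectional LSTM output, then scored by a CRF layer) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The abstract does not define the "X input" or the inference procedure, so the sketch assumes the X input is a per-token feature vector (for example a word embedding) and uses Viterbi decoding, the usual choice for linear-chain CRFs. The function names `concat_features` and `viterbi_decode`, the toy emission scores, and the three-tag scheme (0 = O, 1 = B, 2 = I) are all hypothetical.

```python
from typing import List, Sequence


def concat_features(x: Sequence[Sequence[float]],
                    h: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each token's input vector x_t with its BiLSTM output h_t.

    Assumption: the "X input" is a per-token vector; h stands in for real
    BiLSTM outputs, which this sketch does not compute.
    """
    return [list(xt) + list(ht) for xt, ht in zip(x, h)]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of a path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag, per position and tag
    for em in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + em[j])
        score = new_score
        backpointers.append(ptr)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy example: a strong penalty on the O -> I transition (0 -> 2) forces the
# decoder to open the entity with B before continuing with I.
toy_emissions = [[2.0, 1.0, 0.0], [0.0, 0.5, 3.0], [1.0, 0.0, 0.0]]
toy_transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(toy_emissions, toy_transitions))  # → [1, 2, 0]
```

In a full model, the emission scores would come from a learned projection of the concatenated features, and the transition scores would be learned CRF parameters; both are hand-picked here only to keep the example small.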

Authors: Pingchuan Ma, Bo Jiang, Zhigang Lu, Ning Li, Zhengwei Jiang

Affiliations: School of Cyber Security; Institute of Information Engineering

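Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2021, Issue 3, pp. 259-265 (7 pages). Chinese journal title: Journal of Tsinghua University (Natural Science Edition, English Edition)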

Funding: Supported by the National Natural Science Foundation of China (Nos. 61702508, 61802404, and U1836209), the National Key Research and Development Program of China (Nos. 2018YFB0803602 and 2016QY06X1204), and the National Social Science Foundation of China (No. 19BSH022); also supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology.

Keywords: security blogs; Long Short-Term Memory (LSTM); Named Entity Recognition (NER)

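Classification codes: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP393.08 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]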

