Abstract
To extract key information from threat intelligence gathered from different sources, and to help government regulators carry out security risk assessment, this paper targets two problems that make entities in threat intelligence texts hard to recognize: heavy mixing of Chinese and English, and obscure domain-specific vocabulary. Building on the BiGRU-CRF model, it proposes a threat intelligence named entity recognition (NER) method that integrates boundary features with an iterated dilated convolutional neural network (IDCNN). First, entities with clear boundaries, such as English words, are transformed according to a manually constructed rule dictionary, which reduces the information loss the model tends to incur when processing long texts. Then, local features and global contextual features of the text are extracted by the IDCNN and a bidirectional gated recurrent unit (BiGRU), respectively. Experiments on a threat intelligence corpus show that the proposed model outperforms the other compared models on all relevant evaluation metrics, reaching an F-score of 87.4%.
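The abstract does not spell out how the rule dictionary transforms English entities. The sketch below assumes one plausible reading: each run of ASCII letters and digits in the mixed Chinese/English text is collapsed into a single token, replaced by a category placeholder when it appears in the dictionary, and tagged with a boundary feature. `RULE_DICT`, the placeholder names, `transform` and the `D`/`E`/`O` feature values are hypothetical illustrations, not the authors' dictionary or scheme.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule dictionary (illustration only, not the paper's): English
# terms with clear boundaries mapped to a category placeholder token.
RULE_DICT: Dict[str, str] = {
    "apt28": "<ORG>",
    "emotet": "<MALWARE>",
    "windows": "<SOFTWARE>",
}

# One English "word": ASCII letters/digits, optionally joined by '-', '.' or '_'
# (so "CVE-2017-0144" stays a single run).
ENGLISH_RUN = re.compile(r"[A-Za-z0-9]+(?:[-._][A-Za-z0-9]+)*")


def transform(text: str) -> Tuple[List[str], List[str]]:
    """Turn mixed Chinese/English text into model tokens plus a boundary feature.

    Chinese characters stay one token each (whitespace is dropped). Each English
    run collapses to one token: its dictionary placeholder if known, else "<ENG>".
    Boundary feature per token (assumed scheme): "D" = dictionary hit,
    "E" = other English run, "O" = ordinary character.
    """
    tokens: List[str] = []
    features: List[str] = []

    def add_chars(chunk: str) -> None:
        for ch in chunk:
            if not ch.isspace():
                tokens.append(ch)
                features.append("O")

    pos = 0
    for match in ENGLISH_RUN.finditer(text):
        add_chars(text[pos:match.start()])
        placeholder = RULE_DICT.get(match.group().lower())
        tokens.append(placeholder or "<ENG>")
        features.append("D" if placeholder else "E")
        pos = match.end()
    add_chars(text[pos:])
    return tokens, features
```

For example, `transform("APT28利用Emotet攻击Windows系统")` yields 9 tokens instead of 24 characters, which is the kind of sequence shortening the abstract credits with reducing information loss on long texts.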
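The IDCNN captures local features by stacking dilated convolutions: with kernel size k and dilation d, a layer at position t reads positions spaced d apart, so each layer widens the receptive field by (k-1)·d without pooling, and the whole block is reapplied several times with shared weights. The pure-Python sketch below illustrates this on a scalar sequence. The block dilations `[1, 1, 2]`, the 4 iterations and the all-ones kernels in the usage note are assumptions chosen for illustration; the abstract does not give the paper's configuration, and a real IDCNN operates on embedding vectors with learned multi-channel filters.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """1-D dilated convolution (cross-correlation form) with zero 'same' padding.

    For an odd kernel size k, output t = sum_j w[j] * x[t + (j - k // 2) * dilation].
    """
    k = len(w)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so the output stays centred")
    half = k // 2
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += wj * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions: 1 + sum (k-1)*d."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def idcnn(x: Sequence[float], kernels: Sequence[Sequence[float]],
          dilations: Sequence[int], iterations: int) -> List[float]:
    """Apply one block of dilated conv + ReLU layers `iterations` times.

    kernels[i] is used with dilations[i]; the same block weights are reused on
    every iteration, which is what makes the network "iterated".
    """
    if len(kernels) != len(dilations):
        raise ValueError("need one kernel per dilation")
    h = list(x)
    for _ in range(iterations):
        for w, d in zip(kernels, dilations):
            h = [max(0.0, v) for v in dilated_conv1d(h, w, d)]  # ReLU
    return h
```

Feeding a single impulse through four iterations of the assumed `[1, 1, 2]` block with size-3 all-ones kernels makes exactly `receptive_field(3, [1, 1, 2] * 4) == 33` positions non-zero: each output sees 16 tokens on either side, versus 12 per side for the same 12 layers without dilation.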
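In a BiGRU-CRF tagger, the BiGRU produces per-position tag scores (emissions) and the CRF layer adds tag-to-tag transition scores; at inference time the highest-scoring tag sequence is found with the Viterbi algorithm. A minimal pure-Python sketch follows, assuming plain score lists and no separate start/end transitions; `viterbi_decode` and the tag set and scores in the usage note are invented for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Highest-scoring tag path for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. BiGRU output + linear layer).
    transitions[i][j]: score of moving from tag i to tag j.
    Returns (best tag-index path, its total score).
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

With tags `O`, `B`, `I` (indices 0, 1, 2), emissions `[[2.0, 1.5, 0.0], [0.0, 0.0, 1.0]]` and a large negative `O→I` transition, the decoder returns `B, I` with score 2.5 instead of the higher-emission but invalid `O, I`; this is how the transition scores enforce label consistency.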
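The reported F-score is presumably the entity-level F1, the harmonic mean of precision and recall over predicted entity spans. The abstract does not define the matching criterion, so the sketch below assumes exact span-and-type matching over BIO tags; `bio_spans`, `entity_prf` and the tag names in the usage note are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from one BIO-tagged sentence.

    An I- tag whose type differs from the open entity starts a new entity
    (a lenient convention assumed here).
    """
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(list(tags) + ["O"]):  # "O" sentinel closes a trailing entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # continuation of the open entity
        if etype is not None:
            spans.add((etype, start, i))
        etype, start = (tag[2:], i) if tag != "O" else (None, i)
    return spans


def entity_prf(gold: Sequence[Sequence[str]],
               pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # exact match on type and boundaries
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For gold tags `B-ORG I-ORG O B-MAL` and predicted tags `B-ORG I-ORG O O`, one of two gold entities is recovered with no false positives, giving P = 1.0, R = 0.5 and F1 = 2/3.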
Authors
WANG Ying (王瀛)
WANG Ze-hao (王泽浩)
LI Hong (李红)
HUANG Wen-jun (黄文军)
Affiliations
Henan International Joint Laboratory of Theories and Key Technologies on Intelligence Networks, Henan University, Kaifeng 475001, China
Subject Innovation and Intelligence Introduction Base of Henan Higher Educational Institution-Intelligent Information Processing Innovation and Intelligence Introduction Base of Henan University Software Engineering, Henan University, Kaifeng 475001, China
Institute of Intelligence Networks System, Henan University, Kaifeng 475001, China
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100049, China
Source
Journal of Northeastern University (Natural Science) (《东北大学学报(自然科学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2023, No. 1, pp. 33-39 (7 pages)
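Funding
Natural Science Foundation of Henan Province (182300410164)
Henan University Graduate Education Innovation and Quality Improvement Program, Elite Talent Program (No. SYL19060120)
Young Scientists Fund of the National Natural Science Foundation of China (61702503, 61802016)
Key Program of the National Natural Science Foundation of China (Y810021104)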
Keywords
threat intelligence
dilated convolution
named entity recognition (NER)
information extraction
deep learning