Abstract
Annotated data for event recognition in cybersecurity is scarce, and the scenarios and semantics involved are complex, which makes it difficult to build accurate event recognition models. To address this problem, a few-shot cybersecurity event detection method based on data augmentation with prompting question answering was proposed. First, event representation knowledge was obtained from prompt information and mapped onto cybersecurity event types through label words, so that new instances could be generated from unlabeled text to expand the training data. Then, the generated high-confidence pseudo-annotated instances were combined with the original data to fine-tune the model, strengthening its semantic understanding of cybersecurity events. Finally, experiments were conducted on two cybersecurity datasets. The results show that, compared with other baseline methods, the proposed method performs substantially better on low-resource cybersecurity event detection tasks.
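The augmentation loop described in the abstract (prompt, map label words to event types, keep high-confidence pseudo-labels, merge them with the original data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the event types, label words, prompt template, mean-of-logits aggregation, the 0.9 threshold, and `toy_scorer` are all hypothetical. In the real method a pretrained masked language model would supply the score for each label word at the `[MASK]` position.

```python
import math
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical verbalizer: each event type is represented by a few label words.
VERBALIZER: Dict[str, List[str]] = {
    "phishing": ["phishing", "lure"],
    "data_breach": ["leak", "breach"],
    "ransomware": ["ransom", "encrypt"],
}

# Hypothetical question-style prompt; a masked LM would fill in [MASK].
TEMPLATE = "{text} Question: which security event occurred? Answer: [MASK]."

# A scorer returns a logit for placing `word` at [MASK] in `prompt`.
Scorer = Callable[[str, str], float]


def event_distribution(text: str, scorer: Scorer) -> Dict[str, float]:
    """Map label-word logits to a probability distribution over event types."""
    prompt = TEMPLATE.format(text=text)
    # Average each type's label-word logits, then apply a softmax across types.
    logits = {
        etype: sum(scorer(prompt, w) for w in words) / len(words)
        for etype, words in VERBALIZER.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}


def pseudo_label(unlabeled: List[str], scorer: Scorer,
                 threshold: float) -> List[Tuple[str, str]]:
    """Keep only unlabeled texts whose most likely event type clears the threshold."""
    kept = []
    for text in unlabeled:
        dist = event_distribution(text, scorer)
        label, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            kept.append((text, label))
    return kept


def augment(labeled: List[Tuple[str, str]], unlabeled: List[str],
            scorer: Scorer, threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Training set for fine-tuning: original few-shot data plus pseudo-labels."""
    return list(labeled) + pseudo_label(unlabeled, scorer, threshold)


def toy_scorer(prompt: str, word: str) -> float:
    """Hypothetical stand-in for a masked LM: high logit if the word occurs as a token."""
    return 5.0 if word in re.findall(r"[a-z]+", prompt.lower()) else 0.0


if __name__ == "__main__":
    few_shot = [("Employees received a phishing email with a fake login page.", "phishing")]
    unlabeled = [
        "Attackers demanded a ransom after they encrypt the file server.",
        "The quarterly meeting was moved to Friday.",
    ]
    for text, label in augment(few_shot, unlabeled, toy_scorer):
        print(label, "|", text)
```

With the toy scorer, the ransom sentence gets a ransomware probability of about 0.987 and is kept, while the unrelated sentence gets a uniform distribution (1/3 per type) and is discarded. The threshold trades off the number of generated instances against label noise, which is why only high-confidence pseudo-annotations are reused for fine-tuning.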
Authors
TANG Mengmeng
GUO Yuanbo
ZHANG Han
BAI Qingchun
CHEN Qingli
ZHANG Bowen
Affiliations: Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China; School of Cyberspace Security, Hainan University, Haikou 570100, China; School of Cyberspace Security, Zhengzhou University, Zhengzhou 450001, China; Shanghai Open Distance Education Engineering Technology Research Center, Shanghai Open University, Shanghai 200082, China; Zhengzhou Inspur Data Technology Co., Ltd., Zhengzhou 450001, China
Source
Journal on Communications (《通信学报》)
EI
CSCD
Peking University Core Journal (北大核心)
2024, Issue 8, pp. 62-74, 13 pages
Funding
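National Natural Science Foundation of China (No. 62276091, No. 62307028)
Major Public Welfare Special Project of Henan Province (No. 201300311200)
Natural Science Foundation of Shanghai (No. 23ZR1441800)
Shanghai Rising-Star Program, Sailing Special Project (No. 23YF1426100)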
Keywords
cybersecurity
event detection
prompting question answering
data augmentation
few-shot