
Entity Recognition of Network Sensitive Words and Variants Based on BERT-BiLSTM-CRF (cited by: 3)
Abstract Web content security monitoring is an important technical means of maintaining Internet security. To address the difficulty of detecting the large number of sensitive words found on the Web, together with their complex and varied variants, this paper adopts a deep learning network model based on BERT-BiLSTM-CRF to recognize sensitive words and their variants. First, the BERT layer vectorizes the text sequence. Second, the vectorized representation is fed into the BiLSTM layer to extract rich features of sensitive words. Finally, the CRF layer further constrains and corrects the output. After training on an annotated dataset for sensitive-word and variant entity recognition, the model recognizes entities fairly accurately. Experimental results show that the model outperforms the other models compared in precision, recall, and F1 score, and achieves good recognition performance.
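The role of the final CRF step, constraining and correcting the per-token output, can be illustrated with a small sketch. It is not the authors' implementation: the tag set (`O`, `B-SEN`, `I-SEN`), the hard-coded transition rules, and the toy emission scores are all assumptions made for illustration. In a trained CRF the transition scores are learned, and the emission scores would come from the BiLSTM layer.

```python
# Minimal sketch: Viterbi decoding with BIO transition constraints.
# Tag names and all scores are illustrative assumptions, not values from the paper.

TAGS = ["O", "B-SEN", "I-SEN"]  # hypothetical tag set for sensitive-word spans
NEG_INF = float("-inf")


def transition_score(prev_tag, tag):
    """Forbid O -> I-SEN; every other transition is allowed (score 0)."""
    if prev_tag == "O" and tag == "I-SEN":
        return NEG_INF
    return 0.0


def start_score(tag):
    """A sequence may not begin inside an entity."""
    return NEG_INF if tag == "I-SEN" else 0.0


def viterbi(emissions):
    """Return the highest-scoring valid tag sequence.

    emissions: one dict per token mapping tag -> score, standing in for
    the per-token outputs of a BiLSTM layer.
    """
    # best[t][tag] = (score of the best path ending in tag at step t, previous tag)
    best = [{tag: (start_score(tag) + emissions[0][tag], None) for tag in TAGS}]
    for t in range(1, len(emissions)):
        step = {}
        for tag in TAGS:
            candidates = [
                (best[t - 1][prev][0] + transition_score(prev, tag), prev)
                for prev in TAGS
            ]
            score, prev = max(candidates, key=lambda c: c[0])
            step[tag] = (score + emissions[t][tag], prev)
        best.append(step)

    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k][0])
    path = [tag]
    for t in range(len(emissions) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]


def greedy(emissions):
    """Per-token argmax with no transition constraints."""
    return [max(TAGS, key=lambda k: e[k]) for e in emissions]


# Toy scores for three tokens (made up for this example).
emissions = [
    {"O": 2.0, "B-SEN": 1.5, "I-SEN": 0.1},
    {"O": 0.2, "B-SEN": 0.3, "I-SEN": 2.0},
    {"O": 1.8, "B-SEN": 0.1, "I-SEN": 0.4},
]
greedy_tags = greedy(emissions)    # ['O', 'I-SEN', 'O'], an invalid BIO sequence
viterbi_tags = viterbi(emissions)  # ['B-SEN', 'I-SEN', 'O']
```

With these toy scores, per-token argmax yields an entity that starts with `I-SEN`, while constrained decoding relabels the first token as `B-SEN` so the span is well formed.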
Authors ZHENG Xianru; LI Baiyan; FENG Zhenni; LIU Xiaoqiang (College of Computer Science and Technology, Donghua University, Shanghai 201620)
Source Computer & Digital Engineering, 2023, No. 7, pp. 1585-1589 (5 pages)
Funding Supported by the Shanghai Sailing Program (Grant No. 19YF1402200) and the Fundamental Research Funds for the Central Universities, Donghua University (Grant No. 2232021D-23).
Keywords sensitive words; variant recognition; named entity recognition; BERT; BiLSTM
