
Research on Website Sensitive Information Identification and Variant Restoration Technology Based on the BERT Model
Abstract: With the rapid development of the Internet and the falling cost of setting up websites, many websites replace sensitive words with variant words so that their text evades detection against sensitive-word lexicons. To address this problem, the paper proposes a method for identifying sensitive website information that combines a BERT (Bidirectional Encoder Representations from Transformers) model with a variant-word restoration algorithm. The method first restores the variant words in the text, then vectorizes the text with BERT and feeds the vectors into a model composed of a BiLSTM (bidirectional long short-term memory) layer and a CNN (convolutional neural network) layer for training, so that both sensitive information and its variant forms can be identified. The reported experiments show that variant-word restoration achieves high accuracy and that the text vectors produced by BERT perform well in text classification. Compared with the other models tested, the BERT-BiLSTM-CNN model achieves higher accuracy, recall, and F1 score on the website sensitive-information identification task. The proposed model offers a reference for variant-word restoration and sensitive-information identification and has practical application value.
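The abstract does not give the details of the restoration algorithm, so the following is only a minimal sketch of the general idea of restoring variants before lexicon matching. The mapping table `VARIANT_MAP`, the separator set, and the function names are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical variant-to-canonical table (an assumption for illustration;
# the paper's actual table and algorithm are not described in the abstract).
VARIANT_MAP = {
    "du博": "赌博",  # pinyin substituted for the first character
    "赌bo": "赌博",  # pinyin substituted for the second character
    "赌搏": "赌博",  # similar-looking, similar-sounding character
}

# Separator runs wedged between two CJK characters, e.g. "赌*博".
_SEPARATORS = re.compile(r"(?<=[\u4e00-\u9fff])[\s*\-_.|/]+(?=[\u4e00-\u9fff])")


def restore_variants(text, variant_map=VARIANT_MAP):
    """Return text with inserted separators removed and known variants replaced."""
    cleaned = _SEPARATORS.sub("", text)
    # Replace longer variants first so a short variant cannot split a long one.
    for variant in sorted(variant_map, key=len, reverse=True):
        cleaned = cleaned.replace(variant, variant_map[variant])
    return cleaned


def contains_sensitive(text, lexicon):
    """Plain substring match against a sensitive-word lexicon."""
    return any(word in text for word in lexicon)


if __name__ == "__main__":
    lexicon = {"赌博"}
    raw = "在线du博平台"
    print(contains_sensitive(raw, lexicon))                    # variant evades the lexicon
    print(contains_sensitive(restore_variants(raw), lexicon))  # restored text is matched
```

In the pipeline the abstract describes, the restored text would then be vectorized by BERT and classified by the BiLSTM and CNN layers rather than matched against a lexicon; the lexicon check here only illustrates why restoration has to come first.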
Authors: FU Zefan (符泽凡); YAO Jingfa (姚竟发); TENG Guifa (滕桂法). Affiliations: College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China; Software Engineering Department, Hebei Software Institute, Baoding 071000, China; Hebei College Intelligent Interconnection Equipment and Multi-modal Big Data Application Technology Research and Development Center, Baoding 071000, China; Hebei Digital Agriculture Industry Technology Research Institute, Shijiazhuang 050021, China; Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China
Source: Modern Electronics Technique (《现代电子技术》), Peking University Core Journal, 2024, No. 23, pp. 105-112 (8 pages)
Keywords: website; sensitive information; variant word; BERT; BiLSTM (bidirectional long short-term memory network); CNN (convolutional neural network)