Abstract
Traditional sensitive-information recognition methods ignore context and the part of speech of keywords, which leads to false negatives and false positives. To address this, the paper proposes STEAP, an improved method for recognizing sensitive information in text. A dictionary of violence- and terrorism-related sensitive terms is built first. Sensitive trigger events are then extracted to form a sequence of sensitive trigger events, and weights are assigned to the information to be recognized based on both the trigger events and the part of speech of the keywords. Finally, similarity is computed among the constructed trigger events, the word vectors, and the sensitive dictionary, and is combined with the weights to obtain the sensitivity of the text. Experimental results show that, compared with traditional sensitive-information recognition methods, STEAP effectively identifies sensitive information in text and achieves some improvement in precision.
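The scoring step described in the abstract (per-keyword weights combined with similarity to a sensitive dictionary) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the STEAP implementation: trigger-event extraction is omitted, and the toy word vectors, the lexicon, the part-of-speech weights, and the function name `text_sensitivity` are all hypothetical.

```python
import math

# Hypothetical 2-D word vectors; a real system would load trained embeddings.
VECTORS = {
    "explosion": (1.0, 0.0),
    "blast": (2.0, 0.0),
    "weather": (0.0, 1.0),
}

# Hypothetical sensitive dictionary and part-of-speech weights (assumptions).
LEXICON = ["explosion"]
POS_WEIGHTS = {"v": 1.0, "n": 0.5}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_sensitivity(tokens, vectors=VECTORS, lexicon=LEXICON,
                     pos_weights=POS_WEIGHTS, default_weight=0.1):
    """Weighted average of each token's best similarity to the lexicon.

    tokens is a list of (word, pos_tag) pairs. Words without a vector
    are skipped; negative similarities are clipped to zero.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for word, pos in tokens:
        if word not in vectors:
            continue
        weight = pos_weights.get(pos, default_weight)
        best = max(
            (cosine(vectors[word], vectors[entry])
             for entry in lexicon if entry in vectors),
            default=0.0,
        )
        weighted_sum += weight * max(best, 0.0)
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0
```

Calling `text_sensitivity([("blast", "v"), ("weather", "n")])` gives a score between 0 and 1, where a higher-weighted part of speech pulls the result toward that token's similarity. The weighted average is one simple way to combine the two signals; the paper may combine them differently.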
Authors
LIU Cong (刘聪)
WANG Yongli (王永利)
ZHOU Zitao (周子韬)
YOU Feng (犹锋)
ZHANG Caijun (张才俊)
(School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Nari Group Corporation/State Grid Electric Power Research Institute Co., Ltd., Jiangsu Ruizhong Data Co., Ltd., Nanjing 210094, China; Grid Customer Service Center, Nanjing 210094, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2020, Issue 20, pp. 132-137 (6 pages)
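Funding
National Natural Science Foundation of China (No. 61170035, No. 61272420, No. 81674099, No. 61502233)
Fundamental Research Funds for the Central Universities (No. 30916011328, No. 30918015103)
Nanjing Science and Technology Plan Project (No. 201805036)
"13th Five-Year Plan" Equipment Field Fund (No. 61403120501)
Chinese Academy of Engineering 2019 Consulting Research Project (No. 2019-ZD-1-02-02)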
Keywords
sensitive trigger events
part-of-speech sequence
sensitive information recognition
text similarity