Abstract
The damage caused by leaks (divulgence) of textual information is increasingly serious, yet conventional leak detection still relies on manual inspection, which is inefficient and can itself cause secondary leaks. To address these problems, an automatic text divulgence detection technique based on natural language processing is proposed, combining automatic text similarity comparison with data encryption. Because overly coarse detection granularity can lead to missed detections in practice, similarity is computed at the level of natural paragraphs and sentences, which also makes it easy to locate suspected paragraphs and sentences automatically. Finally, a text divulgence detection system is designed and implemented. Experimental results show that the technique is well suited to detecting leaks of classified text and offers confidentiality, little manual intervention, high efficiency, and localization of suspected paragraphs.
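The paragraph- and sentence-level comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure the paper uses, so the Jaccard overlap of character bigrams below is an assumption chosen for simplicity, and all function names and the 0.5 threshold are hypothetical. The encryption step is not shown.

```python
import re


def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams, ignoring whitespace."""
    compact = re.sub(r"\s+", "", text).lower()
    if len(compact) < n:
        return {compact} if compact else set()
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two strings over character bigrams (assumed measure)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def split_paragraphs(text):
    """Treat each non-empty line as one natural paragraph."""
    return [p.strip() for p in text.splitlines() if p.strip()]


def split_sentences(paragraph):
    """Split on Western and Chinese sentence-ending punctuation."""
    parts = re.findall(r"[^.!?。!?]+[.!?。!?]*", paragraph)
    return [s.strip() for s in parts if s.strip()]


def find_suspects(secret, candidate, threshold=0.5):
    """Return (candidate_paragraph, secret_paragraph, sentence, score) hits.

    Every sentence of the candidate text is compared with every sentence
    of the secret text; pairs scoring at or above the threshold are
    reported with the paragraph indices needed to locate them.
    """
    secret_units = [
        (pi, sent)
        for pi, para in enumerate(split_paragraphs(secret))
        for sent in split_sentences(para)
    ]
    hits = []
    for ci, cpara in enumerate(split_paragraphs(candidate)):
        for csent in split_sentences(cpara):
            for si, ssent in secret_units:
                score = jaccard(csent, ssent)
                if score >= threshold:
                    hits.append((ci, si, csent, score))
    return hits


if __name__ == "__main__":
    secret = ("The launch code is stored in vault seven.\n"
              "Budget figures are public.")
    candidate = ("Weather was nice today.\n"
                 "Reportedly the launch code is stored in vault seven.")
    for ci, si, sentence, score in find_suspects(secret, candidate):
        print(f"candidate paragraph {ci} ~ secret paragraph {si}: {sentence!r}")
```

Comparing at sentence level rather than whole-document level is what allows a single leaked sentence inside an otherwise unrelated document to be flagged and located, which is the missed-detection problem the abstract attributes to coarse granularity.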
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
PKU Core Journal (北大核心)
2011, No. 8, pp. 2600-2603 (4 pages)
Funding
China Postdoctoral Science Foundation (20080431114)
Nanjing University of Information Science and Technology Research Fund (20070113)
Keywords
natural language processing
text divulgence
encryption
similarity detection
information extraction