Abstract
Zero-watermarking is an important technique for protecting information content in the field of information security. It does not modify the carrier; instead, it constructs the watermark from features extracted from the carrier, which gives it concealment, robustness, and security. Texts from different domains, however, differ significantly in how they express content. Medical texts contain many words related to medical terminology, and these words can serve as features for constructing a zero-watermark. Existing word segmentation tools struggle to segment medical terms accurately, so the characteristics of medical texts cannot be fully exploited. To address this problem, this paper proposes a zero-watermarking scheme for medical texts based on named entity recognition. The scheme trains a medical entity recognition model with BiLSTM and CRF, extracts the relevant entities from the medical text to be protected, and classifies them by entity category label. Entity name, entity order, and entity length serve as the features for constructing the zero-watermark, and a zero-watermark generation algorithm is designed around them. The zero-watermark output by the algorithm is sent to a third-party certification authority for registration and storage, and a watermark verification method is given. Finally, model performance evaluation experiments show that named entity recognition can effectively extract medical text entities, and experiments with text formatting attacks, entity addition, deletion, and replacement attacks, and sentence shifting attacks verify the feasibility of the scheme.
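The abstract does not spell out the generation or verification algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes entities have already been extracted by some NER model and arrive as `(name, label)` pairs in document order. The function names, the labels `DRUG` and `SYMPTOM`, the use of SHA-256, and the order-aware similarity measure are all illustrative assumptions.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(name: str, label: str) -> str:
    """Hash one entity's category, name, and length into a short hex token.

    Assumption: SHA-256 truncated to 8 hex characters stands in for
    whatever feature encoding the actual scheme uses.
    """
    payload = f"{label}|{name}|{len(name)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]


def build_zero_watermark(entities):
    """Build a watermark from (name, label) pairs given in document order.

    Returns {label: [fingerprint, ...]}, keeping the order in which the
    entities of each category appear. The text itself is never modified.
    """
    watermark = {}
    for name, label in entities:
        watermark.setdefault(label, []).append(fingerprint(name, label))
    return watermark


def similarity(registered, candidate) -> float:
    """Order-aware similarity in [0, 1] between two watermarks."""
    labels = sorted(set(registered) | set(candidate))
    a = [f"{lab}:{tok}" for lab in labels for tok in registered.get(lab, [])]
    b = [f"{lab}:{tok}" for lab in labels for tok in candidate.get(lab, [])]
    if not a and not b:
        return 1.0  # two empty watermarks are trivially identical
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical entities from a protected text and from a tampered copy
# in which one entity has been deleted.
original = [("aspirin", "DRUG"), ("headache", "SYMPTOM"), ("ibuprofen", "DRUG")]
tampered = [("aspirin", "DRUG"), ("headache", "SYMPTOM")]

registered = build_zero_watermark(original)  # would be stored by the third party
score = similarity(registered, build_zero_watermark(tampered))
```

In this sketch, a verifier would compare `score` against a threshold of its own choosing to decide whether a suspect text derives from the registered one. Because only entities contribute to the watermark, changes to formatting leave the score unchanged.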
Authors
GONG Li-Chun (龚礼春)
YAO Ye (姚晔)
TANG Guan-Gen (唐观根)
WU Guo-Hua (吴国华)
School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China; School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Source
Journal of Cryptologic Research (《密码学报》)
CSCD
2020, Issue 5, pp. 643-654 (12 pages)
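Funding
Key Research and Development Program of Zhejiang Province (2017C01062)
Humanities and Social Sciences Research Fund of the Ministry of Education (17YJC870021)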
Keywords
text zero-watermark
medical text
named entity recognition
entity features