
An Annotation Method for Information Security Corpus Based on Multi-Feature (基于多特征的信息安全事件语料标注方法)
Abstract: To address the lack of corpora in the information security domain, this paper proposes a method for annotating event corpora. Sentences from news texts are taken as the unit of study. Building on the analysis produced by the Language Technology Platform (LTP) of Harbin Institute of Technology, the method integrates multiple features, including part of speech, syntactic structure, and semantic roles, into a conditional random field (CRF) model that labels each segmented word in a sentence. The resulting word labels are then used to enrich the XML output of LTP. The experiments compare the results both against manual annotation and against a CRF model whose feature vectors are built from common features only. The F1 scores of the annotated event elements all exceed 60%, and F1 improves markedly over the model without syntactic and semantic role features.
Authors: GUO Ting-ting (郭婷婷); LIU Jia-yong (刘嘉勇) (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065; College of Cybersecurity, Sichuan University, Chengdu 610065)
Source: Modern Computer (《现代计算机》), 2019, No. 5, pp. 27-32 (6 pages)
Keywords: event tagging; information security; multi-feature; conditional random field
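
The feature fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`word`, `pos`, `head`, `dep_rel`, `srl`), the choice of context features, and the toy sentence are all assumptions made for illustration, and the sentence is hand-written rather than actual LTP output. The sketch builds one feature dictionary per segmented word, with a switch that drops the syntactic and semantic role features to mimic the baseline that uses common features only.

```python
from typing import Dict, List

Token = Dict[str, object]  # hypothetical per-word record, not LTP's actual schema


def token_features(sent: List[Token], i: int, use_syntax_srl: bool = True) -> Dict[str, str]:
    """Build a feature dict for the i-th word of a segmented sentence."""
    tok = sent[i]
    # "Common" features: the word itself, its POS tag, and neighbouring POS tags.
    feats = {
        "word": str(tok["word"]),
        "pos": str(tok["pos"]),
        "prev_pos": str(sent[i - 1]["pos"]) if i > 0 else "BOS",
        "next_pos": str(sent[i + 1]["pos"]) if i < len(sent) - 1 else "EOS",
    }
    if use_syntax_srl:
        # Additional features: dependency relation, POS of the head word,
        # and semantic role label ("O" when the word carries no role).
        head = int(tok["head"])  # index of the head word, -1 for the root
        feats["dep_rel"] = str(tok["dep_rel"])
        feats["head_pos"] = str(sent[head]["pos"]) if head >= 0 else "ROOT"
        feats["srl"] = str(tok["srl"])
    return feats


def sentence_features(sent: List[Token], use_syntax_srl: bool = True) -> List[Dict[str, str]]:
    """Feature dicts for every word in the sentence, in order."""
    return [token_features(sent, i, use_syntax_srl) for i in range(len(sent))]


# Toy sentence ("hackers attacked the website"); all values are illustrative.
sentence: List[Token] = [
    {"word": "黑客", "pos": "n", "head": 1, "dep_rel": "SBV", "srl": "A0"},
    {"word": "攻击", "pos": "v", "head": -1, "dep_rel": "HED", "srl": "O"},
    {"word": "了", "pos": "u", "head": 1, "dep_rel": "RAD", "srl": "O"},
    {"word": "网站", "pos": "n", "head": 1, "dep_rel": "VOB", "srl": "A1"},
]

multi = sentence_features(sentence)                          # multi-feature variant
baseline = sentence_features(sentence, use_syntax_srl=False)  # common features only
```

Feature dictionaries of this shape are what dictionary-based CRF toolkits typically accept as input, one list per sentence paired with a list of gold labels. The abstract does not name the toolkit or the label set the authors used, so both are left out here.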