Abstract
With the development of natural language processing, increasing attention has been paid to constructing knowledge graphs for the national defense science and technology domain. Recognizing technologies and terms in this domain is the foundation for building its technology knowledge graph. Using a corpus from this domain, this paper explores the application of subword units to the conventional Bi-LSTM+CRF sequence labeling model for technology and terminology recognition. In addition, it proposes linguistic features tailored to the task. Experimental results on the annotated corpus show that the proposed approach achieves an F1 score of 71.80%, an improvement of 3.04% over the baseline system, indicating that it recognizes technologies and terms in this domain effectively. The proposed approach also outperforms BERT-based methods for technology and terminology recognition.
Authors
FENG Luan-luan (冯鸾鸾)
LI Jun-hui (李军辉)
LI Pei-feng (李培峰)
ZHU Qiao-ming (朱巧明)
School of Computer Sciences and Technology, Soochow University, Suzhou, Jiangsu 215006, China; Provincial Key Laboratory for Computer Information Processing Technology, Suzhou, Jiangsu 215006, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2019, No. 12, pp. 231-236 (6 pages)
Funding
Supported by the National Natural Science Foundation of China: Key Program (61836007) and General Program (61772354, 61773276).