Abstract
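To handle named entities in chemical resource text, a named entity recognition method suited to such text is proposed. It aims to recognize four types of named entities: chemical substances, attributes, parameters, and quantity values. Following the linguistic regularities and characteristics of chemical resource text, the method builds a BLSTM-CRF model for preliminary recognition of named entities and then corrects the recognition results with an approach that combines a dictionary with rules. Experimental results show that the method performs the named entity recognition task well on chemical resource text, with an F1 score of up to 94.26% on the test corpus.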
A method is proposed for recognizing four kinds of named entities in chemical resource text: chemical substances, attributes, parameters, and values. The method draws on the linguistic rules and characteristics of chemical resource text. First, a BLSTM-CRF model is built to recognize named entities. Then an algorithm based on a combination of a dictionary and rules is used to correct and improve the recognition results. Experiments show that the algorithm completes the named entity recognition task well on chemical resource text, and the maximum F1-measure on the test sets reaches 94.26%.
Authors
马建红
王立芹
姚爽
MA Jianhong; WANG Liqin; YAO Shuang (School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China)
Source
《郑州大学学报(理学版)》
CAS
Peking University Core Journals (北大核心)
2018, Issue 4, pp. 14-20 (7 pages)
Journal of Zhengzhou University: Natural Science Edition
Funding
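China Science and Technology Consulting Service Center, service procurement project for construction of the Computer-Aided Innovation Design Public Service Platform (HSZT2015FD/254)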