Recognition of Crops, Diseases and Pesticides Named Entities in Chinese Based on Conditional Random Fields
Cited by: 29
Abstract Online agricultural-technology Q&A platforms receive thousands of new questions every day, yet they currently rely entirely on human experts to answer them, so responses are slow and answer quality is hard to guarantee. An intelligent answering system backed by an agricultural-technology knowledge base could answer some of these questions automatically. Building that knowledge base requires extracting "crop-disease/pest-pesticide" named-entity triples from the large volume of existing question-and-answer data. Few studies address Chinese named entity recognition in agriculture, particularly for diseases and pesticides, and the accuracy reported for crop entities is low. Based on the characteristics of crop, disease and pest, and pesticide named entities, this paper proposes a recognition method for these three entity types based on conditional random fields (CRF), targeting agricultural-technology Q&A data. The data set is first formatted and automatically segmented into words. Each word in the segmented corpus is then automatically annotated with features: whether it contains characteristic delimiting characters, whether it contains characteristic radicals, whether it is a numeral or quantifier, whether it is a characteristic left or right boundary word, and its part of speech. A CRF model trained on the annotated data classifies each word, deciding whether it belongs to a crop, disease or pest, or pesticide named entity and identifying its position within a compound named entity. The three entity types are thereby recognized, and the associated triples can be built automatically. Selecting the feature combination and tuning the context window size experimentally improved recognition accuracy and reduced model training time. The method recognized crop, disease and pest, and pesticide named entities with accuracies of 97.72%, 87.63%, and 98.05%, respectively, significantly higher than existing methods.
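The feature annotation and position labelling described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the cue lists (`CUE_CHARS`, `RADICAL_OF`, `CUE_RADICALS`, `LEFT_BOUND_WORDS`, `RIGHT_BOUND_WORDS`), the B/I/O tag scheme, the label names, and the function names are all assumptions introduced here, since the abstract gives neither the actual word lists nor the tag set. Segmentation and part-of-speech tagging are assumed to have been done already by an external tool, and the resulting feature rows would be handed to a CRF toolkit of the reader's choice.

```python
# Hypothetical cue lists; the paper's real lists are not given in the abstract.
CUE_CHARS = {"病", "虫", "灵", "素"}          # assumed characteristic characters
RADICAL_OF = {"茄": "艹", "菌": "艹", "蚜": "虫", "螨": "虫"}  # toy radical lookup
CUE_RADICALS = {"艹", "虫"}                   # assumed characteristic radicals
NUMERAL_CHARS = set("0123456789一二三四五六七八九十百千万两半")
LEFT_BOUND_WORDS = {"得", "防治"}             # assumed left boundary words
RIGHT_BOUND_WORDS = {"怎么办", "防治"}        # assumed right boundary words


def token_features(word, pos):
    """Annotate one segmented word with the feature types named in the abstract."""
    return {
        "word": word,
        "pos": pos,
        "has_cue_char": any(ch in CUE_CHARS for ch in word),
        "has_cue_radical": any(RADICAL_OF.get(ch) in CUE_RADICALS for ch in word),
        "is_numeral": bool(word) and all(ch in NUMERAL_CHARS for ch in word),
        "is_left_bound": word in LEFT_BOUND_WORDS,
        "is_right_bound": word in RIGHT_BOUND_WORDS,
    }


def windowed_features(sentence, window=1):
    """Build one feature row per word, copying in neighbours within +/- window.

    `sentence` is a list of (word, part_of_speech) pairs. Positions that fall
    outside the sentence are marked with a padding feature.
    """
    base = [token_features(word, pos) for word, pos in sentence]
    rows = []
    for i in range(len(base)):
        row = {}
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(base):
                for name, value in base[j].items():
                    row[f"{offset:+d}:{name}"] = value
            else:
                row[f"{offset:+d}:pad"] = True
        rows.append(row)
    return rows


def decode_entities(words, tags):
    """Join B-/I- tagged words into (entity_text, entity_type) pairs.

    Assumes a B/I/O scheme; an I- tag that does not continue an open entity
    of the same type is ignored.
    """
    entities, current, current_type = [], [], None
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(word)
            continue
        if current:
            entities.append(("".join(current), current_type))
        if tag.startswith("B-"):
            current, current_type = [word], tag[2:]
        else:
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


if __name__ == "__main__":
    sentence = [("番茄", "n"), ("得", "v"), ("了", "u"), ("白粉", "n"),
                ("病", "n"), ("用", "p"), ("多菌灵", "n")]
    # Hand-written tags for illustration, not the output of a trained model.
    tags = ["B-CROP", "O", "O", "B-DISEASE", "I-DISEASE", "O", "B-PESTICIDE"]
    print(windowed_features(sentence, window=1)[3])
    print(decode_entities([w for w, _ in sentence], tags))
```

Widening `window` gives the model more context per word but multiplies the number of features, which is the accuracy versus training-time trade-off the abstract says was tuned experimentally.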
Source Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2017, Issue S1, pp. 178-185 (8 pages)
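Funding National Natural Science Foundation of China (61502500); Beijing Natural Science Foundation (4164090); Fundamental Research Funds for the Central Universities (2017QC077)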
Keywords diseases and pests; pesticide; knowledge base; named entity recognition; conditional random fields