Abstract
Numerical information is an important type of information in text, made up of elements such as the subject, the attribute, and the attribute value. However, current numerical information extraction methods represent comparison relations in only a limited way, and they perform poorly on sentences that contain multiple numerical values. Based on how quantitative relations are expressed in text, we propose a numerical information representation method and a numerical information extraction framework. Taking into account the characteristics of each element of numerical information, a BI-LSTM-CRF model is used to recognize the elements, and linguistic features are then used to determine the semantic relations between attribute values and the other elements. The method achieves a precision of 0.775, a recall of 0.752, and an F-value of 0.763, outperforming existing extraction methods.
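The element-recognition step can be illustrated with a minimal Python sketch; it is not the authors' implementation. It assumes that the BI-LSTM-CRF tagger (omitted here) labels tokens with a BIO scheme over hypothetical element types `SUBJ` (subject), `ATTR` (attribute) and `VAL` (attribute value). The sketch shows how such a tag sequence is decoded into element spans and then scored with exact-match precision, recall and F-value, where F = 2PR/(P+R). All function names and the example sentence are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

# A recognized element: (element_type, start_token, end_token_exclusive).
Span = Tuple[str, int, int]


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into typed element spans.

    Hypothetical label scheme (not from the paper): "B-X" begins an element of
    type X, "I-X" continues it, "O" is outside any element. An "I-X" that does
    not continue an open X span is treated as the start of a new span.
    """
    spans: List[Span] = []
    cur_type: Optional[str] = None
    cur_start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == cur_type
        if cur_type is not None and not continues:
            spans.append((cur_type, cur_start, i))
            cur_type = None
        if prefix in ("B", "I") and not continues:
            cur_type, cur_start = label, i
    return spans


def f_value(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def score(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall and F-value."""
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r, f_value(p, r)


if __name__ == "__main__":
    tokens = ["tower", "height", "exceeds", "300", "m"]
    gold = decode_bio(["B-SUBJ", "B-ATTR", "O", "B-VAL", "I-VAL"])
    pred = decode_bio(["B-SUBJ", "B-ATTR", "O", "B-VAL", "O"])  # unit missed
    for etype, start, end in gold:
        print(etype, " ".join(tokens[start:end]))
    print(score(set(pred), set(gold)))
    # Check the reported figures: F from P = 0.775 and R = 0.752.
    print(round(f_value(0.775, 0.752), 3))
```

Exact span matching is a strict choice: a value recognized without its unit counts as an error, as in the demo, where two of three predicted spans match. The second stage, which uses linguistic features to link attribute values to the other elements, is not modeled here. With P = 0.775 and R = 0.752, the formula gives approximately 0.7633, consistent with the reported F-value of 0.763.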
Authors
Wang Junping (王竣平)
Bai Yu (白宇)
Cai Dongfeng (蔡东风)
Affiliations: Human-Computer Intelligence Research, Shenyang Aerospace University, Shenyang 110136, Liaoning, China; Knowledge Engineering and Human-Computer Intelligence Research Center, Shenyang 110136, Liaoning, China
Source
Computer Applications and Software (计算机应用与软件)
Peking University Core Journal
2019, No. 5, pp. 138-144 (7 pages)
Funding
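Humanities and Social Sciences Research Project of the Ministry of Education (17YJCZH003)
Natural Science Foundation of Liaoning Province (20170540696)
Shenyang Science and Technology Plan Project (17-231-1-82)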