Information Extraction Method Based on Combination of Part-of-speech Tagging and Rules

Times cited: 5
Abstract: Efficient and accurate extraction of structured information from texts that record an enterprise's daily business activities supports the enterprise's digital transformation. Because the extracted information bears directly on business operations, the model must reach absolute (100%) precision while keeping recall as high as possible, a requirement that existing methods do not meet in practical enterprise use. This paper therefore proposes an information extraction method that combines part-of-speech tagging with rules. Each text is processed separately by a part-of-speech-tagging-based extraction strategy and by a rule-based extraction strategy; the extracted values then undergo compliance checking and conflict avoidance, and finally manual identification is introduced. Experiments on 2,029 daily equipment-monitoring texts from a petroleum enterprise, extracting 10 different data values from each text into structured output, yield a precision P of 100% and a recall R of 99.87%. This outperforms single-strategy extraction methods and effectively meets the enterprise's practical requirements. The method has been deployed in the company's business and archive management system project, where it has greatly improved the efficiency of business management work and achieved good results in use.
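The abstract describes the pipeline only at a high level. The sketch below is one possible reading of it, not the authors' implementation. It assumes each text has already been segmented and POS-tagged into `(word, tag)` pairs (for Chinese text, a tagger such as jieba's `posseg` module could supply these), assumes an ICTCLAS-style tagset where `n` marks nouns and `m` marks numerals, and invents two example fields (`pressure`, `temperature`) with hypothetical regexes and valid ranges. The "compliance check" is modelled as a simple range test, and "conflict avoidance" as routing any field where the two strategies disagree, or where neither yields a compliant value, to manual identification. Precision and recall use the standard definitions: correct values divided by extracted values, and correct values divided by gold values.

```python
import re
from typing import Optional

# Hypothetical field specifications (illustrative assumptions, not from the paper):
# field name -> (regex for the rule-based strategy,
#                trigger noun for the POS-based strategy,
#                (min, max) valid range for the compliance check)
FIELDS = {
    "pressure": (r"pressure\s+(\d+(?:\.\d+)?)\s*MPa", "pressure", (0.0, 150.0)),
    "temperature": (r"temperature\s+(-?\d+(?:\.\d+)?)\s*C\b", "temperature", (-50.0, 300.0)),
}

Tagged = list[tuple[str, str]]  # (word, POS tag); "n" = noun, "m" = numeral (assumed tagset)


def extract_by_rules(text: str, field: str) -> Optional[float]:
    """Rule-based strategy: apply the field's regular expression to the raw text."""
    match = re.search(FIELDS[field][0], text)
    return float(match.group(1)) if match else None


def extract_by_pos(tagged: Tagged, field: str, window: int = 3) -> Optional[float]:
    """POS-based strategy: first numeral within `window` tokens after the trigger noun."""
    trigger = FIELDS[field][1]
    for i, (word, tag) in enumerate(tagged):
        if word == trigger and tag == "n":
            for next_word, next_tag in tagged[i + 1 : i + 1 + window]:
                if next_tag == "m":
                    try:
                        return float(next_word)
                    except ValueError:
                        return None
    return None


def is_compliant(field: str, value: Optional[float]) -> bool:
    """Compliance check: a value must exist and fall inside the field's valid range."""
    low, high = FIELDS[field][2]
    return value is not None and low <= value <= high


def combine(text: str, tagged: Tagged) -> tuple[dict[str, float], list[str]]:
    """Merge both strategies; anything uncertain goes to manual identification."""
    accepted: dict[str, float] = {}
    needs_review: list[str] = []
    for field in FIELDS:
        candidates = {
            v
            for v in (extract_by_rules(text, field), extract_by_pos(tagged, field))
            if is_compliant(field, v)
        }
        if len(candidates) == 1:
            # The strategies agree, or only one produced a compliant value.
            accepted[field] = candidates.pop()
        else:
            # Conflict avoidance: no compliant value, or the strategies disagree.
            needs_review.append(field)
    return accepted, needs_review


def precision_recall(predicted: dict, gold: dict) -> tuple[float, float]:
    """P = correct / extracted, R = correct / gold, over (record, field) keys."""
    correct = sum(1 for key, value in predicted.items() if gold.get(key) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Pump 3: pressure 35.2 MPa, temperature 80 C, status normal"
    tagged = [("Pump", "n"), ("3", "m"), (":", "x"), ("pressure", "n"), ("35.2", "m"),
              ("MPa", "q"), (",", "x"), ("temperature", "n"), ("80", "m"), ("C", "q")]
    accepted, review = combine(text, tagged)
    print(accepted, review)  # {'pressure': 35.2, 'temperature': 80.0} []
```

Sending every disagreement to a person rather than picking a winner trades reviewer time for precision, which matches the stated priority of absolute precision first and recall second. The abstract does not say how the paper itself resolves conflicts, so this policy is an assumption.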
Authors: ZHANG Wei (张伟), PAN Xing-ming (潘兴明), ZHANG Hai-bo (张海波), HE Xiao (何霄), BO Jia-nan (薄佳男), QIN Xiao-long (秦小龙). Affiliations: Beijing Petroleum Machinery Co., Ltd., China Petroleum Engineering Technology Research Institute, Beijing 102206, China; School of Information, Renmin University of China, Beijing 100872, China.
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 10, pp. 215-220 (6 pages).
Funding: China National Petroleum Corporation project (2018E-2107); CNPC Engineering Technology R&D Company Limited project (CPETQ202003).
Keywords: information extraction; natural language processing; rule matching; part-of-speech tagging; enterprise; digitalization