Research on Semi-supervised Domain Entity Relation Extraction Based on Information Entropy (cited by 3)

A semi-supervised learning method based on information entropy to extract the domain entity relation
Abstract: Supervised machine-learning methods for entity relation extraction are limited by the size of the labeled corpus. To address this, a semi-supervised method for domain entity relation extraction is proposed that uses information entropy to progressively expand a small-scale training set. First, a small training set is selected with the help of domain vocabulary, and an initial maximum entropy classifier of reasonable accuracy is built to predict candidate new instances from unlabeled data. Second, information entropy is applied: by setting different entropy thresholds and iterating over several cycles, candidate instances with high credibility are selected and added to the training data. Finally, the classifier is retrained iteratively on the expanded training data, and iteration terminates once classifier performance stabilizes, which yields semi-supervised domain entity relation extraction. Experiments show that, compared with existing methods, the proposed entropy-based semi-supervised method achieves better learning results when only a small number of labeled samples is available.
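The abstract describes a self-training loop: a maximum entropy classifier labels unlabeled instances, and only predictions whose class distribution has low information entropy, H(x) = -Σ_c p(c|x) log p(c|x), are added to the training data before retraining. The Python sketch below illustrates that loop under stated assumptions; it is not the authors' implementation. The maximum entropy model is written as multinomial logistic regression trained by plain gradient ascent, features are bags of context words, and the function names (`train_maxent`, `self_train`), the threshold schedule `thresholds=(0.2, 0.4)`, `rounds`, and the toy relation labels are all hypothetical. "Performance stabilizes" is approximated here by "no remaining instance passes the entropy threshold".

```python
import math
from collections import defaultdict


def predict_proba(weights, classes, feats):
    """Log-linear (maximum entropy) model: P(c|x) is proportional to exp(sum_f w[f, c])."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in classes}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(data, labels, epochs=200, lr=1.0):
    """Fit weights by batch gradient ascent on the conditional log-likelihood."""
    classes = sorted(set(labels))
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in zip(data, labels):
            probs = predict_proba(weights, classes, feats)
            for c in classes:
                # observed minus model-expected count for each (feature, class) pair
                diff = (1.0 if c == gold else 0.0) - probs[c]
                for f in feats:
                    grad[(f, c)] += diff
        for key, g in grad.items():
            weights[key] += lr * g / len(data)
    return weights, classes


def entropy(probs):
    """Shannon entropy H = -sum_c p(c) log p(c); low H means a confident prediction."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def self_train(labeled, unlabeled, thresholds=(0.2, 0.4), rounds=5):
    """Expand the training set with low-entropy predictions, then retrain.

    `thresholds` and `rounds` are hypothetical settings, not values from the paper.
    """
    data = [feats for feats, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    for threshold in thresholds:   # strict cut-off first, then a looser one
        for _ in range(rounds):    # several cycles per cut-off
            weights, classes = train_maxent(data, labels)
            added, remaining = 0, []
            for feats in pool:
                probs = predict_proba(weights, classes, feats)
                if entropy(probs) <= threshold:
                    # accept the instance with its most probable relation label
                    data.append(feats)
                    labels.append(max(probs, key=probs.get))
                    added += 1
                else:
                    remaining.append(feats)
            pool = remaining
            if added == 0:         # stand-in for "classifier performance is stable"
                break
    return train_maxent(data, labels), len(data)


# Toy relation instances: bags of context words (hypothetical features and labels).
labeled = [
    (["located", "in", "city"], "LOCATED_IN"),
    (["capital", "of", "city"], "LOCATED_IN"),
    (["founded", "by", "person"], "FOUNDED_BY"),
    (["created", "by", "person"], "FOUNDED_BY"),
]
unlabeled = [["located", "near", "city"], ["founded", "by", "company"], ["of", "by"]]

(model, relation_types), n_train = self_train(labeled, unlabeled)
print(n_train, predict_proba(model, relation_types, ["located", "near", "city"]))
```

Starting from a strict entropy cut-off and only then relaxing it means the most confident predictions enter the training set first, so later rounds are trained on cleaner data; an ambiguous instance such as `["of", "by"]` stays in the unlabeled pool unless some threshold admits it.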
Source: Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》), 2011, No. 4, pp. 7-12 (6 pages). Indexed in CAS; Peking University core journal.
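Funding: National Natural Science Foundation of China (60863011); Key Project of the Natural Science Foundation of Yunnan Province (2008CC023); Yunnan Province Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2007PY01-11).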
Keywords: information entropy; semi-supervised; maximum entropy classifier; unlabeled; credibility