Chinese named entity recognition combined active learning with self-training (cited by: 14)
Abstract: Named entity recognition (NER) is a fundamental task in information extraction, and exploiting abundant unlabeled corpora to improve NER performance is an important research direction in this field. Based on conditional random fields (CRF), this paper proposes SACRF, a method that combines active learning with self-training. It selects samples using a confidence function and a 2-gram frequency threshold, and expands the training corpus by annotating the selected samples through a combination of manual and automatic labeling. Experiments show that the method improves both the precision and the recall of entity recognition while significantly reducing the manual annotation workload.
Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2014, No. 4, pp. 82-88 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
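Funding: National High Technology Research and Development Program of China (863 Program) thematic project (2011AA120300); Natural Science Foundation of Hunan Province (11JJ4028).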
Keywords: active learning; self-training; conditional random fields; named entity recognition
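The abstract names SACRF's two selection criteria (a confidence function and a 2-gram frequency threshold) but gives no formulas, so what follows is only a minimal Python sketch under stated assumptions, not the paper's procedure. It assumes that sentences the tagger labels confidently, and whose character 2-grams are all frequent in the current labeled corpus, are labeled automatically (self-training), while the least confident sentences are sent for manual annotation (active learning). The `Tagger` interface, the "rarest 2-gram" statistic, `select_samples`, and the threshold and budget values are all hypothetical; in a real system the tagger would be the trained CRF.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a tagger returns (BIO labels, confidence in [0, 1]).
# In SACRF this role is played by a CRF; here it is just a callable.
Tagger = Callable[[str], Tuple[List[str], float]]


def char_bigrams(sentence: str) -> List[str]:
    """Character 2-grams; Chinese text has no whitespace word boundaries."""
    return [sentence[i:i + 2] for i in range(len(sentence) - 1)]


def bigram_counts(labeled: Sequence[str]) -> Counter:
    """Count character 2-grams over the current labeled training sentences."""
    counts: Counter = Counter()
    for sentence in labeled:
        counts.update(char_bigrams(sentence))
    return counts


def min_bigram_freq(sentence: str, counts: Counter) -> int:
    """Frequency of the sentence's rarest 2-gram (0 if it has none or any is unseen)."""
    return min((counts[g] for g in char_bigrams(sentence)), default=0)


def select_samples(
    pool: Sequence[str],
    tagger: Tagger,
    counts: Counter,
    conf_threshold: float = 0.9,  # hypothetical threshold
    freq_threshold: int = 2,      # hypothetical threshold
    budget: int = 10,             # hypothetical manual-annotation budget per round
) -> Tuple[List[Tuple[str, List[str]]], List[str], List[str]]:
    """Split an unlabeled pool into (auto_labeled, to_annotate, remaining)."""
    auto: List[Tuple[str, List[str]]] = []
    low_conf: List[Tuple[str, float]] = []
    remaining: List[str] = []
    for sentence in pool:
        labels, conf = tagger(sentence)
        if conf >= conf_threshold and min_bigram_freq(sentence, counts) >= freq_threshold:
            # Self-training: confident and built from familiar 2-grams, keep model labels.
            auto.append((sentence, labels))
        elif conf < conf_threshold:
            low_conf.append((sentence, conf))
        else:
            # Confident but contains rare 2-grams: leave it for a later round.
            remaining.append(sentence)
    # Active learning: send the least confident sentences to a human, up to the budget.
    low_conf.sort(key=lambda item: item[1])
    to_annotate = [s for s, _ in low_conf[:budget]]
    remaining.extend(s for s, _ in low_conf[budget:])
    return auto, to_annotate, remaining


if __name__ == "__main__":
    counts = bigram_counts(["北京大学", "北京市"])
    scores = {"北京": 0.95, "上海": 0.3, "京大": 0.95}  # made-up confidences
    stub = lambda s: (["B-LOC"] + ["I-LOC"] * (len(s) - 1), scores[s])
    print(select_samples(["北京", "上海", "京大"], stub, counts))
    # → ([('北京', ['B-LOC', 'I-LOC'])], ['上海'], ['京大'])
```

In an iterative loop of this kind, the auto-labeled and manually annotated sentences would be added to the training set, the CRF retrained, and the 2-gram counts recomputed before the next round; how SACRF actually schedules these rounds is not stated in the abstract.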