
Chinese Distant Supervision Noise Mark Identification Based on the Topic Model
Abstract: Training corpora built with the Distant Supervision approach to relation extraction contain a large amount of noise. To address this, the paper proposes a topic-model-based method for identifying noise marks (noisy labels). The method first analyzes a problem that Chinese Distant Supervision entity relation extraction faces: relation sentence instances have complex structures. It then uses user-defined patterns and pattern clustering to represent and aggregate patterns, and finally applies a topic model to identify noise marks. Experimental results show that the method identifies noise marks effectively, and that an entity relation extraction model trained on the data after noise marks are filtered out performs significantly better.
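The abstract does not specify the pattern definition or the topic model, so the following is only a minimal Python sketch of the overall idea under stated assumptions: (1) labels come from the standard distant supervision heuristic (Mintz et al., 2009), where any sentence mentioning both entities of a knowledge-base triple is labeled with that relation; (2) a pattern is simply the word sequence between the two entity mentions, and identical patterns are grouped rather than clustered; (3) in place of the paper's topic model, a much simpler two-component mixture separates, per relation, a relation-specific pattern distribution from a fixed corpus-wide background distribution, and the background posterior is read as a noise score. The knowledge base, sentences, the weight `lam`, and the function names `distant_label` and `score_instances` are all hypothetical.

```python
from collections import Counter, defaultdict


def distant_label(kb, sentences):
    """Distant supervision heuristic (Mintz et al., 2009): any sentence that
    mentions both entities of a KB triple is labeled with that relation."""
    instances = []
    for tokens in sentences:
        for e1, rel, e2 in kb:
            if e1 in tokens and e2 in tokens:
                lo, hi = sorted((tokens.index(e1), tokens.index(e2)))
                # Assumed pattern: the words between the two entity mentions.
                pattern = tuple(tokens[lo + 1:hi])
                instances.append((rel, pattern, " ".join(tokens)))
    return instances


def score_instances(instances, lam=0.5, iters=200):
    """Return (relation, sentence, noise_score) for every labeled instance.

    Per relation, each pattern occurrence is modeled as coming either from a
    relation-specific distribution r (weight lam) or from a fixed corpus-wide
    background distribution (weight 1 - lam). EM estimates r; the posterior
    probability of the background component is used as the noise score.
    This is a simplified stand-in, not the paper's topic model.
    """
    totals = Counter(p for _, p, _ in instances)
    n = len(instances)
    background = {p: c / n for p, c in totals.items()}

    counts_by_rel = defaultdict(Counter)
    for rel, p, _ in instances:
        counts_by_rel[rel][p] += 1

    def rel_posterior(r, p):
        # E-step: P(relation component | pattern p)
        return lam * r[p] / (lam * r[p] + (1 - lam) * background[p])

    noise = {}
    for rel, counts in counts_by_rel.items():
        r = {p: 1 / len(counts) for p in counts}  # uniform initialization
        for _ in range(iters):
            mass = {p: counts[p] * rel_posterior(r, p) for p in counts}
            total = sum(mass.values())
            r = {p: m / total for p, m in mass.items()}  # M-step
        for p in counts:
            noise[(rel, p)] = 1 - rel_posterior(r, p)

    return [(rel, sent, noise[(rel, p)]) for rel, p, sent in instances]


# Hypothetical toy knowledge base and pre-tokenized sentences.
KB = [
    ("Obama", "born_in", "Honolulu"), ("Jobs", "born_in", "SanFrancisco"),
    ("Gates", "born_in", "Seattle"), ("Jobs", "founder_of", "Apple"),
    ("Gates", "founder_of", "Microsoft"), ("Wozniak", "founder_of", "Apple"),
]
SENTENCES = [s.split() for s in [
    "Obama was born in Honolulu", "Jobs was born in SanFrancisco",
    "Gates was born in Seattle", "Obama visited Honolulu",
    "Jobs founded Apple", "Gates founded Microsoft",
    "Wozniak founded Apple", "Jobs visited Apple",
]]

for rel, sent, score in score_instances(distant_label(KB, SENTENCES)):
    flag = "  <- likely noise" if score > 0.5 else ""
    print(f"{score:.2f}  {rel:<10}  {sent}{flag}")
```

On this toy data the two "visited" sentences, which mention a KB entity pair without expressing the relation, score about 0.62 and the other six about 0.31, so a 0.5 threshold would drop exactly those two before an extractor is trained. The background weight is fixed rather than learned on purpose: if it were free, giving all weight to the relation component would always fit the observed patterns at least as well, and nothing would be flagged.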
Institution: Information Engineering University
Source: Journal of Information Engineering University, 2016, No. 3, pp. 303-308 (6 pages)
Funding: National High Technology Research and Development Program of China (863 Program), project 2011AA7032030D
Keywords: distant supervision; relation extraction; noise mark identification; topic model; relation pattern