
Classifying Named Entities on Chinese Wikipedia (cited by: 1)
Abstract: Classifying Wikipedia entities is of great significance to natural language processing and machine learning. This paper presents a machine-learning method for classifying Chinese Wikipedia articles by entity type. In addition to basic features drawn from the semi-structured data and unstructured text of Wikipedia pages, it uses extended features and semantic features tailored to Chinese to improve classification performance. Experiments on a manually annotated corpus show that these additional features effectively improve entity classification on the ACE entity type hierarchy, where the overall F1-measure reaches 96%. The method also performs well on the extended entity type hierarchy, with an overall F1-measure of 95%.
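Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2015, No. 5, pp. 91-97, 124 (8 pages)
Funding: National Natural Science Foundation of China (61373096, 90920004); Major Natural Science Research Project of Jiangsu Higher Education Institutions (11KJA520003)
Keywords: Wikipedia; named entity classification; semi-structured data; Infobox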

