
Research and Implementation of a Concept Acquisition System Based on Web Corpus (cited by: 6)

Concept Extraction and Verification from Web Corpus
Abstract: Web pages contain a large amount of domain knowledge, and acquiring knowledge from these resources has been an important research topic for more than ten years. Concepts, together with inter-conceptual relations and inter-attribute relations of concepts, are the main components of knowledge, so acquiring and verifying concepts is an important step in moving from text to knowledge. This paper proposes and implements a hybrid method for automatically extracting concepts from a large Web corpus. The method combines rules, statistics, and context information to identify and verify concepts. Experimental results indicate that the method performs well at extracting concepts.
Authors: Yu Lei (余蕾), Cao Cungen (曹存根)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 2, pp. 161-165, 195 (6 pages)
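Funding: Supported by the Natural Science Foundation (grants #60273019, 60573064, 60573063, and 60496326) and the National Key Basic Research Development Program (grants 2003CB317008 and G1999032701)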
Keywords: Chinese information processing, knowledge acquisition, concept acquisition, concept verification











