
Weakly-Supervised Extraction of Ontology Concept Instances and Concept Attributes from the Web (cited by: 4)
Abstract: This paper proposes a weakly-supervised method for simultaneously extracting ontology concept instances and concept attributes from the Web. Starting from a small set of seed instances and attributes, the method automatically acquires from the Web the context patterns in which instances and attributes co-occur, and scores these patterns using the association between the seed instances and the seed attributes. After candidate concept instances and attributes are extracted with these patterns, the paper proposes two ways to evaluate them. The first uses the association between concept instances and attributes, so that instances and attributes assess each other's accuracy. The second uses the similarity between the context-pattern distribution of a candidate instance (or attribute) and that of the seed instances (or attributes). In experiments on the disease domain, the manually verified precision of the candidate instances reaches 94% for the top 500 results and 93% for the top 1,000 results.
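The abstract names three scoring steps (pattern scoring, mutual instance-attribute evaluation, and distributional similarity to seeds) but does not give their formulas. The Python sketch below illustrates one way each step could work. Every formula in it is an assumption, not the authors' method: pattern reliability is taken as the share of a pattern's matches that are seed pairs, instance-attribute association is pointwise mutual information (PMI), and distributional similarity is cosine similarity over context-pattern counts. The function names `pattern_reliability`, `association_score`, `cosine`, and `distribution_score` and all toy data are hypothetical.

```python
"""Hypothetical sketch of the scoring steps described in the abstract.

All formulas are illustrative assumptions, not the published method:
pattern reliability = share of a pattern's matches that are seed pairs,
association = PMI, distributional similarity = cosine over pattern counts.
"""
import math
from collections import Counter


def pattern_reliability(matches: set[tuple[str, str]],
                        seed_pairs: set[tuple[str, str]]) -> float:
    """Share of the (instance, attribute) pairs matched by a pattern that
    are seed pairs (assumed scoring rule)."""
    if not matches:
        return 0.0
    return len(matches & seed_pairs) / len(matches)


def association_score(candidate: str, seed_attrs: list[str],
                      cooc: dict[tuple[str, str], int],
                      freq: dict[str, int], total: int) -> float:
    """Method 1 (assumed PMI form): mean PMI between a candidate instance
    and the seed attributes; pairs that never co-occur contribute 0."""
    if not seed_attrs:
        return 0.0
    total_pmi = 0.0
    for attr in seed_attrs:
        joint = cooc.get((candidate, attr), 0)
        if joint:
            # PMI = log(P(x, y) / (P(x) * P(y))) with counts over `total`
            total_pmi += math.log(joint * total / (freq[candidate] * freq[attr]))
    return total_pmi / len(seed_attrs)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def distribution_score(candidate: Counter, seeds: list[Counter]) -> float:
    """Method 2 (assumed cosine form): mean similarity between the
    candidate's context-pattern counts and each seed's pattern counts."""
    if not seeds:
        return 0.0
    return sum(cosine(candidate, s) for s in seeds) / len(seeds)


if __name__ == "__main__":
    # Toy data; every name and count here is invented for illustration.
    seed_pairs = {("flu", "symptom"), ("flu", "treatment"),
                  ("diabetes", "symptom"), ("diabetes", "treatment")}
    print(pattern_reliability({("flu", "symptom"), ("car", "price")},
                              seed_pairs))  # 0.5

    cooc = {("measles", "symptom"): 10}
    freq = {"measles": 20, "symptom": 50, "treatment": 40}
    print(association_score("measles", ["symptom", "treatment"],
                            cooc, freq, 1000))

    seeds = [Counter({"{A} of {I}": 3, "{I} {A}": 1}),
             Counter({"{A} of {I}": 2})]
    print(distribution_score(Counter({"{A} of {I}": 4}), seeds))
```

Candidate attributes can be scored the same way by swapping roles: pass a candidate attribute and the seed instances to `association_score`. How the two measures are combined into a final ranking is not stated in the abstract, so the sketch leaves them separate.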
Authors: Kang Wei (康为), Sui Zhifang (穗志方)
Source: Journal of Chinese Information Processing (中文信息学报), 2010, no. 1, pp. 54-59 (6 pages); indexed in CSCD and the Peking University core journal list
Funding: National Natural Science Foundation of China (grant 60873156); National Social Science Fund of China (grant 09BYY032)
Keywords: computer application; Chinese information processing; Web; domain concept instance extraction; attribute extraction; weakly-supervised; contextual pattern