Extracting Part-Whole Relations Based on Coordinate Structure (times cited: 2)

Abstract: Part-whole relations are a basic and important type of semantic relation, and automatically extracting them from text is a fundamental research problem in knowledge engineering. This paper proposes a graph-based method for extracting part-whole relations from the Web. First, we download snippets from Google using part-whole query patterns. We then use coordinate-structure patterns to extract pairs of candidate part concepts from these snippets and build a graph whose nodes are the co-occurring words and whose edge weights are their co-occurrence frequency counts. A hierarchical clustering algorithm clusters this graph automatically so that the correct parts are grouped together. Building on hierarchical clustering, and exploiting the properties of coordinate structures, the characteristics of the graph, and features of the Chinese language, we adjust the edge weights with five methods: reducing the weight of comma edges, cutting low-frequency edges, increasing the weight of edges that lie on loops, increasing the weight of edges whose two nodes share the same suffix, and increasing the weight of edges whose two nodes share the same prefix. Experimental results show that these five methods substantially increase recall without sacrificing the high precision of hierarchical clustering.
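The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: phrases are already word-segmented and split on the enumeration mark "、" or the full-width comma "，" (the paper's actual query and coordinate patterns are not reproduced here); a "comma edge" is taken to be a pair joined by "，"; a "loop" is approximated by a triangle; "same prefix/suffix" is approximated by a shared first/last character; the clustering step is a generic average-linkage agglomerative procedure with a stopping threshold. The function names, the constants `COMMA_PENALTY`, `MIN_FREQ`, `LOOP_BONUS`, `AFFIX_BONUS`, and the threshold value are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical constants for illustration; the paper's settings are not given here.
COMMA_PENALTY = 0.5  # each comma-joined occurrence counts as half an occurrence
MIN_FREQ = 2         # edges observed fewer times than this are cut
LOOP_BONUS = 1.5     # multiplier for an edge that lies on a triangle
AFFIX_BONUS = 1.5    # multiplier for a shared last (suffix) or first (prefix) character


def coordinate_pairs(phrase):
    """Split a pre-segmented coordinate phrase such as '车轮、车门，车窗'
    into adjacent (word_a, word_b, separator) triples."""
    tokens = re.split(r"([、，])", phrase)  # the capturing group keeps separators
    words, seps = tokens[0::2], tokens[1::2]
    return [(words[i], words[i + 1], seps[i]) for i in range(len(seps))]


def build_graph(pairs):
    """Return {frozenset({a, b}): weight} from (a, b, separator) triples."""
    raw = defaultdict(int)         # plain co-occurrence count per edge
    weighted = defaultdict(float)  # count with comma occurrences penalized
    for a, b, sep in pairs:
        if a == b:
            continue
        edge = frozenset((a, b))
        raw[edge] += 1
        # Method 1: penalize comma edges (full-width comma vs. enumeration mark).
        weighted[edge] += COMMA_PENALTY if sep == "，" else 1.0
    # Method 2: cut low-frequency edges (by raw count).
    return {e: w for e, w in weighted.items() if raw[e] >= MIN_FREQ}


def reweight(graph):
    """Apply the loop, suffix and prefix bonuses to every edge."""
    neighbors = defaultdict(set)
    for edge in graph:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    adjusted = {}
    for edge, w in graph.items():
        a, b = tuple(edge)
        if neighbors[a] & neighbors[b]:  # Method 3: a shared neighbor closes a triangle
            w *= LOOP_BONUS
        if a[-1] == b[-1]:               # Method 4: same suffix (last character)
            w *= AFFIX_BONUS
        if a[0] == b[0]:                 # Method 5: same prefix (first character)
            w *= AFFIX_BONUS
        adjusted[edge] = w
    return adjusted


def cluster(graph, threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the two
    clusters with the highest mean edge weight until it drops below threshold."""
    clusters = [frozenset([n]) for n in sorted({n for e in graph for n in e})]

    def link(c1, c2):
        total = sum(graph.get(frozenset((x, y)), 0.0) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (link(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy input: hypothetical phrases matched in snippets for a query about a car's parts.
snippets = (["车轮、车门、车窗"] * 2 + ["车窗、车轮"] * 2
            + ["车窗、发动机、车门"] * 2 + ["发动机，价格，车轮", "发动机，价格"])
pairs = [p for s in snippets for p in coordinate_pairs(s)]
for c in cluster(reweight(build_graph(pairs)), threshold=1.5):
    print(sorted(c))
```

On this toy input, the single 价格/车轮 co-occurrence is cut as low-frequency, the comma-joined 发动机/价格 edge keeps only half weight, and the triangle among 车轮, 车门 and 车窗 plus their shared prefix 车 lifts those edges to 4.5. With the hypothetical threshold of 1.5, clustering stops with the part cluster {发动机, 车窗, 车轮, 车门} (engine, window, wheel, door) and the noise word 价格 (price) left as a singleton. Average linkage is used here because it divides by cluster size, so a word tied to a cluster through a single edge is not absorbed easily; the paper's own clustering criterion may differ.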
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2015, No. 1, pp. 88-96 (9 pages).
Funding: National Natural Science Foundation of China (91224006, 61173063, 61035004, 61203284, 309737163); National Social Science Fund of China (10AYY003).
Keywords: part-whole relations; graph model; coordinate structure; hierarchical clustering; edge weight