Unsupervised Web-Based Disambiguation on English Directory Labels

Cited by: 1
Abstract: Word sense disambiguation (WSD), one of the most challenging tasks in natural language processing, is currently hampered by the knowledge acquisition bottleneck. Directory label disambiguation, a new application area for WSD, is an essential step in lightweight ontology learning. This article explores a WSD method that draws on Web knowledge, and is therefore not constrained by the knowledge acquisition bottleneck, and applies it to directory label disambiguation. First, the context c of the target word t and its n candidate senses s1…sn are each expanded into vectors using external resources such as Web knowledge (a Web search engine) and WordNet; the vector components are computed with a tf-idf variant proposed in this paper (conditional tf-idf). Next, a novel mixture disambiguation model is proposed that considers two factors: the relatedness between each candidate sense and the context of the target word, and the prior distribution of the candidate senses. To the author's knowledge, no similar approach has previously appeared in Web-based WSD. The experiment was run on a subset of the DMOZ Web directory (1100 target words in total). The system achieved a precision of 83.40% at 100% recall, about 10 percentage points above the baseline precision of 73.37% (disambiguation based purely on the prior distribution of word senses).
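The two-factor scoring idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the conditional tf-idf formula or the exact form of the mixture model, so the code below assumes raw term counts in place of conditional tf-idf, cosine similarity as the relatedness measure, and a linear interpolation with a hypothetical weight `alpha`. The token lists and priors are made-up toy data standing in for what a Web search engine and WordNet would supply.

```python
import math
from collections import Counter


def to_vector(tokens):
    # Assumption: raw term counts stand in for the paper's conditional tf-idf,
    # whose formula is not given in the abstract.
    return Counter(tokens)


def cosine(u, v):
    # Cosine similarity between two sparse term vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_tokens, sense_tokens, sense_priors, alpha=0.7):
    # Assumed mixture form: alpha * relatedness + (1 - alpha) * prior.
    # alpha is a hypothetical parameter, not a value from the paper.
    context_vec = to_vector(context_tokens)
    scores = {}
    for sense, tokens in sense_tokens.items():
        relatedness = cosine(context_vec, to_vector(tokens))
        prior = sense_priors.get(sense, 0.0)
        scores[sense] = alpha * relatedness + (1 - alpha) * prior
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (invented): the label "Java" under a programming-related directory path.
context_tokens = ["programming", "language", "software", "code", "computer"]
sense_tokens = {
    "java_island": ["island", "indonesia", "jakarta", "volcano"],
    "java_language": ["programming", "language", "software", "virtual", "machine"],
}
sense_priors = {"java_island": 0.6, "java_language": 0.4}

best, scores = disambiguate(context_tokens, sense_tokens, sense_priors)
baseline, _ = disambiguate(context_tokens, sense_tokens, sense_priors, alpha=0.0)
print(best, baseline)
```

With `alpha=0.0` the function reduces to the prior-only baseline described in the abstract, which picks the more frequent sense regardless of context. With context relatedness mixed in, the toy example selects the programming-language sense instead.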
Author: Sun Lei (孙磊)
Affiliation: Fudan University
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2010, No. 9, pp. 224-227, 282 (5 pages)
Keywords: Word sense disambiguation; Web-based knowledge; Unsupervised; Lightweight ontology
