
A Collocation-based Method for Semantic Similarity Measure for Chinese Words (cited by: 13)
Abstract: Measuring semantic similarity between words plays a basic role in many natural language processing applications. This paper proposes a new method that is efficient, practical, and reasonably accurate. Starting from the classic distributional hypothesis that "similar words occur in similar contexts", the method takes as a word's context not its adjacent words in a sentence but its collocates in two-word noun phrases, on the grounds that collocates better reflect a word's semantic features and yield better results. Using a large-scale set of automatically built two-word noun phrases, the method first constructs tf-idf-weighted vectors of direct and indirect collocates, then takes the cosine measure between two words' collocate vectors as their semantic similarity. To allow comparison with related approaches, the authors also built a benchmark test set for Chinese word semantic similarity based on human ratings. On this test set, the method achieves correlation coefficients of 0.703, 0.509, and 0.700 on nouns, verbs, and adjectives, respectively, with 100% coverage.
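Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue 1, pp. 7-14 (8 pages)
Funding: National Natural Science Foundation of China (60573063, 60573064, 60773059, 61035004); National 863 Program (2007AA01Z325); National Social Science Foundation of China, Key Project (10AYY003)
Keywords: semantic similarity, word collocation, similarity benchmark set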
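
The pipeline outlined in the abstract (collocate contexts from two-word noun phrases, tf-idf weighting, cosine comparison) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy phrase list, the function names `build_vectors` and `cosine`, the role tags on collocates, and the particular tf-idf formula (raw count times log(N / df)) are all assumptions made for illustration. The abstract does not define how indirect collocates are derived, so only direct collocates are modelled here.

```python
import math
from collections import Counter, defaultdict


def build_vectors(phrases):
    """Build tf-idf-weighted direct-collocate vectors.

    phrases: iterable of (modifier, head) pairs, standing in for the
    automatically extracted two-word noun phrases (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for modifier, head in phrases:
        # Each word in the phrase is a collocate of the other; the role tag
        # keeps "X as head" distinct from "X as modifier" (an assumption).
        counts[modifier][("head", head)] += 1
        counts[head][("mod", modifier)] += 1

    n_words = len(counts)
    # Document frequency: number of distinct words each collocate occurs with.
    df = Counter()
    for collocates in counts.values():
        df.update(collocates.keys())

    return {
        word: {c: tf * math.log(n_words / df[c]) for c, tf in collocates.items()}
        for word, collocates in counts.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[c] for c, weight in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data, invented for this example only.
toy_phrases = [
    ("apple", "juice"), ("orange", "juice"),
    ("apple", "pie"), ("orange", "peel"),
    ("steel", "bridge"), ("stone", "bridge"),
]
vectors = build_vectors(toy_phrases)

sim_fruit = cosine(vectors["apple"], vectors["orange"])   # share "juice"
sim_mixed = cosine(vectors["apple"], vectors["steel"])    # no shared collocate
sim_material = cosine(vectors["steel"], vectors["stone"])  # identical contexts
print(sim_fruit, sim_mixed, sim_material)
```

In this toy setting, words that share collocates ("apple" and "orange") score higher than words that share none ("apple" and "steel"). Reproducing the reported correlation coefficients would additionally require the paper's phrase corpus, its treatment of indirect collocates, and its human-rated benchmark set, none of which are specified in this record.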