Word Semantic Similarity Measurement Based on Naïve Bayes Model
Abstract: Measuring the semantic similarity between words is a classic and active problem in natural language processing, and progress on it affects many applications, such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. This paper proposes a novel approach to measuring word semantic similarity that combines a naïve Bayes model with a knowledge base. First, attribute variables are extracted from the general-purpose ontology WordNet. Next, conditional probability distributions are generated through statistics and piecewise linear interpolation. Bayesian inference then fuses this information into a posterior probability, from which the word semantic similarity is quantified. The main contributions are (1) definitions of the distance and depth between word pairs, which require little computation yet distinguish well among word senses, and (2) the application of the naïve Bayes model to word semantic similarity measurement. On the benchmark data set R&G(65), the correlation between the algorithm's ratings and human judgments was evaluated with 5-fold cross-validation. The sample Pearson correlation reaches 0.912, which is 0.4% higher than the current best method and 7% to 13% higher than classical methods; the Spearman correlation reaches 0.873, which is 10% to 20% higher than classical methods. The running efficiency of the algorithm is comparable to that of the classical algorithms. These results indicate that combining a naïve Bayes model with a knowledge base is a reasonable and effective way to measure word semantic similarity.
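The abstract names the pipeline steps but gives no formulas. As a rough illustration only, the Python sketch below shows how a naïve Bayes classifier with piecewise-linearly interpolated conditional probabilities could turn two word-pair attributes into a similarity score. Everything in it is an assumption rather than the paper's method. The two attributes stand in for the paper's word-pair distance and depth, whose WordNet definitions are not given here. The class labels are hypothetical discrete similarity levels. Conditional probabilities are Laplace-smoothed frequencies at hand-picked knots, and the final score is the posterior expected similarity level. The function names (`interp`, `train`, `posterior`, `similarity`) and the toy data are invented for illustration.

```python
"""Hypothetical sketch: naive Bayes word-pair similarity with piecewise-linear
conditional probabilities. Illustrative only; not the paper's exact method."""
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]  # assumed attributes, e.g. (distance, depth) of a word pair
Model = Tuple[List[int], Dict[int, float], Dict[int, List[List[float]]]]


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise linear interpolation through knots (xs[i], ys[i]); clamps at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def train(samples: List[Tuple[Features, int]],
          knots: List[List[float]], alpha: float = 1.0) -> Model:
    """Estimate smoothed class priors and, per class and attribute, smoothed
    frequencies of training values snapped to their nearest knot."""
    classes = sorted({y for _, y in samples})
    prior = {c: (sum(y == c for _, y in samples) + alpha)
             / (len(samples) + alpha * len(classes)) for c in classes}
    cpd: Dict[int, List[List[float]]] = {}
    for c in classes:
        rows = [f for f, y in samples if y == c]
        tables = []
        for a, ks in enumerate(knots):
            counts = [0] * len(ks)
            for f in rows:
                nearest = min(range(len(ks)), key=lambda j: abs(f[a] - ks[j]))
                counts[nearest] += 1
            total = len(rows) + alpha * len(ks)  # Laplace smoothing denominator
            tables.append([(n + alpha) / total for n in counts])
        cpd[c] = tables
    return classes, prior, cpd


def posterior(x: Features, model: Model, knots: List[List[float]]) -> Dict[int, float]:
    """Naive Bayes: P(c | x) is proportional to P(c) * prod_a P(x_a | c)."""
    classes, prior, cpd = model
    scores = {}
    for c in classes:
        p = prior[c]
        for a, ks in enumerate(knots):
            p *= interp(x[a], ks, cpd[c][a])  # interpolated likelihood between knots
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def similarity(x: Features, model: Model, knots: List[List[float]]) -> float:
    """Hypothetical score: the posterior expected similarity level."""
    return sum(c * p for c, p in posterior(x, model, knots).items())


# Invented toy data: (distance, depth) -> similarity level in {0, 2, 4}.
KNOTS = [[0.0, 5.0, 10.0, 15.0, 20.0], [0.0, 4.0, 8.0, 12.0, 16.0]]
TOY_SAMPLES = [
    ((0.0, 12.0), 4), ((1.0, 14.0), 4), ((2.0, 10.0), 4),
    ((6.0, 8.0), 2), ((8.0, 6.0), 2), ((7.0, 9.0), 2),
    ((15.0, 2.0), 0), ((18.0, 3.0), 0), ((20.0, 1.0), 0),
]

if __name__ == "__main__":
    model = train(TOY_SAMPLES, KNOTS)
    print(similarity((1.0, 13.0), model, KNOTS))  # close, deep pair: high score
    print(similarity((17.0, 2.0), model, KNOTS))  # distant, shallow pair: low score
```

Interpolating between knot frequencies, rather than reading off a single histogram bin, makes each likelihood vary continuously with the attribute value. That is one plausible motivation for a piecewise linear interpolation step, though the paper's actual knots and attribute definitions may differ.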
Source: Journal of Computer Research and Development (计算机研究与发展), 2015, No. 7, pp. 1499-1509 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
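Funding: National Natural Science Foundation of China (60973040); National Natural Science Foundation of China Youth Fund (60903098, 61300148); Key Science and Technology Research Project of Jilin Province (20130206051GX); Jilin Province Science and Technology Program Youth Fund (20130522112JH).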
Keywords: word semantic similarity; semantic similarity; piecewise linear interpolation; Naïve Bayes model; WordNet