
Liheci Word Sense Disambiguation Based on SVM (cited by: 4)
Abstract: Liheci (separable word) word sense disambiguation aims to enable a computer to determine the sense of an ambiguous word within a Liheci in a specific context. Ambiguous words in Liheci cause inaccurate translations in machine translation and failures to match relevant information in information retrieval. To address these problems, word sense disambiguation is applied to the ambiguous words in Liheci, and a classifier is built with an SVM model. To improve disambiguation accuracy, feature extraction takes the characteristics of Liheci into account: in addition to three types of features from the context of the ambiguous word (local words, local parts of speech, and local words combined with parts of speech), features of the inserted middle part of ambiguous words in their separated ("离") form are also extracted. When text features are converted into feature vectors, the Boolean weighting method is modified: the weight of one feature type is fixed in turn while the weights of the other two types are varied, and the resulting disambiguation accuracy is used to verify the effect of each of the three feature types. The experimental results show that local word features and local word plus part-of-speech features affect disambiguation more than local part-of-speech features, and that using different weights for different feature types raises disambiguation accuracy by 1.03% to 5.69% compared with using the same weight for all types.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 239-244 (6 pages)
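Funding: Supported by the National Natural Science Foundation of China (61375075), the Natural Science Foundation of Hebei Province (F2013201134, F2012201020), and the Baoding Science and Technology Research and Development Guidance Program (15ZR063).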
Keywords: Liheci; SVM; word sense disambiguation; classifier
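
The feature scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the feature string format, the example sentence, the weight values, and the decision to give the inserted-part features their own weight are all assumptions made for illustration. The sketch extracts the three context feature types plus the inserted middle part of a separated Liheci, then builds a Boolean-style vector in which a present feature takes the weight of its type instead of 1.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]      # (word, part-of-speech tag)
Feature = Tuple[str, str]    # (feature type, feature string)


def extract_features(tokens: List[Token], start: int, end: int,
                     window: int = 2) -> List[Feature]:
    """Extract typed features for a Liheci spanning tokens[start..end].

    `window` (an assumed parameter) is the number of context tokens
    taken on each side of the Liheci.
    """
    feats: List[Feature] = []
    lo = max(0, start - window)
    hi = min(len(tokens), end + 1 + window)
    for i in range(lo, hi):
        if start <= i <= end:
            continue  # skip the Liheci itself and its inserted part
        offset = i - start if i < start else i - end
        word, pos = tokens[i]
        feats.append(("word", f"{offset:+d}:{word}"))
        feats.append(("pos", f"{offset:+d}:{pos}"))
        feats.append(("word_pos", f"{offset:+d}:{word}/{pos}"))
    # Tokens strictly between the two halves of a separated Liheci.
    for word, pos in tokens[start + 1:end]:
        feats.append(("insert", f"{word}/{pos}"))
    return feats


def build_vocab(samples: List[List[Feature]]) -> Dict[Feature, int]:
    """Map every distinct feature in the samples to a vector index."""
    unique = sorted({f for sample in samples for f in sample})
    return {f: i for i, f in enumerate(unique)}


def vectorize(features: List[Feature], vocab: Dict[Feature, int],
              type_weights: Dict[str, float]) -> List[float]:
    """Boolean-style vector: a present feature gets its type's weight."""
    vec = [0.0] * len(vocab)
    for ftype, name in features:
        idx = vocab.get((ftype, name))
        if idx is not None:
            vec[idx] = type_weights[ftype]
    return vec


# Hypothetical tagged sentence with the separated form 洗...澡.
tokens = [("他", "r"), ("洗", "v"), ("了", "u"), ("个", "q"),
          ("澡", "n"), ("就", "d"), ("睡", "v")]
features = extract_features(tokens, start=1, end=4)
vocab = build_vocab([features])

# Assumed weights; the abstract does not state the values used.
type_weights = {"word": 2.0, "pos": 1.0, "word_pos": 2.0, "insert": 1.0}
vector = vectorize(features, vocab, type_weights)
```

Vectors built this way, one per labeled occurrence of the ambiguous word, could then be passed to any SVM implementation for training. Varying `type_weights` while holding one entry fixed mirrors the kind of comparison the abstract describes.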
