Abstract
Existing keyword-expansion methods that rely solely on the Tongyici Cilin (a Chinese synonym thesaurus) or a knowledge dictionary introduce noisy data and incur a heavy computational cost. To address this, the paper proposes an expand-then-reduce method: keywords are first expanded with synonyms from the Tongyici Cilin, the semantic distance between the expanded words is then computed with the HowNet sememe tree, and words with low similarity are removed as noise according to that distance, yielding a reduced keyword set. Experiments show that with a word-similarity threshold of 0.8 the reduction ratio reaches 46.9%; the reduced keyword set effectively eliminates noisy data, balances recall and precision in information retrieval, and shows good overall performance.
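The expand-then-reduce procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the synonym table, the sememe tree, the word-to-sememe mapping, and the function names are all hypothetical toy stand-ins for the Tongyici Cilin and HowNet, and the distance-to-similarity conversion alpha / (d + alpha) with alpha = 1.6 is an assumed form that the abstract does not specify. Only the 0.8 threshold comes from the abstract.

```python
# Hypothetical toy stand-in for the Tongyici Cilin: keyword -> synonym candidates.
SYNONYMS = {
    "computer": ["PC", "calculator", "machine"],
}

# Hypothetical toy stand-in for a HowNet sememe tree: sememe -> parent sememe.
SEMEME_PARENT = {
    "entity": None,
    "thing": "entity",
    "artifact": "thing",
    "tool": "artifact",
    "machine": "artifact",
    "computer": "tool",
}

# Hypothetical mapping from each word to its primary sememe.
WORD_SEMEME = {
    "computer": "computer",
    "PC": "computer",
    "calculator": "tool",
    "machine": "machine",
}


def path_to_root(sememe):
    """Return the list of sememes from the given node up to the root."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = SEMEME_PARENT[sememe]
    return path


def sememe_distance(a, b):
    """Number of edges on the tree path between sememes a and b."""
    steps_from_b = {s: i for i, s in enumerate(path_to_root(b))}
    for steps_from_a, s in enumerate(path_to_root(a)):
        if s in steps_from_b:  # first shared node is the lowest common ancestor
            return steps_from_a + steps_from_b[s]
    raise ValueError("sememes do not share a root")


def similarity(word_a, word_b, alpha=1.6):
    """Assumed conversion from tree distance to similarity in (0, 1]."""
    d = sememe_distance(WORD_SEMEME[word_a], WORD_SEMEME[word_b])
    return alpha / (d + alpha)


def expand_then_reduce(keyword, threshold=0.8):
    """Expand with synonyms first, then drop candidates below the threshold."""
    candidates = SYNONYMS.get(keyword, [])
    kept = [w for w in candidates if similarity(keyword, w) >= threshold]
    return [keyword] + kept


print(expand_then_reduce("computer"))
```

With this toy data, only candidates that share the keyword's sememe clear the 0.8 threshold; a different similarity formula or alpha would change which candidates survive.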
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2011, No. 23, pp. 13-16, 24 (5 pages in total)
Funding
National Natural Science Foundation of China (No. 60970059)
Shanxi Province International Science and Technology Cooperation Program (No. 2009081022)
Keywords
Chinese question-answering system
keyword expansion
sememe tree
keyword set reduction