Abstract
To improve the retrieval performance of information retrieval systems, and to address the inherent shortcomings of the vector space model (VSM) that such systems commonly use, this paper proposes CE-VSM (Classifier Expand-Vector Space Model), a new vector space model based on classification and expansion. The model improves on the traditional vector space method by introducing word segmentation, a naive Bayes classifier, and a domain-specific lexicon. It redefines the content of resource feature vectors and query index terms, and computes the weights of feature index terms and resource vectors from factors such as keyword frequency and the role the keyword plays in the resource it describes. On this basis, query index terms are further expanded using a strategy based on the domain lexicon. Experiments show that the model confines retrieval to a relatively precise scope, improves precision and recall, and improves the overall performance of the information retrieval system.
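The abstract names three ingredients (a naive Bayes classifier, lexicon-based query expansion, and weighted vector matching) without giving formulas. The following is a minimal sketch of how such a pipeline could fit together, not the authors' implementation. It assumes documents are already segmented into tokens, uses plain TF-IDF with cosine similarity as a stand-in for the paper's weighting scheme, and gives expanded terms an arbitrary weight of 0.5. The corpus, the lexicon, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (category, tokens); tokens are assumed pre-segmented.
DOCS = [
    ("db", ["database", "index", "query", "sql"]),
    ("db", ["sql", "transaction", "database"]),
    ("net", ["network", "router", "packet"]),
    ("net", ["packet", "protocol", "network", "tcp"]),
]

# Hypothetical domain lexicon mapping a term to related terms.
LEXICON = {"sql": ["database"], "tcp": ["protocol"]}


def train_nb(docs):
    """Collect per-category document counts, term counts, and the vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for cat, tokens in docs:
        class_docs[cat] += 1
        term_counts[cat].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def classify(tokens, model):
    """Multinomial naive Bayes with Laplace smoothing; returns the best category."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_cat, best_score = None, -math.inf
    for cat in class_docs:
        score = math.log(class_docs[cat] / total_docs)
        denom = sum(term_counts[cat].values()) + len(vocab)
        for t in tokens:
            score += math.log((term_counts[cat][t] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


def expand_query(tokens, lexicon, weight=0.5):
    """Original terms get weight 1.0; lexicon-related terms get a reduced weight."""
    weights = {t: 1.0 for t in tokens}
    for t in tokens:
        for related in lexicon.get(t, []):
            weights.setdefault(related, weight)
    return weights


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))
    return {t: math.log(len(docs) / df[t]) + 1.0 for t in df}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_tokens, docs=DOCS, lexicon=LEXICON):
    """Classify the query, expand it, then rank only documents in that category."""
    category = classify(query_tokens, train_nb(docs))
    idf = idf_table(docs)
    expanded = expand_query(query_tokens, lexicon)
    q_vec = {t: w * idf.get(t, 0.0) for t, w in expanded.items()}
    ranked = []
    for doc_id, (cat, tokens) in enumerate(docs):
        if cat != category:
            continue  # restrict retrieval to the predicted category
        d_vec = {t: c * idf[t] for t, c in Counter(tokens).items()}
        ranked.append((cosine(q_vec, d_vec), doc_id))
    return category, sorted(ranked, reverse=True)
```

Calling `search(["sql"])` on this toy data classifies the query into the `db` category and scores only the two `db` documents, with the expansion term `database` contributing at half weight. Filtering by category before ranking is one plausible reading of retrieval "within a relatively precise scope"; the paper may combine the classifier and the ranking differently.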
Source
《科学技术与工程》
2010, Issue 33, pp. 8164-8167 (4 pages)
Science Technology and Engineering