Abstract
In information retrieval, the widespread polysemy and synonymy found in documents introduce uncertainty, which degrades retrieval effectiveness. This paper handles that uncertainty with rough set theory built on mutual information. First, the mutual information between the words of a training corpus is computed, and fuzzy clustering of these values is used to construct an equivalence relation among the words. Based on this equivalence relation, an information retrieval model grounded in the upper and lower approximations of rough sets is then proposed and implemented. Experiments show that the model improves information retrieval performance.
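The pipeline summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the mutual-information formula, the clustering procedure, nor the ranking function. The sketch assumes document-level pointwise mutual information scaled into [0, 1] as the fuzzy similarity, max-min transitive closure followed by a λ-cut as the fuzzy clustering step (a standard way to obtain a crisp equivalence relation from a fuzzy similarity matrix), and a hypothetical score that adds the overlaps of the query's and document's lower and upper approximations. All function names and the threshold `lam` are hypothetical.

```python
import math
from itertools import combinations


def similarity_matrix(docs):
    """Fuzzy word-similarity matrix from document-level pointwise mutual information.

    Assumption: PMI(a, b) = log(n * df(a, b) / (df(a) * df(b))), where df counts
    documents; non-positive values are dropped and the rest are scaled into (0, 1].
    """
    n = len(docs)
    df, co = {}, {}
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(terms, 2):  # a < b because terms are sorted
            co[(a, b)] = co.get((a, b), 0) + 1
    pmi = {}
    for (a, b), c in co.items():
        value = math.log(n * c / (df[a] * df[b]))
        if value > 0:
            pmi[(a, b)] = value
    top = max(pmi.values(), default=1.0)
    vocab = sorted(df)
    idx = {t: i for i, t in enumerate(vocab)}
    # Reflexive: every word is fully similar to itself.
    m = [[1.0 if i == j else 0.0 for j in range(len(vocab))] for i in range(len(vocab))]
    for (a, b), v in pmi.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = v / top  # keep the matrix symmetric
    return vocab, m


def transitive_closure(m):
    """Max-min transitive closure: repeat R <- R o R until the matrix stops changing."""
    n = len(m)
    while True:
        nxt = [[max(min(m[i][k], m[k][j]) for k in range(n)) for j in range(n)]
               for i in range(n)]
        if nxt == m:
            return m
        m = nxt


def equivalence_classes(vocab, closure, lam):
    """The lambda-cut of a fuzzy equivalence matrix is a crisp partition of the vocabulary."""
    classes, seen = [], set()
    for i, t in enumerate(vocab):
        if t in seen:
            continue
        cls = frozenset(vocab[j] for j in range(len(vocab)) if closure[i][j] >= lam)
        seen |= cls
        classes.append(cls)
    return classes


def approximations(terms, classes):
    """Rough-set lower and upper approximations of a term set w.r.t. the partition.

    Terms outside the training vocabulary belong to no class and are ignored.
    """
    terms = set(terms)
    lower, upper = set(), set()
    for cls in classes:
        if cls <= terms:   # class lies entirely inside the term set
            lower |= cls
        if cls & terms:    # class overlaps the term set
            upper |= cls
    return lower, upper


def score(query, doc, classes):
    """Hypothetical ranking score: lower-approximation overlap plus upper-approximation overlap."""
    ql, qu = approximations(query, classes)
    dl, du = approximations(doc, classes)
    return len(ql & dl) + len(qu & du)


if __name__ == "__main__":
    docs = [["car", "automobile", "engine"], ["car", "automobile", "road"],
            ["fruit", "apple"], ["apple", "fruit", "juice"]]
    vocab, m = similarity_matrix(docs)
    classes = equivalence_classes(vocab, transitive_closure(m), lam=0.5)
    for i, doc in enumerate(docs):
        print(i, score(["road"], doc, classes))
```

In this toy corpus "engine" and "road" never co-occur, yet the transitive closure places them in the same class, so a query for "road" gives the first document (which lacks "road") the same score as the second and scores both fruit documents at zero. This is the kind of synonymy effect the upper approximation is meant to capture.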
Source
Journal of Shandong University (Natural Science) 《山东大学学报(理学版)》
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2006, No. 3, pp. 17-19, 138 (4 pages)
Keywords
mutual information
fuzzy clustering
rough sets
information retrieval