Abstract
In information retrieval, the widespread polysemy and synonymy in documents introduce uncertainty, which degrades retrieval performance. To address this, we use information entropy and rough set theory to handle such uncertainty. We first compute the information entropy (mutual information) between words in a training corpus, then apply fuzzy clustering to these values to build an equivalence relation over the words. Based on this equivalence relation, we propose and implement an information retrieval model built on the upper and lower approximations of rough sets. Experiments show that the model improves information retrieval performance.
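The pipeline in the abstract (word-pair information measure, fuzzy clustering into an equivalence relation, rough-set approximations of the query) can be illustrated with a small sketch. It is not the authors' model: the abstract does not give their formulas, so this assumes pointwise mutual information estimated from document-level co-occurrence, a fuzzy similarity matrix obtained by scaling positive PMI to [0, 1], clustering by the standard max-min transitive closure followed by a λ-cut, and a hypothetical scoring rule (weight `alpha`) that counts lower-approximation matches fully and boundary matches partially. All function names and parameters are illustrative.

```python
import math
from itertools import combinations


def term_mutual_information(docs):
    """PMI between term pairs from document co-occurrence (assumed estimator)."""
    n = len(docs)
    sets = [set(d) for d in docs]
    vocab = sorted(set().union(*sets))
    df = {t: sum(t in s for s in sets) for t in vocab}
    mi = {}
    for a, b in combinations(vocab, 2):
        joint = sum(a in s and b in s for s in sets)
        if joint:  # pairs that never co-occur get no entry (treated as 0 later)
            mi[(a, b)] = math.log(joint * n / (df[a] * df[b]))
    return vocab, mi


def fuzzy_similarity(vocab, mi):
    """Reflexive, symmetric fuzzy relation: positive PMI scaled to [0, 1]."""
    top = max((v for v in mi.values() if v > 0), default=1.0)
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), v in mi.items():
        s = max(v, 0.0) / top
        r[idx[a]][idx[b]] = r[idx[b]][idx[a]] = s
    return r


def transitive_closure(r):
    """Max-min transitive closure by repeated composition R <- R o R."""
    n = len(r)
    while True:
        r2 = [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
              for i in range(n)]
        if r2 == r:
            return r
        r = r2


def equivalence_classes(vocab, closure, lam):
    """A lambda-cut of the transitive closure is a crisp equivalence relation."""
    classes, seen = [], set()
    for i, t in enumerate(vocab):
        if t in seen:
            continue
        cls = frozenset(vocab[j] for j in range(len(vocab)) if closure[i][j] >= lam)
        seen |= cls
        classes.append(cls)
    return classes


def approximations(classes, terms):
    """Rough-set lower and upper approximations of a set of query terms."""
    terms = set(terms)
    lower = set().union(*[c for c in classes if c <= terms])
    upper = set().union(*[c for c in classes if c & terms])
    return lower, upper


def rank(docs, classes, query, alpha=0.5):
    """Hypothetical scoring: full credit in the lower approximation,
    weight alpha in the boundary region (upper minus lower)."""
    lower, upper = approximations(classes, query)
    scores = []
    for i, d in enumerate(docs):
        s = set(d)
        scores.append((len(s & lower) + alpha * len(s & (upper - lower)), i))
    return [i for sc, i in sorted(scores, key=lambda x: (-x[0], x[1])) if sc > 0]
```

For example, with `docs = [["car", "auto", "engine"], ["car", "auto", "road"], ["bank", "money", "loan"], ["bank", "river", "water"]]` and λ = 0.5, the query `["car"]` has an empty lower approximation, but its upper approximation `{"auto", "car", "engine", "road"}` pulls in co-occurring terms, so documents 0 and 1 are returned. Raising λ yields finer classes and a narrower expansion; the choice of λ plays the role of a granularity parameter.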
Source
Fuzzy Systems and Mathematics (《模糊系统与数学》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 3, pp. 149-153 (5 pages)
Funding
Jiangxi Provincial Science and Technology Support Program (200720015)
Keywords
Information Entropy
Fuzzy Clustering
Rough Sets
Information Retrieval