Abstract
Based on a statistical analysis of a text collection, this paper introduces the concept of word relation. The system counts how often each word occurs in the collection (the number of documents containing it) and how often each pair of words co-occurs (the number of documents containing both). From these counts it derives a relation value for every pair of words. This value is used to adjust semantic vectors, which addresses the word-dependency problem of the traditional vector space model. Combining word relations with an inverted index, the authors built a text retrieval system of substantial size, and several of its design issues are discussed in detail. Test results show good retrieval precision and good performance, indicating that the system is of practical use.
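The abstract does not give the paper's actual formulas. The following is only a minimal sketch of the general idea under stated assumptions. Every function name is hypothetical, and so is the toy corpus. The relation score is assumed to be a Jaccard-style ratio: documents containing both words divided by documents containing either word. The vector adjustment is also an assumption: each term t receives the weight w'(t) = Σ_s w(s)·r(s, t) over the query terms s. The sketch shows how co-occurrence counts let a query reach documents that share no literal term with it, with candidate documents found through an inverted index.

```python
from collections import Counter
from itertools import combinations
import math


def build_stats(docs):
    """Count, per word, how many documents contain it (df), and, per unordered
    word pair, how many documents contain both words (co)."""
    df, co = Counter(), Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))
        df.update(vocab)
        co.update(combinations(vocab, 2))  # pairs come out as (smaller, larger)
    return df, co


def relation(a, b, df, co):
    """HYPOTHETICAL relation score in [0, 1]: documents containing both words
    divided by documents containing either word (the paper's formula may differ)."""
    if a == b:
        return 1.0
    n_ab = co[(a, b) if a < b else (b, a)]  # Counter returns 0 for missing pairs
    n_either = df[a] + df[b] - n_ab
    return n_ab / n_either if n_either else 0.0


def adjust(vector, df, co):
    """HYPOTHETICAL adjustment: every vocabulary word t gets
    sum over query terms s of weight(s) * relation(s, t)."""
    adjusted = {}
    for t in df:
        w = sum(ws * relation(s, t, df, co) for s, ws in vector.items())
        if w > 0:
            adjusted[t] = w
    return adjusted


def build_index(docs):
    """Inverted index: term -> list of (doc_id, term frequency)."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            index.setdefault(term, []).append((doc_id, tf))
    return index


def search(query_tokens, docs, top_k=5):
    """Rank documents by the dot product of the adjusted query vector with
    each document's term-frequency vector, divided by the document's length."""
    df, co = build_stats(docs)  # rebuilt per call only to keep the sketch short
    index = build_index(docs)
    norms = [math.sqrt(sum(tf * tf for tf in Counter(d).values())) for d in docs]
    query = adjust(Counter(query_tokens), df, co)
    scores = Counter()
    # Only documents reachable through the inverted index are ever scored.
    for term, qw in query.items():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += qw * tf
    ranked = [(d, s / norms[d]) for d, s in scores.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Toy corpus, invented for illustration.
docs = [
    ["vector", "space", "model", "retrieval"],
    ["inverted", "index", "retrieval"],
    ["word", "relation", "vector", "model"],
    ["cooking", "recipe"],
]
print([d for d, _ in search(["vector"], docs)])  # [2, 0, 1]
```

Document 1 never contains "vector". It is still retrieved because "retrieval" co-occurs with "vector" in document 0. A plain vector space model would score it zero. A real system would build the statistics and index once rather than per query, and would likely prune weak relations to keep the adjusted vectors sparse.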
Source
《微型电脑应用》 (Microcomputer Applications), 2011, No. 3, pp. 62-64, 6 (4 pages)