Abstract
This paper analyzes the information retrieval model of the traditional Vector Space Model (VSM) and discusses how the term weighting factor TF-IDF (term frequency-inverse document frequency) is assigned in the term-text matrix. Building on latent semantic indexing with singular value decomposition (LSI/SVD), it uses text similarity to describe the semantic relations between terms and applies truncation to reduce the dimensionality of the original term-text vector space. This addresses two problems of VSM: the lack of semantic constraints between terms and the excessive dimensionality of the vector space. Simulation experiments indicate that the method is more efficient and practical than the traditional VSM.
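The pipeline summarized in the abstract (TF-IDF weighting of a term-text matrix, truncated SVD, then text similarity in the reduced space) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the function names, the IDF variant log(N/df), and the rank k = 2 are all hypothetical choices made for the example, and NumPy is assumed to be available.

```python
import numpy as np

def tfidf(counts):
    """Weight a terms x documents count matrix with TF-IDF.

    Assumption: IDF is log(N / df); the paper may use another variant.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)        # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))     # guard against df == 0
    return counts * idf[:, None]

def lsi_doc_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values; each row is one document.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # avoid division by zero
    Xn = X / norms
    return Xn @ Xn.T

# Hypothetical toy corpus: rows are terms, columns are documents.
# terms: car, automobile, engine, flower, petal
counts = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

A = tfidf(counts)
sim_vsm = cosine_similarity_matrix(A.T)                    # full-dimensional VSM
sim_lsi = cosine_similarity_matrix(lsi_doc_vectors(A, 2))  # rank-2 LSI space

print("VSM similarity d0-d1:", round(float(sim_vsm[0, 1]), 3))
print("LSI similarity d0-d1:", round(float(sim_lsi[0, 1]), 3))
```

In this toy corpus, documents 0 and 1 share only the term "engine", so their similarity in the full vector space is low. After truncation to two dimensions they fall onto the same latent direction, which is the kind of semantic association between terms that the abstract attributes to LSI/SVD.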
Source
《微计算机信息》
2009, No. 30, pp. 10-12 (3 pages)
Control & Automation