Abstract
Taking matrix theory as its starting point, this paper recasts the vectors and sets commonly used in the classical vector space model (VSM) as matrices, and argues that similarity calculation based on the vector inner product is equivalent to multiplication of the corresponding matrices. Drawing on the definitions of sparse matrix and data sparseness, it analyzes the causes of data sparseness in VSM-based information retrieval. It then discusses the effect that data sparseness has on similarity calculation in all three cases considered: part of the time complexity is spent on meaningless computation. Finally, it proposes a three-level strategy for avoiding the data sparseness problem: a text-level strategy, a text-set-level strategy, and a matrix-level strategy.
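The equivalence claimed in the abstract, and the cost of sparseness, can be illustrated with a small sketch. This is not the paper's own formulation: the term weights, the names `docs` and `queries`, and the helper functions are all hypothetical, and "wasted" is taken here to mean a scalar multiplication in which at least one operand is zero.

```python
# Toy term-weight data; every value is made up for illustration only.
# Rows of `docs` are document vectors, rows of `queries` are query vectors,
# and both share the same 5-term vocabulary.
docs = [
    [2, 0, 0, 1, 0],
    [0, 0, 3, 0, 0],
    [0, 1, 0, 0, 4],
]
queries = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 2],
]


def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Naive dense product of a (m x n) and b (n x p)."""
    cols = transpose(b)
    return [[inner_product(row, col) for col in cols] for row in a]


def wasted_multiplications(a, b):
    """Count scalar multiplications in matmul(a, b) with a zero operand."""
    cols = transpose(b)
    return sum(
        1
        for row in a
        for col in cols
        for x, y in zip(row, col)
        if x == 0 or y == 0
    )


# Similarity computed pair by pair with the inner product ...
pairwise = [[inner_product(d, q) for q in queries] for d in docs]
# ... and the same similarities computed as one matrix product D * Q^T.
product = matmul(docs, transpose(queries))

total = len(docs) * len(queries) * len(docs[0])
wasted = wasted_multiplications(docs, transpose(queries))

print(pairwise == product)  # True
print(product)              # [[3, 0], [0, 0], [0, 8]]
print(wasted, "of", total)  # 27 of 30
```

In this toy case only 3 of the 30 scalar multiplications involve two nonzero operands, which gives a concrete picture of the meaningless share of the time complexity described above.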
Source
《图书情报工作》
CSSCI
PKU Core Journal (北大核心)
2013, Issue 1, pp. 142-146 (5 pages)
Library and Information Service
Keywords
vector space model
information retrieval
data sparseness
avoidance strategy