Data Sparseness Analysis and Its Avoidance Strategies in VSM Information Retrieval (cited by: 3)
Abstract: Taking matrix theory as its starting point, this paper recasts the vectors and sets commonly used in the classical vector space model (VSM) in matrix form, and argues that similarity calculation based on the vector inner product is equivalent to multiplication of the corresponding matrices. Drawing on the definitions of sparse matrix and data sparseness, it analyzes the causes of data sparseness in VSM information retrieval. It then discusses the consequence that data sparseness has for similarity calculation in all three cases considered: part of the time complexity is spent on meaningless computation. Finally, it proposes three levels of strategy for avoiding the data sparseness problem: text-level, text-set-level, and matrix-level strategies.
Author: Liang Shijin (梁士金)
Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2013, No. 1, pp. 142-146 (5 pages)
Keywords: vector space model; information retrieval; data sparseness; avoidance strategy
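
The abstract's two central claims, that inner-product similarity amounts to a matrix product and that sparseness wastes part of the computation, can be illustrated with a small sketch. This is not the paper's own method. The toy document-term matrix, the query vector, and the function names `dense_scores`, `to_sparse`, and `sparse_scores` are all assumptions made for illustration. The dense version multiplies every pair of weights, including the zeros. The sparse version stores only non-zero weights and multiplies only where a document and the query share a term.

```python
# Hypothetical toy data: each row is a document, each column a term weight.
docs = [
    [2, 0, 0, 1, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 1, 0, 0, 0, 4],
]
query = [1, 0, 0, 2, 0, 0]


def dense_scores(matrix, q):
    """Score every document by its inner product with the query.

    Taken over all rows, this is the matrix-vector product matrix * q.
    Also counts how many scalar multiplications were performed.
    """
    scores, multiplications = [], 0
    for row in matrix:
        total = 0
        for weight, query_weight in zip(row, q):
            total += weight * query_weight  # often 0 * something
            multiplications += 1
        scores.append(total)
    return scores, multiplications


def to_sparse(vector):
    """Keep only the non-zero entries as {term_index: weight}."""
    return {i: w for i, w in enumerate(vector) if w != 0}


def sparse_scores(matrix, q):
    """Same scores, but multiply only where both weights are non-zero."""
    sparse_q = to_sparse(q)
    scores, multiplications = [], 0
    for row in matrix:
        total = 0
        for i, weight in to_sparse(row).items():
            if i in sparse_q:
                total += weight * sparse_q[i]
                multiplications += 1
        scores.append(total)
    return scores, multiplications


dense_result, dense_ops = dense_scores(docs, query)
sparse_result, sparse_ops = sparse_scores(docs, query)
print(dense_result, dense_ops)
print(sparse_result, sparse_ops)
```

Both functions return the same scores. The gap between the two multiplication counts is the kind of wasted work the abstract attributes to data sparseness, shown here only on made-up data.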
