New retrieval algorithm for English texts (一种新的英文文本检索算法)

Cited by: 1
Abstract: A new retrieval algorithm for English texts is proposed. Each English text is first mapped to a frequency matrix of order 26, and singular value decomposition is applied to reduce the dimensionality of the text representation space. The features of the first and second singular value components are then fused into a complex feature vector that reflects both the statistical frequency of letters and the sequential structure of the characters in the text. Finally, the cosine similarity between vectors is used as the similarity measure between the query and the documents. Comparative data indicate that the algorithm achieves good experimental results and outperforms the classic LSA retrieval algorithm in both retrieval precision and computational efficiency.
Author: Gao Shilong (高仕龙)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2010, No. 5, pp. 21-23, 58 (4 pages)
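Funding: National Natural Science Foundation of China (No. 10571127); Scientific Research Project of the Sichuan Provincial Department of Education (No. 09ZB026)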
Keywords: text retrieval; feature fusion; frequency matrix; singular value decomposition (SVD)
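
The pipeline outlined in the abstract can be sketched in Python with NumPy. The abstract does not give the construction details, so several choices below are assumptions rather than the author's method: the order-26 matrix is taken to count adjacent letter pairs, the two leading singular components are fused as the real and imaginary parts of one vector, and each singular vector's sign is fixed by a simple heuristic. The function names (`frequency_matrix`, `complex_feature`, `cosine_similarity`, `rank`) are hypothetical.

```python
import numpy as np


def frequency_matrix(text: str) -> np.ndarray:
    """Return a 26x26 matrix of adjacent-letter frequencies.

    Assumption: entry (i, j) is the relative frequency with which letter j
    directly follows letter i. Non-letters are dropped first, so letters on
    either side of a space or punctuation mark count as adjacent.
    """
    letters = [ord(c) - ord("a") for c in text.lower() if "a" <= c <= "z"]
    m = np.zeros((26, 26))
    for i, j in zip(letters, letters[1:]):
        m[i, j] += 1.0
    total = m.sum()
    return m / total if total > 0 else m


def _fix_sign(v: np.ndarray) -> np.ndarray:
    """Resolve the sign ambiguity of a singular vector (heuristic):
    make its largest-magnitude component non-negative."""
    k = int(np.argmax(np.abs(v)))
    return v if v[k] >= 0 else -v


def complex_feature(text: str) -> np.ndarray:
    """Fuse the first two singular components into one complex vector.

    Assumption: real part = first left singular vector scaled by the first
    singular value, imaginary part = the same for the second component.
    """
    u, s, _ = np.linalg.svd(frequency_matrix(text))
    first = s[0] * _fix_sign(u[:, 0])
    second = s[1] * _fix_sign(u[:, 1])
    return first + 1j * second


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Magnitude of the normalised Hermitian inner product, in [0, 1]."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(abs(np.vdot(a, b)) / denom)


def rank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    q = complex_feature(query)
    scores = [cosine_similarity(q, complex_feature(d)) for d in documents]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]
```

Calling `rank("some query", ["first document", "second document"])` returns the document indices ordered by decreasing similarity to the query. Each text is reduced to a single 26-dimensional complex vector, so comparing a query with a document costs one inner product regardless of vocabulary size, which is one plausible reading of the efficiency advantage over LSA that the abstract reports.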
References (10)

  • 1 Ding Guodong, Bai Shuo, Wang Bin. A survey of statistical language modeling for text retrieval [J]. Journal of Computer Research and Development (计算机研究与发展), 2006, 43(5): 769-776. (Cited by: 19)
  • 2 Sebastiani F. Machine learning in automated text categorization [J]. ACM Computing Surveys, 2002, 34(1): 1-47.
  • 3 Fan Xinghua, Sun Maosong. A high-performance two-class Chinese text categorization method [J]. Chinese Journal of Computers (计算机学报), 2006, 29(1): 124-131. (Cited by: 70)
  • 4 Deerwester S, Dumais S T, Furnas G W, et al. Indexing by latent semantic analysis [J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • 5 Salton G, Wong A, Yang C S. A vector space model for automatic indexing [J]. Communications of the ACM, 1975, 18(11): 613-620.
  • 6 Greiff W R. A theory of term weighting based on exploratory data analysis [C] // Proceedings of SIGIR-98, Melbourne, Australia, 1998.
  • 7 Jones K S. A statistical interpretation of term specificity and its application in retrieval [J]. Journal of Documentation, 1972, 28: 11-21.
  • 8 Kalt T. A new probabilistic model of text classification and retrieval, IR-78 [R]. University of Massachusetts Center for Intelligent Information Retrieval, 1996.
  • 9 Lewis D D. Naive (Bayes) at forty: The independence assumption in information retrieval [C] // ECML, 1998: 4-15.
  • 10 Landauer T K. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge [J]. Psychological Review, 1997, 104: 211-240.
