Research on Document Retrieval with a Pre-clustering-Based Latent Semantic Analysis Model (cited by: 1)

A new pre-clustering-based latent semantic analysis algorithm for document retrieval
Abstract: This paper proposes a pre-clustering-based latent semantic analysis algorithm for document retrieval. First, the document collection to be searched is pre-clustered: k-means clustering is applied on top of latent semantic analysis to find the center of each cluster. Second, at retrieval time, results are obtained by computing the similarity between the query vector and each cluster center. This addresses a shortcoming of existing latent semantic retrieval algorithms, which spend a large amount of time computing the similarity between the query vector and every text vector. In addition, a new feature-weighting method is given that reflects the characteristics of document retrieval. Experimental results show that the method shortens retrieval time and improves retrieval efficiency.
Source: Journal of Yunnan Minzu University: Natural Sciences Edition (《云南民族大学学报(自然科学版)》), CAS, 2015, No. 3, pp. 257-260 (4 pages)
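Funding: Research Project of the State Ethnic Affairs Commission (12YNZ008); Scientific Research Fund of the Yunnan Provincial Department of Education (2012Y315); Yunnan Minzu University Youth Fund (11QN08)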
Keywords: latent semantic analysis; document retrieval; singular value decomposition; k-means
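
The two stages described in the abstract (pre-clustering in the latent space, then comparing the query with cluster centers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy term-document matrix, the function names (`lsa_fit`, `kmeans`, `search`), the farthest-point initialization, the `top_clusters` parameter, and the step that ranks documents inside the selected cluster are all assumptions made for the example. The paper's own feature-weighting method is not reproduced, since the abstract does not specify it; raw term counts are used instead.

```python
import numpy as np

def normalize(X):
    """Scale vectors (rows, or a single 1-D vector) to unit length."""
    norms = np.linalg.norm(X, axis=-1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def lsa_fit(term_doc, rank):
    """Truncated SVD of a term-by-document matrix.
    Returns U_k (terms x rank) and the documents projected by U_k^T."""
    U, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :rank]
    docs = term_doc.T @ U_k  # equals V_k * S_k, one row per document
    return U_k, docs

def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean) for each row of X."""
    return ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization
    (an assumption of this sketch; any standard initialization would do)."""
    centers = [X[0]]
    for _ in range(1, k):
        gap = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[gap.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = nearest_center(X, centers)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, nearest_center(X, centers)

def search(query_vec, centers, labels, docs, top_clusters=1):
    """Compare the query with cluster centers first, then rank only the
    documents that belong to the best-matching cluster(s)."""
    q = normalize(query_vec)
    best = np.argsort(-(normalize(centers) @ q))[:top_clusters]
    candidates = np.flatnonzero(np.isin(labels, best))
    scores = normalize(docs[candidates]) @ q
    order = np.argsort(-scores)
    return candidates[order], scores[order]

# Toy corpus (hypothetical): rows are terms, columns are documents.
# Documents 0-2 share one vocabulary, documents 3-5 another.
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

U_k, docs = lsa_fit(term_doc, rank=2)
unit_docs = normalize(docs)
centers, labels = kmeans(unit_docs, k=2)   # offline pre-clustering

query = np.array([1, 1, 0, 0, 0, 0], dtype=float)
ids, scores = search(query @ U_k, centers, labels, unit_docs)
print(ids, scores)
```

With k clusters over n documents, the query is scored against k centers plus the members of the selected cluster rather than all n documents, which is the source of the time saving the abstract reports. Raising `top_clusters` trades some of that saving for a lower risk of missing relevant documents that fell into a neighboring cluster.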