Document Retrieval Based on a Pre-Clustering Latent Semantic Analysis Model (cited by: 1)

A new pre-clustering-based latent semantic analysis algorithm for document retrieval

Abstract: This paper proposes a pre-clustering-based latent semantic analysis (LSA) algorithm for document retrieval. First, the document collection to be searched is pre-clustered: k-means clustering is applied on top of latent semantic analysis to find the center point of each cluster. Then, at retrieval time, the query is matched by computing the similarity between the query vector and each cluster's center point. This addresses a shortcoming of existing LSA retrieval algorithms, which spend considerable time computing the similarity between the query vector and every document vector. In addition, a new feature-weight calculation method is given that reflects the characteristics of document retrieval. Experimental results show that the method shortens retrieval time and improves retrieval efficiency.
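The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the documents have already been projected into the latent semantic space (for example by a truncated SVD), uses made-up two-dimensional vectors, seeds k-means with the first k documents, and ranks the members of the best-matching cluster by cosine similarity. The function names `kmeans` and `search`, the seeding rule, and the within-cluster ranking are all assumptions; the paper's own feature-weighting scheme is not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(v, centers):
    """Index of the center closest to v (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; seeding with the first k vectors is an assumption."""
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(v, centers) for v in vectors]
    return centers, labels


def search(query, vectors, centers, labels, top_n=3):
    """Compare the query with the k centers only, then rank the winning cluster."""
    best = max(range(len(centers)), key=lambda c: cosine(query, centers[c]))
    members = [i for i, lab in enumerate(labels) if lab == best]
    members.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return best, members[:top_n]


# Hypothetical document vectors, assumed to be already in the LSA space.
doc_vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2],
               [0.2, 0.8], [0.95, 0.05], [0.05, 0.95]]

# Offline step: pre-cluster the collection once.
centers, labels = kmeans(doc_vectors, k=2)

# Online step: the query is compared with 2 centers instead of 6 documents.
cluster, ranked = search([1.0, 0.0], doc_vectors, centers, labels)
print(cluster, ranked)  # cluster 0, documents [4, 0, 2]
```

The saving comes from the online step: with k clusters, a query needs k centroid comparisons plus comparisons within one cluster, rather than one comparison per document in the whole collection. The trade-off is that relevant documents assigned to other clusters are not examined.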

Authors: He Xiaoping (和晓萍), Li Di (李迪), Wang Mili (王米利), Ma Xuesong (马学松), Zhou Weihong (周卫红)

Affiliation: School of Mathematics and Computer Science, Yunnan Minzu University

Source: Journal of Yunnan Minzu University: Natural Sciences Edition (《云南民族大学学报（自然科学版）》), CAS, 2015, No. 3, pp. 257-260 (4 pages)

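Funding: State Ethnic Affairs Commission Research Project (12YNZ008); Scientific Research Fund of the Yunnan Provincial Department of Education (2012Y315); Yunnan Minzu University Youth Fund (11QN08)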

Keywords: latent semantic analysis; document retrieval; singular value decomposition; k-means

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
