Research on Weak-Linked Document in Search Engine
Abstract Clustering partitions a large collection of documents into a small number of clusters according to document similarity, so that users can grasp the content of the collection quickly. It has therefore become an indispensable component of search engines and an active research topic in both industry and academia. Weak-linked documents on the Web, such as AJAX applications and PowerPoint files, lack sufficient hyperlink information, so current search engines rank them poorly and return either irrelevant or badly ordered results when users search for such documents. To address this problem, the paper presents a search engine framework for weak-linked documents and describes in detail a ranking algorithm for weak-linked documents that is based on web page search results. The clustering-based algorithm applies a clustering algorithm to high-quality web page search results to extract topics relevant to the query, determines the importance of each topic from the ranks of its related web pages, and computes the ranking value of each weak-linked document from the identified weighted topics. Experiments show that the approach produces good rankings for weak-linked documents and considerably improves result quality compared with current search engines and with latent semantic indexing.
Authors Chen Zhe (陈哲), Wei Yanjun (魏衍君)
Source Modern Computer (《现代计算机》), 2013, No. 19, pp. 3-7 (5 pages)
Keywords Search Engine; Clustering Technology; Weak-Linked Document
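
The ranking idea summarized in the abstract (extract topics from ranked web results, weight each topic by the ranks of its pages, then score weak-linked documents against the weighted topics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the weighting formula, so the greedy threshold clustering, the 1/rank topic weight, the cosine scoring, and all function names (`term_vector`, `extract_topics`, `rank_weak_linked`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def extract_topics(ranked_pages, threshold=0.3):
    """Greedily cluster ranked web results into weighted topics.

    ranked_pages: page texts ordered best-first.
    Assumption: a page joins the most similar existing cluster when the
    cosine similarity reaches `threshold`; otherwise it starts a new one.
    Assumption: a topic's weight is the sum of 1/rank over its pages,
    normalized so that all weights sum to 1.
    """
    clusters = []
    for rank, text in enumerate(ranked_pages, start=1):
        vec = term_vector(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["centroid"].update(vec)   # accumulate term counts
            best["weight"] += 1.0 / rank
        else:
            clusters.append({"centroid": Counter(vec), "weight": 1.0 / rank})
    total = sum(c["weight"] for c in clusters)
    return [(c["centroid"], c["weight"] / total) for c in clusters]


def rank_weak_linked(docs, topics):
    """Score each weak-linked document against the weighted topics.

    docs: mapping of document name to extracted text.
    Returns (name, score) pairs sorted by descending score.
    """
    scored = []
    for name, text in docs.items():
        vec = term_vector(text)
        score = sum(w * cosine(vec, centroid) for centroid, w in topics)
        scored.append((name, score))
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical data: top web results for a query, then candidate slides.
    pages = [
        "python clustering tutorial for search engines",
        "clustering search results with python",
        "cooking pasta recipes",
    ]
    slides = {
        "slides_a.ppt": "python clustering of search results",
        "slides_b.ppt": "pasta recipes and cooking",
    }
    for name, score in rank_weak_linked(slides, extract_topics(pages)):
        print(name, round(score, 3))
```

Because topic weights come from the ranks of ordinary web pages, a weak-linked document can be ordered without any hyperlink information of its own; only its text is compared with the topics.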