
Research and Implementation of Subject-Oriented Vertical Search Engine System (cited by: 10)

Abstract: General-purpose search engines return very large volumes of information, answer queries imprecisely, and lack depth within specific domains. To address these problems, this paper presents the architecture of a subject-oriented vertical search engine and designs its crawling strategy. The core information-collection module crawls topical web pages using multithreading together with a topic-relevance algorithm based on the vector space model (VSM), and the system's retrieval functions are built on the indexing and search facilities of Lucene.Net. The result is a working vertical search engine for a specific topic. Experimental results show that the system extracts pages efficiently and that its retrieval precision and recall are substantially higher than those of general-purpose search engines, giving it practical value and commercial application prospects.
Source: Microelectronics & Computer (《微电子学与计算机》; indexed in CSCD and the Peking University Core Journals list), 2011, No. 7, pp. 1-4, 8 (5 pages).
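Funding: National Natural Science Foundation of China (61003001, 71071098); Natural Science Foundation of Jiangsu Province (BK2010280); Nantong Science and Technology Program (K2008018, K2008031).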
Keywords: vertical search; web crawler; Lucene.Net; regular expression
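The abstract says topical pages are selected with a VSM-based topic-relevance algorithm but does not give the formula. A common form of this check represents the page and the topic as term-weight vectors and compares them by cosine similarity. The sketch below assumes plain term-frequency weights (the paper may use TF-IDF or another scheme), English-style tokenization (Chinese pages would need a word segmenter first), and a hypothetical threshold of 0.2; the function names are illustrative only.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens.
    Assumption: English tokenization; Chinese text would need segmentation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_on_topic(page_text, topic_vector, threshold=0.2):
    """Hypothetical relevance test: keep the page if its similarity to the topic
    vector reaches the threshold (0.2 is an illustrative value, not the paper's)."""
    return cosine(term_vector(page_text), topic_vector) >= threshold

if __name__ == "__main__":
    topic = term_vector("search engine crawler index retrieval")
    print(is_on_topic("a vertical search engine uses a crawler to build an index", topic))
    print(is_on_topic("recipes for chocolate cake", topic))
```

In a focused crawler the topic vector would be built once from seed documents or keywords describing the subject, and every fetched page would be scored against it before its links are followed.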
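The abstract also states that the collection module runs on multiple threads and follows a designed crawling strategy, without spelling that strategy out. One simple focused-crawling policy is to follow links only from pages judged on-topic. The sketch below assumes that policy, uses a thread-safe `queue.Queue` as the URL frontier, and replaces real HTTP fetching with a hypothetical in-memory `FAKE_WEB` so it runs offline; `crawl`, `fetch`, and `is_relevant` are illustrative names, and the keyword test in the usage stands in for a VSM check.

```python
import queue
import threading

def crawl(seed_urls, fetch, is_relevant, num_threads=4, max_pages=100):
    """Multithreaded focused crawl (assumed policy: only links found on
    relevant pages are followed). fetch(url) returns (text, links) or None."""
    frontier = queue.Queue()
    visited = set(seed_urls)          # URLs ever enqueued, to avoid repeats
    relevant = []
    lock = threading.Lock()           # guards visited and relevant
    for url in visited:
        frontier.put(url)

    def worker():
        while True:
            url = frontier.get()
            if url is None:           # sentinel: stop this worker
                frontier.task_done()
                return
            try:
                page = fetch(url)
                if page is not None:
                    text, links = page
                    if is_relevant(text):
                        with lock:
                            relevant.append(url)
                            for link in links:
                                if link not in visited and len(visited) < max_pages:
                                    visited.add(link)
                                    frontier.put(link)   # enqueued before task_done
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()
    frontier.join()                   # returns once every enqueued URL is processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return sorted(relevant)

# Hypothetical offline "web": url -> (page text, outgoing links)
FAKE_WEB = {
    "a": ("vertical search engine crawler", ["b", "c"]),
    "b": ("search engine index retrieval", ["d"]),
    "c": ("chocolate cake recipe", ["e"]),
    "d": ("lucene index search", []),
    "e": ("search engine ranking", []),
}

if __name__ == "__main__":
    print(crawl(["a"], FAKE_WEB.get, lambda text: "search" in text))
```

Page `e` is on-topic but never reached, because its only inbound link sits on the off-topic page `c`; this is the usual trade-off of pruning at irrelevant pages. Because child URLs are enqueued before the parent's `task_done()`, `frontier.join()` cannot return while work is still pending.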
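Retrieval in the paper is built on Lucene.Net, a C# library whose index is an inverted index from terms to documents. The sketch below does not reproduce the Lucene.Net API; it is a toy inverted index with boolean AND queries and no relevance ranking, meant only to show the data structure such libraries maintain. `InvertedIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercased alphanumeric tokens (assumption: English-style tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    """Toy stand-in for a Lucene-style index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        """Store the document and add its id to the posting set of each term."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries in the defaultdict for unknown terms
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)

if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "vertical search engine")
    idx.add(2, "general search engine")
    idx.add(3, "web crawler")
    print(idx.search("search engine"))
```

A production engine would add ranking (for example TF-IDF scoring), phrase and field queries, and on-disk segment storage, all of which Lucene.Net provides.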
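The evaluation compares precision and recall against general-purpose engines. With retrieved set R and relevant set G, precision is |R ∩ G| / |R| and recall is |R ∩ G| / |G|. A minimal helper, with made-up document ids rather than the paper's data:

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant|.
    Empty sets yield 0.0 to avoid division by zero."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical judgment: 4 results returned, 3 relevant documents exist.
    print(precision_recall([1, 2, 3, 4], [2, 3, 5]))
```

Here 2 of the 4 returned documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.667).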
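Regular expressions appear among the keywords, but the abstract does not say where they are used. One common use in a crawler is pulling link targets out of fetched HTML. The sketch below assumes that use; `HREF_RE` and `extract_links` are hypothetical names, and a regex like this is only a lightweight approximation of a real HTML parser.

```python
import re
from urllib.parse import urljoin

# href="..." or href='...' inside an <a ...> tag, case-insensitive.
# Targets starting with '#' (in-page anchors) do not match.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"'#]+)["']""", re.IGNORECASE)

def extract_links(html, base_url):
    """Return absolute http(s) links in document order, without duplicates."""
    links = []
    for match in HREF_RE.finditer(html):
        url = urljoin(base_url, match.group(1).strip())  # resolve relative paths
        if url.startswith(("http://", "https://")) and url not in links:
            links.append(url)
    return links

if __name__ == "__main__":
    html = ('<a href="/news/1.html">x</a> <A class="k" HREF=\'http://example.com/a\'>y</A> '
            '<a href="mailto:x@y.z">m</a> <a href="#top">t</a>')
    print(extract_links(html, "http://example.com/index.html"))
```

Non-HTTP schemes such as `mailto:` are dropped after resolution, so only crawlable URLs reach the frontier.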