Research on the BLCT Topic Crawling Algorithm Based on Link and Content (cited by: 1)
Abstract: To acquire topic-relevant resources efficiently, this paper studies vertical search engines. First, building on the existing PageRank algorithm, it proposes an improved PageRank algorithm to measure the link similarity of web pages. Second, treating each page individually, it gives a method for computing content-based similarity from the page's URL, title, and body text. Finally, by combining content similarity with link similarity, it proposes BLCT, a topic crawling algorithm based on link and content. Experimental results show that the algorithm significantly improves the average harvest rate and the target recall rate, and that the crawled pages are more relevant to the topic than those retrieved by the previous algorithm.
Author: Wang Hongyan (王宏艳)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 2, pp. 495-497, 528 (4 pages)
Keywords: vertical search engine; PageRank algorithm; topic crawling; link similarity; content similarity
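
The abstract describes three steps: a link-based score, a content-based score over URL, title, and body text, and a combination of the two. This record does not give the paper's formulas, so the following Python sketch is only an illustration under stated assumptions. It uses textbook PageRank (not the paper's improved variant), term-frequency cosine similarity, assumed field weights of 0.2/0.3/0.5, and an assumed linear mixing parameter `alpha`. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token lists, using term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_similarity(page, topic, weights=(0.2, 0.3, 0.5)):
    """Weighted similarity over url, title, and text (weights are assumptions)."""
    topic_tokens = tokenize(topic)
    fields = (page["url"], page["title"], page["text"])
    return sum(w * cosine(tokenize(f), topic_tokens)
               for w, f in zip(weights, fields))


def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; links maps page -> outlinks."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            # A page without outlinks spreads its rank evenly over all pages.
            targets = outs if outs else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank


def combined_score(content_sim, link_sim, alpha=0.5):
    """Linear mix of content and link similarity (the mixing rule is assumed)."""
    return alpha * content_sim + (1.0 - alpha) * link_sim
```

In a crawler built along these lines, the combined score would typically order the frontier queue so that URLs with higher scores are fetched first. Raw PageRank values would need rescaling, for example by dividing by the maximum rank, before being mixed with a content score in the range 0 to 1.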