RESEARCH ON TOPIC SEMANTICS URL-BASED INFORMATION SEARCH METHOD

Cited by: 2
Abstract: To improve the efficiency and harvest rate of topic-focused web crawlers, this paper proposes an information search method based on topic-semantic URLs. The method maps each seed URL onto a topic node of a topic tree and expands the seed URL's semantics with the topic text along the topic path, guiding the crawler to fetch topic pages efficiently and accurately. During crawling, it uses link-importance and page-importance factors to automatically select and breed new high-quality seed URLs. The paper focuses on the principle of the search method and its implementation in a system. Experimental results show that the method effectively improves the crawler's search efficiency and harvest rate, and that the selection and breeding of seed links performs well.
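Authors: Lin Jing (林晶), Peng Xiaoning (彭小宁)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 6, pp. 42-45 (4 pages)
Funding: Scientific Research Project of the Hunan Provincial Department of Education (10C1064); Huaihua University Scientific Research Project (HHUY2010-18); Huaihua University Key Discipline Construction Project
Keywords: topic tree; URL semantics; search engine; topic-URL mapping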
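
The abstract's core idea, attaching a seed URL to a topic-tree node and using the text along the topic path to steer the crawler, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the `TopicNode` class, the token-overlap `relevance` measure, and the `rank_links` helper are hypothetical names, and the overlap score is a simple stand-in for whatever semantic measure the authors actually use. The link-importance and page-importance factors used for seed breeding are not modelled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class TopicNode:
    """Hypothetical topic-tree node; `text` holds the node's topic keywords."""
    name: str
    text: str
    parent: Optional["TopicNode"] = None

    def path_text(self) -> str:
        """Join the topic text of every node from the root down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return " ".join(reversed(parts))


def tokenize(s: str) -> Set[str]:
    """Lower-case the string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def relevance(link_text: str, topic_terms: Set[str]) -> float:
    """Assumed measure: fraction of the link's tokens that are topic terms."""
    tokens = tokenize(link_text)
    if not tokens:
        return 0.0
    return len(tokens & topic_terms) / len(tokens)


def rank_links(seed_topic: TopicNode,
               links: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Order (url, anchor_text) pairs by descending relevance to the topic path."""
    terms = tokenize(seed_topic.path_text())
    scored = [(url, relevance(f"{url} {anchor}", terms)) for url, anchor in links]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Example: a seed URL mapped to the "football" node under "sports".
root = TopicNode("sports", "sports")
leaf = TopicNode("football", "football league match", parent=root)
candidates = [
    ("http://example.com/weather", "weather forecast"),
    ("http://example.com/sports/football", "football match report"),
]
ranked = rank_links(leaf, candidates)
```

In this sketch the seed inherits the vocabulary of its whole topic path ("sports football league match"), so a link mentioning only the parent topic still receives a non-zero score; a crawler frontier would then visit the highest-ranked links first.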


