Abstract
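General-purpose search engines return too many results with weak topical relevance, and users expect ever more of the information services on offer, so crawling the entire Web is becoming increasingly inadequate. Whole-Web crawling also cannot gather enough up-to-date Web information in a timely manner, nor can it meet users' growing demand for personalization. This paper studies the development of a chemistry-focused web crawler by combining the chemical knowledge accumulated in the Internet chemical resource navigation system with the automatic collection techniques of search engines. The results show that a chemistry-focused web crawler based on a Widrow-Hoff classifier can effectively collect chemistry-related web pages.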
The popularity of the Web has grown rapidly in the last few years. However, as users' requirements become more demanding and more varied, general search engines still cannot satisfy individual needs accurately. By combining the information accumulated in the Internet navigator of chemical resources with the automatic collection capability of a web crawler, this article proposes a structural design model for a chemistry-focused web crawler based on a Widrow-Hoff classifier and verifies its performance.
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2006, Issue 10, pp. 204-205, 229 (3 pages in total)
Computer Engineering and Applications
Funding
Supported by the National Natural Science Foundation of China (Grant No. 20273076)