Abstract
A website yellow page system automatically generates a yellow page directory of websites and, on that basis, provides users with a range of services. It rapidly collects educational resources from the web, automatically classifies them and extracts information from them with high quality, and builds a yellow page of educational websites that offers browsing, retrieval, and other services. Yellow page systems that have not undergone secondary development generally have low retrieval accuracy and are therefore unsuited to campus networks. To address the inherent shortcomings of general-purpose search engines, a search engine for news retrieval is proposed. It uses an open-source web crawler to fetch Internet content to a local host, and uses the open API of Lucene to index and search the targeted information.
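The index-and-search step described in the abstract can be illustrated with a minimal sketch. This is not Lucene's API and not the authors' implementation: it is a toy inverted index in plain Python, standing in for what a Lucene index does conceptually. The `TinyIndex` class, the `tokenize` helper, and the example.edu URLs are hypothetical, and the tokenizer assumes English text (Chinese pages would need a proper word segmenter).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class TinyIndex:
    """Toy inverted index: maps each term to the set of pages containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page URLs
        self.docs = {}                    # page URL -> original text

    def add(self, url, text):
        """Index one locally stored page under its URL."""
        self.docs[url] = text
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return the sorted URLs of pages that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


# Hypothetical pages, as if a crawler had already fetched them to local storage.
pages = {
    "http://example.edu/news/1": "Campus news: library opening hours extended",
    "http://example.edu/news/2": "Sports news from the city league",
    "http://example.edu/dept/cs": "Computer science department on campus",
}

index = TinyIndex()
for url, text in pages.items():
    index.add(url, text)

print(index.search("campus news"))
```

A real deployment would replace `TinyIndex` with a Lucene index, which adds analyzers, relevance ranking, and on-disk storage; the sketch only shows the crawl-then-index-then-query flow.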
Source
《电脑开发与应用》
2014, No. 8, pp. 14-17 (4 pages)
Computer Development & Applications
Keywords
campus network
search engine
yellow page system
web crawler
Lucene