Abstract
The precision of Internet search has long been an important measure of search engine performance. To address the inherent shortcomings of general-purpose search engines, this paper presents a search engine for news retrieval. The engine uses an open-source web crawler to fetch Internet content to the local machine, and uses the open API of Lucene to index and search that specific content. Lucene is an open-source, high-performance, extensible full-text search toolkit written in Java, and it is the core component of the engine's implementation. Based on an analysis of the Lucene API, indexing and search modules are built and used to search online news content in real time. Compared with a general-purpose search engine, the news search engine improves search precision.
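The abstract describes two modules built on Lucene: one indexes the crawled pages and the other searches them. The data structure underneath that split is an inverted index, which maps each term to the documents that contain it. The following is a minimal in-memory sketch of that idea in plain Java. It is an illustration under stated assumptions, not the paper's implementation and not Lucene's API. The class `NewsIndex`, its methods `addDocument` and `search`, and the simple tokenizer are all hypothetical. In Lucene itself, indexing and searching go through classes such as `IndexWriter` and `IndexSearcher`, and tokenization is delegated to an `Analyzer`.

```java
import java.util.*;

/**
 * Hypothetical minimal inverted index illustrating the index/search split.
 * This is NOT Lucene's API; it only sketches the underlying idea.
 */
public class NewsIndex {
    // Postings lists: term -> sorted set of ids of documents containing the term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Assumed tokenizer: lowercase, then split on runs of non-letter/non-digit chars.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Indexing step: assign the next id to a fetched page and record each of its terms.
    public int addDocument(String text) {
        int id = nextId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Search step: AND semantics, i.e. intersect the postings lists of all query terms.
    public List<Integer> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) return List.of();
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        NewsIndex index = new NewsIndex();
        index.addDocument("Lucene 4.0 released with faster indexing");
        index.addDocument("Search engine precision improves news retrieval");
        index.addDocument("Lucene powers the news search engine");
        System.out.println(index.search("news search")); // prints [1, 2]
        System.out.println(index.search("lucene"));      // prints [0, 2]
    }
}
```

A multi-term query is treated as the AND of its terms, so only documents containing every term are returned. The naive tokenizer splits only on non-letter, non-digit characters, which means a run of Chinese characters would be indexed as a single token. Chinese news text would need a word-segmenting analyzer instead.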
Source
Computer Technology and Development (计算机技术与发展)
2013, Issue 6, pp. 230-232 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60473092)