Abstract
As the volume of information on local area networks (LANs) grows rapidly, personalized, lightweight search engines have attracted the attention of small and medium-sized enterprises and campuses. Building on a study of the basic principles of search engines, this paper uses Lucene, JSP, and Struts2 to implement retrieval over the text content of multiple file types. Test results show that the system extracts and parses the content of HTML, PDF, Word, and txt files within a LAN, that it is open, extensible, real-time, and secure, and that it meets its intended goals.
Source
《计算机与现代化》
2011, No. 9, pp. 40-42, 45 (4 pages)
Computer and Modernization
Funding
Research Project of Xianyang Normal University (07XSYK267)