Abstract
To provide a site-internal full-text retrieval system for an individual website, the system adopts the Lucene index structure, which is independent of dedicated commercial search engines. Working over all of the site's documents, it first uses XML conversion to build an XML index format for each document, then extracts each document's main content and writes it to an XML linked list; the indexing process is based on a pseudo-XML storage layout. The method markedly improves both retrieval efficiency and accuracy, and it extends well: new parsing modules can be added directly within the existing parser-chain structure.
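The abstract gives no implementation details, so the following is only a minimal sketch of how a parser chain that emits XML records might look, assuming each parsing module is a function that either returns the extracted text or declines the document. All names here (`ParserChain`, `parse_html`, `parse_text`, the `doc`/`path`/`content` elements) are hypothetical and are not taken from the paper; the Lucene indexing step itself is omitted.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def parse_html(path, raw):
    """Hypothetical parsing module for HTML pages."""
    if not path.endswith((".html", ".htm")):
        return None  # decline, so the next module in the chain can try
    extractor = _TextExtractor()
    extractor.feed(raw)
    extractor.close()
    return " ".join(extractor.chunks)


def parse_text(path, raw):
    """Hypothetical parsing module for plain-text files."""
    if not path.endswith(".txt"):
        return None
    return " ".join(raw.split())  # collapse runs of whitespace


class ParserChain:
    """Tries each registered module in order until one accepts the document."""

    def __init__(self):
        self.modules = []

    def register(self, module):
        # Adding a new format only requires registering one more module.
        self.modules.append(module)

    def to_xml(self, path, raw):
        """Return an XML record (assumed layout) for one document."""
        for module in self.modules:
            content = module(path, raw)
            if content is not None:
                doc = ET.Element("doc")
                ET.SubElement(doc, "path").text = path
                ET.SubElement(doc, "content").text = content
                return doc
        raise ValueError(f"no parsing module accepts {path!r}")


chain = ParserChain()
chain.register(parse_html)
chain.register(parse_text)
record = chain.to_xml("index.html", "<h1>Title</h1><p>Hello world</p>")
print(ET.tostring(record, encoding="unicode"))
```

In a sketch like this, supporting another document type means writing one more function and calling `register`, which is the kind of extensibility the abstract attributes to the parser chain.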
Authors
ZHANG Dong-zhen (张东振), ZHANG Ming (张明) (School of Information Engineering College, SMU, Shanghai 200911, China)
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2010, Issue 01Z, pp. 400-402 (3 pages)