Abstract
This paper has three parts. The first part designs a full-text retrieval system composed of three functional modules: an index module, a retrieval module, and a storage module. The second part analyzes the main technical difficulties, namely PDF data conversion, XML document design, and the word segmentation, construction, and efficiency of the index. It also tests Chinese word segmentation analyzers, the index file expansion rate, and index impact factors. On this basis, the paper designs the full-text retrieval system and tests its retrieval response time. The conclusion points out that attention should be paid to the security of the XML database.
Source
Library Journal (《图书馆杂志》)
Indexed in: CSSCI; Peking University Core Journals (北大核心)
2009, No. 8, pp. 63-67 (5 pages)