Abstract
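By designing a Tibetan-language book query system and presenting its overall structure diagram, this paper studies the key techniques involved in XML-based search engines and proposes data structures and algorithms for indexing and querying semi-structured XML documents. The approach preserves the structural information in the documents while making full use of the context information carried by XML tags, and it can substantially improve query accuracy.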
This paper analyzes the shortcomings of traditional HTML-based search engines and the advantages of XML technology. We designed a retrieval system for Tibetan-language books and present its overall structural diagram. Several key techniques used in XML search engines were also studied. The system uses the context information in XML documents to improve query accuracy. The paper discusses the spider technique and the structure of the index file in detail. The index preserves the hierarchical relationships among tags with a low storage overhead.
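The abstract does not give the index layout, so the following is only a minimal sketch of the general idea it describes: an inverted file whose postings keep the tag path in which each term occurs, so that a query can be restricted by tag context. The names `build_index` and `search`, the posting format, and the sample book records are all hypothetical and are not taken from the paper.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to a set of (doc_id, tag_path) postings.

    Assumed layout for illustration only; the paper's actual index
    structure is not described in the abstract.
    """
    index = defaultdict(set)

    def walk(elem, path, doc_id):
        path = f"{path}/{elem.tag}"
        # Index only the text directly inside this element, so each posting
        # records the innermost tag path containing the term.
        # (Tail text after child elements is ignored in this sketch.)
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            index[term].add((doc_id, path))
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index


def search(index, term, tag=None):
    """Return sorted doc ids containing term, optionally only under the given tag."""
    hits = index.get(term.lower(), set())
    return sorted({d for d, p in hits if tag is None or p.endswith("/" + tag)})


# Hypothetical sample records.
docs = {
    "b1": "<book><title>Tibetan Grammar</title><author>Tashi</author></book>",
    "b2": "<book><title>Stories by Tashi</title><author>Dolma</author></book>",
}
index = build_index(docs)
print(search(index, "tashi"))            # term anywhere in the record
print(search(index, "tashi", "author"))  # term only inside <author>
```

Restricting the second query to the `author` tag excludes the record in which the term appears only in the title, which is the kind of accuracy gain from tag context that the abstract refers to.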
Source
《西北民族大学学报(自然科学版)》
2005, No. 4, pp. 53-58 (6 pages)
Journal of Northwest Minzu University(Natural Science)
Keywords
XML
retrieval system
overall structural diagram
inverted file