Abstract
This paper presents an index file structure for Chinese text that uses words as index terms, together with an index search algorithm built on that structure. The distribution of a Chinese index corpus is analyzed first. Based on that analysis, a three-layer index structure with reverse-order storage is designed; while the index is being built, the structure automatically adjusts the storage order of terms according to their frequency. An index search algorithm based on a deterministic finite automaton (DFA) and reverse maximum matching (RMM) is then given. In the experimental system TIFS, the three-layer structure is compared with B-trees and hashing in terms of time and space complexity; the results show that, for large-scale Chinese text retrieval, the three-layer structure gives the best overall performance.
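As a rough illustration of the reverse maximum matching step named in the abstract, the sketch below segments a string by scanning from its end and taking the longest dictionary word that ends at the current position. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `rmm_segment`, the parameter `max_len`, and the toy dictionary `TOY_DICT` are all hypothetical, and a plain Python set stands in for both the DFA and the three-layer index, whose details are not given in this record.

```python
def rmm_segment(text, dictionary, max_len=4):
    """Segment `text` by reverse maximum matching against a set of words.

    `dictionary` and `max_len` are illustrative assumptions; the paper's
    own lookup uses a DFA over its three-layer index, not a Python set.
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    # Words were collected from right to left, so restore reading order.
    words.reverse()
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"研究", "研究生", "生命", "起源"}

if __name__ == "__main__":
    print(rmm_segment("研究生命起源", TOY_DICT))
```

Scanning from the right makes the sketch pick 生命 before it ever considers 研究生, which is the usual argument for reverse over forward matching in Chinese segmentation.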
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2007, No. 7, pp. 1314-1317 (4 pages)
Funding
Supported by the Natural Science Foundation of Liaoning Province (Grant No. 2004D110).
Keywords
three-layer indexing structure
Chinese indexing
information retrieval
self-adaptive algorithm