Real-time retrieval in Chinese webpage by using inverted table
Original title: 利用关键词倒排表实时检索中文网页
Cited by: 4
Abstract This paper studies a fast retrieval method for Chinese web pages based on a keyword inverted table. Given a large web page corpus, keyword feature vectors for the pages are generated offline using a keyword dictionary and an optimized forward maximum matching word segmentation algorithm. The feature vectors are then reduced in dimension to produce a compressed web page feature table. Finally, a keyword inverted file is built from the feature table according to the frequency with which each keyword occurs across all pages. In the experiments, keyword retrieval of Chinese web pages is implemented against three data sources (the original web page database, the feature table, and the inverted file), and the real-time performance of the three is compared. The results show that retrieval based on the keyword inverted file is much faster than the other two approaches and offers good real-time performance.
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2010, No. 28, pp. 135-137, 159 (4 pages)
Funding Natural Science Foundation of Jiangsu Province, No. BK20080544
Keywords retrieval; webpage feature table; inverted file; real-time
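
The pipeline described in the abstract (dictionary-based segmentation, a per-page feature table, then a keyword inverted file) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify their optimization of the segmentation algorithm or their compression scheme, so the code below assumes plain forward maximum matching and a sparse table that stores only non-zero keyword counts. All function names, the toy dictionary, and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict


def forward_max_match(text, dictionary, max_len=None):
    """Plain forward maximum matching (assumed; the paper's optimized variant is not given).

    At each position, take the longest dictionary word that starts there.
    Characters that start no dictionary word are skipped.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary:
                tokens.append(piece)
                i += size
                break
        else:
            i += 1  # no keyword starts at this character
    return tokens


def build_feature_table(pages, dictionary):
    """Map each page id to its non-zero keyword counts (a sparse stand-in for compression)."""
    return {pid: Counter(forward_max_match(text, dictionary))
            for pid, text in pages.items()}


def build_inverted_file(feature_table):
    """Map each keyword to (page id, count) postings, most frequent page first."""
    inverted = defaultdict(list)
    for pid, counts in feature_table.items():
        for keyword, n in counts.items():
            inverted[keyword].append((pid, n))
    for postings in inverted.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(inverted)


def search(inverted, keyword):
    """Return page ids containing the keyword, ordered by keyword frequency."""
    return [pid for pid, _ in inverted.get(keyword, [])]


# Hypothetical toy data, for illustration only.
dictionary = {"检索", "网页", "倒排文件", "实时"}
pages = {
    "p1": "倒排文件支持实时检索",
    "p2": "网页检索与网页特征",
    "p3": "检索检索检索网页",
}

feature_table = build_feature_table(pages, dictionary)
inverted = build_inverted_file(feature_table)
print(search(inverted, "检索"))
```

In this sketch a query is a single dictionary lookup on the inverted file, whereas answering it from the feature table or the raw pages would require visiting every page. That difference is one plausible reading of why the abstract reports the inverted file as the fastest of the three data sources.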