
Static Index Pruning Based on Document Importance

Cited by: 1
Abstract: Web pages vary widely in quality and importance. This paper proposes a static index pruning method that uses each page's importance to determine how heavily that page is pruned, that is, the proportion of its information kept in the index. The method is evaluated on the GOV2 dataset. The experiments show that (1) the method retains the main advantages of static index pruning, greatly reducing storage requirements and speeding up query processing; (2) when the pruned index is only 13% of the original size, P@10 and P@20 match or exceed the results obtained with the full index; and (3) at the same pruning level, P@10, P@20 and MAP are all clearly better than those of previous pruning methods.
Source: Journal of South China University of Technology (Natural Science Edition), 2011, No. 4, pp. 1-6 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
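Funding: National Natural Science Foundation of China (60933004); Guangdong Key Laboratory of Computer Network project (CCNL200601); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software Products" (2011ZX01042-001-001).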
Keywords: search engine; inverted index; static index pruning; document importance
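
The abstract states only that a page's importance determines how heavily it is pruned; it does not give the importance measure, the mapping from importance to pruning ratio, or the term-scoring function. The following is a minimal sketch of that general idea in a document-centric style, not the paper's actual algorithm. The names `keep_ratio` and `prune_index`, the linear mapping, the `min_ratio`/`max_ratio` bounds, and the use of a plain per-term weight are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def keep_ratio(importance, min_ratio=0.05, max_ratio=0.5):
    """Map an importance score to the fraction of a document's terms to keep.

    Assumption: importance is normalised to [0, 1] and the mapping is linear.
    The source does not specify either choice.
    """
    importance = min(max(importance, 0.0), 1.0)  # clamp out-of-range scores
    return min_ratio + (max_ratio - min_ratio) * importance


def prune_index(doc_terms, importance):
    """Build a pruned inverted index.

    doc_terms:  {doc_id: {term: weight}}, weight being any per-document
                term score (hypothetical; e.g. a term frequency).
    importance: {doc_id: score in [0, 1]}; missing documents count as 0.
    Returns {term: [(doc_id, weight), ...]}.
    """
    pruned = defaultdict(list)
    for doc_id, terms in doc_terms.items():
        ratio = keep_ratio(importance.get(doc_id, 0.0))
        # More important documents keep more terms; every document keeps
        # at least one so that it stays retrievable.
        k = max(1, math.ceil(ratio * len(terms)))
        # Highest weight first; ties broken alphabetically for determinism.
        top = sorted(terms.items(), key=lambda tw: (-tw[1], tw[0]))[:k]
        for term, weight in top:
            pruned[term].append((doc_id, weight))
    return dict(pruned)


docs = {
    "low":  {"x": 3, "y": 2, "z": 1},
    "high": {"x": 1, "y": 5, "z": 2},
}
scores = {"low": 0.0, "high": 1.0}
index = prune_index(docs, scores)
```

In this toy input the low-importance document keeps a single posting while the high-importance one keeps two, which is the qualitative behaviour the abstract describes: pruning is aggressive on unimportant pages and gentle on important ones.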
