
Implementation of XML Website Complete Text Search Engine Based on Nutch (cited by: 5)
Abstract: The crawlers of general-purpose search engines understand only common HTML tags and cannot effectively parse the content of XML websites. This paper builds a pure XML website that includes dynamic, custom-defined tags, proposes a scheme that uses XSL style information to help crawlers understand the meaning of XML page tags, and implements a Nutch-based full-text search engine for XML websites that parses their content correctly.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 15, pp. 95-96, 107 (3 pages in total)
Keywords: XML information retrieval; eXtensible Stylesheet Language Transformations (XSLT); search engine based on Nutch

References (5)

  • 1 Trotman A, Geva S. Relevance in XML Retrieval: The User Perspective[C]//Proceedings of the SIGIR Conference on XML Element Retrieval Methodology. Seattle, Washington, USA: ACM Press, 2006.
  • 2 Kamps J, Marx M, Rijke M D, et al. Structured Queries in XML Retrieval[C]//Proceedings of the 14th ACM Conference on Information and Knowledge Management. [S. l.]: ACM Press, 2005.
  • 3 Kamps J, Marx M, Rijke M D, et al. Best-match Querying from Document-centric XML[C]//Proceedings of the 7th International Workshop on the Web and Databases. New York, USA: ACM Press, 2004.
  • 4 Cafarella M, Cutting D. Building Nutch: Open Source Search[Z]. 2004.
  • 5 Han Yi. Research on DTD-Based Content Retrieval of XML Documents[J]. Information Science (情报科学), 2006, 24(3): 409-412. (cited by: 1)

