Research on Chemical Industry Information Search Technology
Abstract: Topic-oriented industry search places very different demands on the coverage of topic-specific information than general-purpose search engines do. To address the problem of industry information search, this paper studies and improves a chemical-industry relevance calculation based on the vector space algorithm, as well as the classic PageRank page ranking algorithm, and builds a vertical search engine for chemical industry information resources on top of the Nutch search engine architecture. Compared with a general-purpose search engine, the system removes unnecessary search results, runs faster, and improves the accuracy of industry information search.
Source: Journal of Sichuan University of Science & Engineering (Natural Science Edition), CAS, 2011, No. 1, pp. 71-73 (3 pages)
Funding: Sichuan University of Science & Engineering Talent Introduction Research Start-up Project (07ZR41)
Keywords: Nutch; topic distillation; webpage ranking
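
The abstract names two building blocks: a relevance score computed in the vector space model, and PageRank. This record does not give the paper's improved formulas, so the following is only a minimal Python sketch of the unmodified textbook versions. The function names, the raw term-frequency weighting, the damping factor of 0.85, and the sample terms are all assumptions made for illustration; none of them come from the paper or from Nutch.

```python
import math
from collections import Counter


def cosine_relevance(doc_terms, topic_terms):
    """Cosine similarity between the raw term-frequency vectors of a page
    and of a topic profile (hypothetical helper, not the paper's formula)."""
    doc_vec = Counter(doc_terms)
    topic_vec = Counter(topic_terms)
    # Only terms present in both vectors contribute to the dot product.
    shared = doc_vec.keys() & topic_vec.keys()
    dot = sum(doc_vec[t] * topic_vec[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in doc_vec.values())) * math.sqrt(
        sum(v * v for v in topic_vec.values())
    )
    return dot / norm if norm else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (not the paper's improved variant).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Assumed sample data: a tiny chemical-industry topic profile and one page.
    topic = ["catalyst", "polymer", "reactor", "solvent"]
    page = ["polymer", "reactor", "design", "polymer"]
    print(cosine_relevance(page, topic))
    print(pagerank({"a": ["c"], "b": ["c"], "c": []}))
```

In a topic-focused crawler, a score such as `cosine_relevance` could be compared against a threshold to decide whether a fetched page is kept, while `pagerank` would order the pages that remain. How the paper combines the two is not stated in this record.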