Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm (cited by: 11)
Abstract Key resource pages are an important kind of high-quality page on the Web and the main target of users' Web searches. Decision tree learning is one of the most widely used inductive inference methods in machine learning and is well suited to identifying key resource pages. However, because uniform sampling of Web data is difficult, the algorithm lacks sufficiently representative negative examples for training. To address this problem, the paper proposes an improved decision tree algorithm that learns from statistical information about the training examples rather than from individual instances, and uses it to identify key resource pages independently of user queries. Under the standard evaluation conditions of the 2003 Text Retrieval Conference (TREC), large-scale Web retrieval experiments based on the improved algorithm achieved a performance gain of more than 40% over the basic algorithm. The work offers an effective way to find Web key resource pages and also shows a feasible way to improve decision tree learning performance.
Source Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2005, No. 11, pp. 1958-1966 (9 pages)
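Funding National Natural Science Foundation of China; National Key Basic Research Program of China (973 Program); Key Science and Technology Research Project of the Ministry of Education of China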
Keywords Web information retrieval; key resource page; machine learning; decision tree
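
The abstract says only that the improved algorithm learns from statistical rather than per-instance information and gives no formulas. The Python sketch below is therefore an illustration under stated assumptions, not the paper's procedure: it scores a binary page test by information gain using three aggregate numbers in place of labeled negative examples. The function names and the parameters `prior_pos`, `pos_rate`, and `corpus_rate` are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy, in bits, of a set whose positive fraction is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def clamp(x):
    """Keep an estimated proportion inside [0, 1]."""
    return min(max(x, 0.0), 1.0)


def gain_from_statistics(prior_pos, pos_rate, corpus_rate):
    """Information gain of a binary test, from aggregate statistics only.

    prior_pos   -- assumed fraction of key resource pages in the corpus
    pos_rate    -- fraction of known key resource pages that pass the test
    corpus_rate -- fraction of all corpus pages that pass the test
    (All three are illustrative inputs, not quantities named in the source.)
    """
    fail_rate = 1.0 - corpus_rate
    # Positive fraction inside each branch, derived without negative samples.
    p_pass = clamp(prior_pos * pos_rate / corpus_rate) if corpus_rate > 0 else 0.0
    p_fail = clamp(prior_pos * (1.0 - pos_rate) / fail_rate) if fail_rate > 0 else 0.0
    return (entropy(prior_pos)
            - corpus_rate * entropy(p_pass)
            - fail_rate * entropy(p_fail))


if __name__ == "__main__":
    # Hypothetical tests: (pass rate among key pages, pass rate in corpus).
    candidate_tests = {
        "many_in_links": (0.9, 0.2),
        "short_url": (0.5, 0.5),
    }
    for name, (pos_rate, corpus_rate) in candidate_tests.items():
        print(name, gain_from_statistics(0.1, pos_rate, corpus_rate))
```

In this sketch, a test that key resource pages pass at the same rate as the corpus as a whole gets zero gain, while a test they pass far more often than average gets a positive gain, so a tree builder could rank candidate splits without a sampled negative class.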