
An Entropy-Based Approach for News Article Extraction from Web Page (基于熵的新闻网页抽取方法的研究)

Cited by: 2
Abstract: To reduce or eliminate interference from the large amount of off-topic information on news websites, this paper proposes a news web page extraction method. The method combines entropy-based computation (information theory) with knowledge of the DOM tree to extract the main article and its related links from a news page. Experiments on several news websites show that the approach is practical.
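The abstract does not spell out how entropy is computed, so the following is only a minimal sketch of one common way to apply Shannon entropy to this kind of problem, not the paper's actual algorithm. It assumes that a term spread evenly across many pages of a site (navigation, footers) has high entropy, while a term concentrated in a single page (article content) has low entropy. The function names `shannon_entropy` and `term_entropy_across_pages`, and the whitespace tokenization, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the non-zero counts.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def term_entropy_across_pages(pages):
    """Map each term to the normalized entropy of its spread over pages.

    Assumption (not from the paper): values near 1.0 suggest template or
    navigation text shared by all pages; values near 0.0 suggest text
    specific to one page.
    """
    per_page = [Counter(page.split()) for page in pages]
    vocabulary = set().union(*per_page)
    # Normalize by the maximum possible entropy, log2(number of pages).
    max_entropy = math.log2(len(pages)) if len(pages) > 1 else 1.0
    return {
        term: shannon_entropy([c[term] for c in per_page]) / max_entropy
        for term in vocabulary
    }


if __name__ == "__main__":
    # Hypothetical text of two pages from the same site.
    pages = [
        "home sports weather election",
        "home sports weather earthquake",
    ]
    for term, h in sorted(term_entropy_across_pages(pages).items()):
        print(term, round(h, 3))
```

In a DOM-based extractor, the same score could be aggregated per subtree so that blocks dominated by high-entropy terms are discarded; that aggregation step is likewise an assumption and is not shown here.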
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2007, No. 4, pp. 48-51 (4 pages)
Keywords: entropy; information extraction; informative block; DOM
