Web-Based News Information Extraction (cited by: 11)

News Information Extraction for Web Resource
Abstract: With the spread of the Internet and the development of information technology, a vast body of news information resources has formed. Extracting useful resources from this mass of news information is a pressing problem. Based on an analysis of the structure of news web pages, this paper combines the strengths of two techniques, DOM-based structural extraction and extraction based on text-feature patterns, and proposes a semi-automatic extraction technique for Web news pages that automatically downloads useful Web pages and extracts the required news information. Finally, the paper describes an information extraction system for Olympic news and reports its experimental results.
Source: Computer Engineering (《计算机工程》), 2006, No. 10, pp. 74-76 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National "863" Program of China (2002AA117010-10)
Keywords: information extraction; wrapper; DOM; extraction rule
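
The abstract names two techniques, DOM-based structural extraction and text-feature pattern extraction, without giving details in this record. The following is a minimal sketch of how the two could be combined, not the authors' system: the sample page, the rule table `STRUCTURE_RULES`, the patterns `DATE_PATTERN` and `SOURCE_PATTERN`, and the function `extract` are all hypothetical names chosen for illustration. Structural rules locate fields by tag and attribute in a small DOM-like tree, and regular expressions pick out fields that are easier to recognize by their text shape.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """A minimal DOM-like element; children mix Node and str in document order."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def find_all(self, tag, attrs):
        """Yield descendants with the given tag whose attributes match attrs."""
        for c in self.children:
            if isinstance(c, Node):
                if c.tag == tag and all(c.attrs.get(k) == v for k, v in attrs.items()):
                    yield c
                yield from c.find_all(tag, attrs)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then close it.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)


# Hypothetical rules: structural rules for title/body, text patterns for date/source.
STRUCTURE_RULES = {
    "title": {"tag": "h1", "attrs": {}},
    "body": {"tag": "div", "attrs": {"id": "content"}},
}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
SOURCE_PATTERN = re.compile(r"Source:\s*([^|\n]+)")

# Made-up page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<h1>Olympic torch relay route announced</h1>
<div id="meta">2006-04-26 | Source: Example News Agency</div>
<div id="content">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
</body></html>
"""


def extract(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    root = builder.root
    record = {}
    # Step 1: structure-based extraction via tag/attribute rules.
    for field, rule in STRUCTURE_RULES.items():
        node = next(root.find_all(rule["tag"], rule["attrs"]), None)
        record[field] = " ".join(node.text().split()) if node else None
    # Step 2: text-feature extraction via regular expressions.
    page_text = root.text()
    m = DATE_PATTERN.search(page_text)
    record["date"] = m.group(0) if m else None
    m = SOURCE_PATTERN.search(page_text)
    record["source"] = m.group(1).strip() if m else None
    return record


if __name__ == "__main__":
    print(extract(SAMPLE_PAGE))
```

In a semi-automatic setting, a person would typically write or confirm the rules for each site while the extraction itself runs unattended; here the rules are simply hard-coded.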

References (5)

  • 1 Muslea I. Extraction Patterns for Information Extraction Tasks: A Survey[C]. AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
  • 2 Li Xiaodong, Gu Yuqing. DOM-Based Web Information Extraction (基于DOM的Web信息提取)[J]. Chinese Journal of Computers (计算机学报), 2002, 25(5): 526-533. (Cited by: 101)
  • 3 Eikvil L. Information Extraction from World Wide Web: A Survey[R]. Norwegian Computer Center, Tech. Rep. 945, 1999-07.
  • 4 World Wide Web Consortium. The Document Object Model[EB/OL]. http://www.w3.org/DOM, 2004.
  • 5 Chang Chiahui, Lui Shaochen. IEPAD: Information Extraction Based on Pattern Discovery[C]. Proceedings of the Tenth International Conference on World Wide Web, Hong Kong, 2001-05.


Co-citing documents: 100

Co-cited documents: 75

Citing documents: 11

Second-level citing documents: 36
