Research on a Web Information Extraction Model Based on XML and DOM Technologies (基于XML和DOM技术的Web信息抽取模型) Cited by: 1
Abstract: XML technology is applied to search engines, and a Web information extraction model based on XML and DOM technologies is proposed. The model's four stages (data acquisition, page optimization, extraction rule generation, and information extraction) are analyzed in detail. The use of a web crawler, NekoHTML, Xerces-J, JTree, XPath, and XSLT in Web information extraction is discussed, and semi-automatic Web information extraction is achieved.
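
The last two stages named in the abstract (extraction rule generation and information extraction) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names a Java toolchain (NekoHTML, Xerces-J, XPath, XSLT), whereas this sketch substitutes Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The sample page, the `RULE` layout, and the `extract` function are hypothetical, and the page is assumed to have already been cleaned into well-formed XML by the page optimization stage.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be already cleaned into well-formed XML
# (the job the abstract assigns to the page optimization stage).
CLEAN_PAGE = """
<html>
  <body>
    <div class="item"><h2>Book A</h2><span class="price">12.50</span></div>
    <div class="item"><h2>Book B</h2><span class="price">8.00</span></div>
    <div class="ad">Sponsored</div>
  </body>
</html>
"""

# Hypothetical extraction rule: one path that selects record nodes,
# plus one path per field, relative to each record node.
RULE = {
    "record": ".//div[@class='item']",
    "fields": {"title": "h2", "price": "span[@class='price']"},
}


def extract(xml_text, rule):
    """Apply a path-based rule to a DOM-like tree and return one dict per record."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall(rule["record"]):
        record = {}
        for name, path in rule["fields"].items():
            # findtext returns the text of the first match, or the default.
            record[name] = node.findtext(path, default="").strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for row in extract(CLEAN_PAGE, RULE):
        print(row)
```

Keeping the rule as data rather than code mirrors the semi-automatic idea in the abstract: a rule could be produced once, for example by selecting nodes in a tree view, and then reused on pages that share the same layout.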
Source: Journal of Dalian Jiaotong University (《大连交通大学学报》), CAS, 2013, No. 3, pp. 96-99, 118 (5 pages)
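Funding: Open Fund of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-9-27); Sichuan Provincial Key Laboratory Fund (GK201202); Fund of the Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis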
Keywords: information extraction; XML technology; DOM technology; Web page