
Research on Key Technologies of Web Mining Based on XML

Cited by: 10
Abstract: Because of the huge amount of information available online, the World Wide Web has become a focus of data mining research. This paper introduces data mining techniques for Web pages and proposes an XML-based Web data mining model. It explains the motivation for converting semi-structured HTML documents into well-formed XML documents and presents conversion code built on the HTML Tidy library. It then describes key techniques for extracting data from Web pages with XML technologies, including XHTML, XSLT and XQuery, and discusses other aspects of Web data mining, such as data validation and data integration.
Source: Computer Engineering, 2006, No. 20, pp. 43-44, 77 (3 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Open Fund of the State Key Laboratory of Software Engineering.
Keywords: Web data mining; XML-based model; key technologies
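The conversion and extraction steps summarized in the abstract can be illustrated with a small sketch. The paper's own code is built on the HTML Tidy library and uses XSLT and XQuery. None of that is reproduced here. As an assumption, this Python sketch substitutes the standard-library `html.parser` for HTML Tidy and `xml.etree.ElementTree`'s limited XPath support for XSLT/XQuery. The converter closes implicitly ended `<p>`, `<tr>` and `<td>` tags and quotes attributes, so the result is well-formed XML. A numeric check on one field stands in for the data-validation step. The sample page, the `quotes` table id, the `AUTO_CLOSE` rules and the helper names `TidyLikeConverter`, `html_to_xml` and `extract_quotes` are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content; they become empty XML elements.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}

# Hypothetical subset of HTML's implied end-tag rules: a new start tag closes
# these still-open elements (e.g. a second <td> ends the previous <td>).
AUTO_CLOSE = {"p": {"p"}, "li": {"li"}, "td": {"td", "th"}, "th": {"td", "th"},
              "tr": {"tr", "td", "th"}, "table": {"p"}}


class TidyLikeConverter(HTMLParser):
    """Builds a properly nested ElementTree from loose HTML (a rough stand-in for HTML Tidy)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased tag and attribute names.
        if tag == "html":
            return  # the root element already exists
        while len(self.stack) > 1 and self.stack[-1].tag in AUTO_CLOSE.get(tag, ()):
            self.stack.pop()
        # Valueless attributes such as <td nowrap> get their own name as value.
        attributes = {name: (value if value is not None else name) for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag (and anything still open inside it);
        # stray end tags with no matching open element are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs to that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Parses loose HTML and returns the root of a well-formed XML tree."""
    converter = TidyLikeConverter()
    converter.feed(html_text)
    converter.close()
    return converter.root


def extract_quotes(root):
    """Pulls (symbol, price) pairs out of the hypothetical table with id="quotes"."""
    table = root.find(".//table[@id='quotes']")
    if table is None:
        return []
    records = []
    for row in table.iter("tr"):
        cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
        if len(cells) != 2:
            continue  # header row (<th> cells) or a row with an unexpected shape
        symbol, price_text = cells
        try:
            price = float(price_text)
        except ValueError:
            continue  # simple data check: drop rows whose price is not numeric
        records.append((symbol, price))
    return records


SAMPLE_PAGE = """
<HTML><BODY>
<P>Quotes for today<BR>
<TABLE id=quotes border=1>
<TR><TH>Symbol<TH>Price
<TR><TD>ABC<TD>12.50
<TR><TD>XYZ<TD>n/a
<TR><TD>DEF<TD>7.25
</TABLE>
</BODY></HTML>
"""

if __name__ == "__main__":
    tree = html_to_xml(SAMPLE_PAGE)
    print(ET.tostring(tree, encoding="unicode"))  # well-formed XML text
    print(extract_quotes(tree))  # [('ABC', 12.5), ('DEF', 7.25)]
```

Building the element tree directly, rather than patching the HTML text, guarantees proper nesting by construction. A real extraction system would still rely on a full cleaner such as HTML Tidy, which repairs far more kinds of malformed markup than these few rules. It would also use XSLT or XQuery when the extraction logic outgrows simple path queries.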


Co-cited documents: 44

Citing documents: 10

Second-level citing documents: 13
