Abstract
XML is well suited to two difficulties in Web data mining: heterogeneous database environments and semi-structured information. Web structure mining is an important part of Web information mining as a whole. Preprocessing Web structure information with XML means normalizing that information and converting it into XML data, which makes explicit a website's file composition, its organization, its content composition, and the hyperlink relations among its contents. This paper presents the implementation process of an XML-based Web structure mining system and solves the key technical problem of reading XML files into the mining program through a standard interface.
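The step of reading XML-encoded site structure into a mining program can be illustrated with a minimal sketch. The abstract does not name the schema or the interface used, so everything here is an assumption made for illustration: the `<site>`/`<page>`/`<link>` element layout, the `url` and `href` attributes, the helper names `load_link_graph` and `in_degree`, and the choice of Python's standard `xml.etree.ElementTree` parser as the "standard interface".

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for one site's structure.
# The paper's actual schema is not given in the abstract.
SITE_XML = """<site root="index.html">
  <page url="index.html">
    <link href="news.html"/>
    <link href="about.html"/>
  </page>
  <page url="news.html">
    <link href="index.html"/>
    <link href="about.html"/>
  </page>
  <page url="about.html">
    <link href="index.html"/>
  </page>
</site>"""


def load_link_graph(xml_text):
    """Parse the XML text and return {source_url: [target_url, ...]}."""
    root = ET.fromstring(xml_text)
    graph = {}
    for page in root.iter("page"):
        graph[page.get("url")] = [link.get("href") for link in page.iter("link")]
    return graph


def in_degree(graph):
    """Count incoming hyperlinks per page, a simple structure statistic."""
    counts = {url: 0 for url in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


graph = load_link_graph(SITE_XML)
counts = in_degree(graph)
```

Once the hyperlink relations are held in a plain dictionary like `graph`, any structure-mining step (in-degree counts here, or a link-analysis algorithm) can run without knowing how the XML was produced, which is one plausible reason to normalize the structure into XML first.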
Source
《计算机工程与设计》
CSCD
PKU Core Journal (北大核心)
2006, Issue 23, pp. 4447-4449, 4460 (4 pages in total)
Computer Engineering and Design
Funding
Chongqing Normal University university-level research fund project (05XWY070)