Abstract
This paper proposes a noise-elimination method for web page information based on the Document Object Model (DOM) and the display attributes of web pages. By analyzing the structure and characteristics of web page content, the content of a page is divided into two parts: information blocks and noise blocks. A parser converts the page into a DOM model, the noise in the page is identified, and the DOM model is simplified according to the page's display attributes, so that the noise in the DOM model is effectively removed.
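The pipeline summarized in the abstract (parse to a DOM, identify noise, simplify using display attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the noise criteria, so the rules below (dropping `script`/`style`/`iframe` nodes, dropping nodes hidden by an inline `display:none` or `visibility:hidden` style, and treating block elements whose text is more than 60% link text as noise blocks) are assumptions chosen for illustration. The names `Node`, `DomBuilder`, `is_noise`, `prune`, and `clean_html` are hypothetical.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "iframe"}        # assumed: never carry page content
BLOCK_TAGS = {"div", "ul", "ol", "table", "td"}   # assumed: candidate noise blocks
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
LINK_DENSITY_LIMIT = 0.6                          # assumed threshold, not from the paper


class Node:
    """A very small DOM node: a tag, its attributes, and child nodes or text."""
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # items are Node instances or plain strings


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library parser."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open tag; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data.strip())


def text_len(node, inside_link=False):
    """Return (total characters, characters inside <a>) for a subtree."""
    total = linked = 0
    for c in node.children:
        if isinstance(c, str):
            total += len(c)
            linked += len(c) if inside_link else 0
        else:
            t, l = text_len(c, inside_link or c.tag == "a")
            total += t
            linked += l
    return total, linked


def is_noise(node):
    """Apply the assumed noise rules to a single element node."""
    style = (node.attrs.get("style") or "").replace(" ", "").lower()
    if node.tag in NOISE_TAGS:
        return True
    if "display:none" in style or "visibility:hidden" in style:
        return True
    if node.tag in BLOCK_TAGS:
        total, linked = text_len(node)
        return total > 0 and linked / total > LINK_DENSITY_LIMIT
    return False


def prune(node):
    """Remove noise subtrees in place, top-down."""
    node.children = [c for c in node.children
                     if isinstance(c, str) or not is_noise(c)]
    for c in node.children:
        if not isinstance(c, str):
            prune(c)


def extract_text(node):
    parts = [c if isinstance(c, str) else extract_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def clean_html(html):
    """Parse, prune noise blocks, and return the remaining text."""
    builder = DomBuilder()
    builder.feed(html)
    prune(builder.root)
    return extract_text(builder.root)
```

Calling `clean_html(page_source)` on a page with a link-only navigation bar, a hidden advertisement block, and an article body would be expected to return only the article text under these assumed rules. A real system would also need to resolve external stylesheets to obtain display attributes, which this sketch does not attempt.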
Source
Journal of Shangqiu Normal University (《商丘师范学院学报》)
CAS
2010, No. 9, pp. 90-93 (4 pages)