Abstract
Besides the data they are meant to deliver, Web pages often contain a great deal of navigation information, advertisements, and similar material. Targeting this characteristic of Web pages, a DOM tree comparison algorithm is proposed. It compares several pages of the same class and identifies the main content of each page. Experimental results show that the method is effective and feasible.
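The abstract does not describe how the DOM trees are compared. Purely as an illustration of the general idea, here is a minimal sketch. It assumes that pages of the same class share one template, that nodes can be aligned across pages by their tag path (for example `/html[0]/body[0]/div[1]/p[0]`), and that text which is identical on every page is template material (navigation, ads) while text that differs is page-specific main content. These are assumptions, not the authors' method. The names `TreeBuilder`, `text_by_path`, `main_content` and `SAMPLE_PAGES` are hypothetical. Only Python's standard `html.parser.HTMLParser` is used.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM node: tag name, parent link, children, direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree using the standard-library HTML parser."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in self.VOID:  # void elements never get children
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text = (self.cur.text + " " + data.strip()).strip()


def text_by_path(node, prefix="", out=None):
    """Map each text-bearing node to a path such as /html[0]/body[0]/p[1]."""
    if out is None:
        out = {}
    counts = {}
    for child in node.children:
        idx = counts.get(child.tag, 0)
        counts[child.tag] = idx + 1
        path = f"{prefix}/{child.tag}[{idx}]"
        if child.text:
            out[path] = child.text
        text_by_path(child, path, out)
    return out


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return text_by_path(builder.root)


def main_content(pages):
    """Keep, per page, only the nodes whose text is not shared by all pages."""
    maps = [parse(p) for p in pages]
    result = []
    for m in maps:
        # A path missing on some page yields None, which also counts as differing.
        result.append({path: text for path, text in m.items()
                       if len({other.get(path) for other in maps}) > 1})
    return result


def _page(title, body):
    # Hypothetical template: nav links, article, ad block.
    return ("<html><body>"
            "<div><a>Home</a><a>News</a></div>"
            f"<div><h1>{title}</h1><p>{body}</p></div>"
            "<div><p>Ad banner</p></div>"
            "</body></html>")


SAMPLE_PAGES = [_page("Title A", "Body A"), _page("Title B", "Body B")]

if __name__ == "__main__":
    for content in main_content(SAMPLE_PAGES):
        print(content)
```

In this toy input, the navigation links and the ad block are identical on both pages, so they are dropped as template. Only the title and body of each page remain. Real pages rarely align node for node, and this path-matching shortcut ignores that. A practical comparison would need a tree alignment step that tolerates inserted or missing nodes.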
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2005, No. 11, pp. 2612-2614 (3 pages)