Abstract
Information extraction based on the structural features of web pages is currently the most commonly used extraction approach. To extract the target information accurately from the DOM tree, that information must first be located precisely. This paper proposes a new locating method that builds on the HTML DOM tree and uses CSS selectors to extract the required information.
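The abstract does not spell out the paper's algorithm, so the following is only a minimal illustrative sketch of the general idea: build a DOM-like tree from HTML and locate nodes with a CSS selector. It uses only the Python standard library. The names `Node`, `TreeBuilder`, `parse`, `select`, and `SAMPLE_HTML` are hypothetical, and the matcher is assumed to handle only tag, `.class`, and `#id` selectors joined by the descendant combinator (a space), not the full CSS selector grammar.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper class)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def text(self):
        """Concatenate all text inside this element, in document order."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def walk(self):
        """Yield every descendant element in document (pre-order) order."""
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def matches_simple(node, simple):
    """Test one compound selector such as 'div', '.price', or 'span#sku.bold'."""
    for token in re.findall(r"[#.]?[\w-]+", simple):
        if token[0] == "#":
            if node.attrs.get("id") != token[1:]:
                return False
        elif token[0] == ".":
            if token[1:] not in (node.attrs.get("class") or "").split():
                return False
        elif node.tag != token:
            return False
    return True


def select(root, selector):
    """Return elements matching a descendant-combinator selector, in document order."""
    parts = selector.split()
    results = []
    for node in root.walk():
        if not matches_simple(node, parts[-1]):
            continue
        # Walk up the ancestors, consuming the remaining parts right to left.
        ancestor, remaining = node.parent, parts[:-1]
        while remaining and ancestor is not None:
            if matches_simple(ancestor, remaining[-1]):
                remaining = remaining[:-1]
            ancestor = ancestor.parent
        if not remaining:
            results.append(node)
    return results


# Made-up sample page, used only to exercise the sketch.
SAMPLE_HTML = """
<div id="list">
  <div class="item"><span class="title">Widget</span><span class="price">9.50</span></div>
  <div class="item"><span class="title">Gadget</span><span class="price">12.00</span></div>
</div>
<p class="price">not a product</p>
"""

root = parse(SAMPLE_HTML)
prices = [n.text() for n in select(root, "div.item span.price")]
print(prices)
```

In this sketch the selector `div.item span.price` locates only the price spans inside product items and skips the unrelated `<p class="price">` element, which is the kind of structural positioning the abstract describes. A production system would more likely rely on an existing selector engine than on a hand-written matcher like this one.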
Source
《信息技术与信息化》
2015, Issue 3, pp. 100-102 (3 pages)
Information Technology and Informatization