Abstract
To simplify the task of obtaining information from the vast number of information sources available on the WWW, we have developed two methods for fine-grained information extraction. This paper first describes the principles of the two methods, which work by mining the structure of Web pages, and then compares their advantages and disadvantages. Finally, we test the performance of implementations of both methods and analyze the experimental results.
Source
《计算机科学》 (Computer Science)
CSCD
PKU Core (北大核心)
2006, Issue 3, pp. 191-193, 218 (4 pages in total)
Keywords
Information extraction
Web page structure mining
Repeated pattern
Time characteristic
RSS