Abstract
With the growth of the Internet, extracting content of interest from the vast amount of information on the Web has become an active problem. Traditional extraction methods based on keyword retrieval are suited to relatively complex information environments. For the extraction of specific information, this paper proposes an approach based on an artificial (manually defined) method that uses the DOM tree and HTML tags to precisely extract large volumes of information in a specific format. Experimental results show that the approach reaches a 100% accurate extraction rate in an application that extracts specific Web information.
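The abstract does not give the extraction rules themselves, so the following is only a minimal sketch of the general idea: walk the HTML tag structure and keep the text found under a hand-chosen tag pattern. The choice of `<tr>`/`<td>` as the pattern, the names `TableRowExtractor` and `extract_rows`, and the sample page are all assumptions made for illustration, not details from the paper. The sketch uses Python's standard `html.parser` event stream rather than a full DOM tree.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by enclosing <tr>.

    Hypothetical rule: the "specific formatted information" is assumed
    to live in table data cells; everything else on the page is ignored.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while a <td> is open; nested inline tags
        # such as <b> do not close the cell, so their text is included.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the list of <td> rows found in the given HTML string."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page: navigation noise plus one data table.
SAMPLE = """
<html><body>
<div class="nav">Home | About</div>
<table>
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Paper A</td><td>2008</td></tr>
<tr><td>Paper <b>B</b></td><td>2009</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

Because the rule is tied to page structure rather than to keywords, it ignores the navigation text and the header row. It can only be exact on pages that follow the assumed layout, which is consistent with the abstract restricting its accuracy claim to specific, formatted information.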
Source
Journal of Southwest University of Science and Technology (《西南科技大学学报》)
CAS
2009, Issue 2, pp. 49-52 (4 pages)
Funding
National 863 Program project (2003AA116060)
Keywords
Information extraction
Artificial method
DOM