Abstract
The Internet holds a vast amount of information, and obtaining the desired information efficiently and accurately is an important problem. This paper divides information acquisition into two parts, a resource discovery module and an information extraction module, and builds an automatic information acquisition platform on that basis. For resource discovery, a new search algorithm is proposed that discovers resources in both breadth and depth, covering the surface web and the deep web, and multi-agent technology is used to make resource discovery distributed. For information extraction, a new representation of extraction rules is proposed, which improves the adaptability of the rules during extraction.
Source
《农业网络信息》 (Agriculture Network Information)
2009, No. 8, pp. 42-45 (4 pages)
Funding
National "Eleventh Five-Year Plan" Science and Technology Support Program project (2006BAD10A05)
Keywords
information acquisition
information extraction
multi-agent
resource discovery
extraction rules