Abstract
This paper proposes a novel rule-set-based model for Deep Web information retrieval. The model comprises four layers. Its main processing stages, such as task dispatch, information extraction, and data cleaning, are assisted by structural, logical, and application rules specific to the Deep Web. The model is applied in three domains: scientific literature retrieval, electronic air-ticket ordering, and resume searching. Experimental results show that the model is flexible and reliable, with recall of valid information reaching 96% or higher.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2008, Issue 13, pp. 51-53 (3 pages)
Funding
Supported by the Natural Science Foundation of Tianjin (Grant No. 05YFJMJC01500)
Keywords
information retrieval
Deep Web
rule set
data extraction