Rules-based Deep Web Information Retrieval (基于规则集的Deep Web信息检索)


Abstract: This paper proposes a novel rules-based model for Deep Web information retrieval. The model comprises four layers. Its main processing stages, such as task allocation, information extraction, and data cleaning, are assisted by structure rules, logic rules, and application rules specific to the Deep Web. The model is applied to three domains: scientific literature retrieval, electronic air ticket ordering, and resume searching. Experimental results show that the model is flexible and reliable, with a recall of valid information above 96%.
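The abstract names three rule types and a recall figure but does not define them, so the following is only a minimal sketch of how such rules might plug into the extraction and cleaning stages and how recall is computed. Every name here (`ROW_PATTERN`, `logic_rule`, `application_rule`, the sample page and its table layout) is a hypothetical illustration, not the authors' implementation.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]

# Structure rule (hypothetical): each result is one two-cell table row.
ROW_PATTERN = re.compile(r"<tr><td>(?P<title>.*?)</td><td>(?P<year>.*?)</td></tr>")


def extract_records(page: str) -> List[Record]:
    """Information extraction: apply the structure rule to a result page."""
    return [m.groupdict() for m in ROW_PATTERN.finditer(page)]


def logic_rule(record: Record) -> bool:
    """Logic rule (hypothetical): the year field must be exactly four digits."""
    return re.fullmatch(r"\d{4}", record["year"].strip()) is not None


def application_rule(record: Record) -> bool:
    """Application rule (hypothetical): a literature record needs a title."""
    return bool(record["title"].strip())


def clean(records: List[Record], rules: List[Callable[[Record], bool]]) -> List[Record]:
    """Data cleaning: keep records that pass every rule, trimming whitespace."""
    return [
        {key: value.strip() for key, value in record.items()}
        for record in records
        if all(rule(record) for rule in rules)
    ]


def recall(retrieved: List[str], relevant: List[str]) -> float:
    """Recall = |retrieved AND relevant| / |relevant|."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved)) / len(relevant_set)


# Made-up result page: one row has an empty title, one has a malformed year.
page = (
    "<tr><td>Paper A</td><td>2004</td></tr>"
    "<tr><td> </td><td>2003</td></tr>"
    "<tr><td>Paper B</td><td>n/a</td></tr>"
    "<tr><td>Paper C</td><td> 2001 </td></tr>"
)

cleaned = clean(extract_records(page), [logic_rule, application_rule])
titles = [record["title"] for record in cleaned]
print(titles)
print(recall(titles, ["Paper A", "Paper B", "Paper C"]))
```

In this toy run the cleaning stage discards the row with the empty title and the row with the malformed year, so two of the three relevant items are retrieved. A recall above 96%, as reported in the abstract, would mean fewer than 4 in 100 relevant records are lost across extraction and cleaning.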

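Authors: Yang Jufeng (杨巨峰), Shi Guangshun (史广顺), Zhao Yujuan (赵玉娟), Wang Qingren (王庆人)

Affiliation: Institute of Machine Intelligence, Nankai University

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 13, pp. 51-53 (3 pages)

Funding: Natural Science Foundation of Tianjin (Grant No. 05YFJMJC01500)

Keywords: information retrieval; Deep Web; rules set; data extraction

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]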



