
Automatic Classification of Deep Web Sources Based on Search Interface Schemas

Cited by: 11
Abstract Web search engines work well for finding crawlable, indexable pages. However, a large number of pages on the Internet are generated dynamically from back-end databases, and traditional search engines cannot retrieve them. This part of the Web is called the Deep Web. Most Deep Web sources are structured: they provide structured query interfaces and return structured results. Organizing these structured sources by domain makes it easier for users to browse these valuable resources, and it is also a critical step toward large-scale integrated search over heterogeneous Deep Web sources. We propose a method for automatically classifying structured Deep Web sources based on the features available on their query interfaces. Experimental results indicate that the method is effective.
Source Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 10, pp. 47-50 (4 pages)
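Funding 2005 Key Scientific Research Project of the Ministry of Education (205059); Ministry of Education Research Fund for the Doctoral Program of Higher Education (20040285016); Jiangsu Province High-Tech Research Program (BG2005019)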
Keywords Deep Web, automatic classification, machine learning, data integration

References (6)

  • 1. Michael K. Bergman. The deep web: surfacing hidden value [J]. Journal of Electronic Publishing, 2002, 7(1): 8912-8914
  • 2. K. C. C. Chang, B. He, C. Li, et al. Structured databases on the web: observations and implications [J]. SIGMOD Record, 2004, 33(3): 61-70
  • 3. Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami. Probe, count, and classify: categorizing hidden web databases [C]. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001: 67-78
  • 4. Yih-Ling Hedley, Muhammad Younas, Anne E. James. The categorisation of hidden web databases through concept specificity and coverage [C]. In Proceedings of the 2005 International Workshop on Web and Mobile Information Systems, 2005: 371-376
  • 5. B. He, T. Tao, K. C. C. Chang. Organizing structured web sources by query schemas: a clustering approach [C]. In Proceedings of the 13th Conference on Information and Knowledge Management, 2004: 22-31
  • 6. Qian Peng, Weiyi Meng, Hai He, et al. WISE-Cluster: clustering e-commerce search engines automatically [C]. In 6th ACM International Workshop on Web Information and Data Management, 2004: 104-111

Co-cited documents: 106

Citing documents: 11

Second-level citing documents: 14
