Discovery and Classification Model for Deep Web Sources (Deep Web数据源发现与分类模型)

Cited by: 2
Abstract: As the Internet grows, the Web plays an ever larger role in daily life. Traditional search engines index only the Surface Web and cannot directly index Deep Web resources. To exploit these resources effectively, Deep Web data sources must be discovered and classified by domain, which has become an urgent problem. The proposed model works in two steps. First, it extracts features from the query interfaces of Web pages and builds a Deep Web page filter that identifies Deep Web data sources. Second, after analyzing the query-interface features, it builds a KNN-based classifier that assigns newly discovered Deep Web data sources to domains. Experiments show that the model achieves an average classification accuracy of 86.9%; detailed results are reported at the end of the paper.
Source: Computer Technology and Development (《计算机技术与发展》), 2010, No. 7, pp. 65-67, 71 (4 pages)
Funding: Natural Science Foundation of Guizhou Province (Grant No. 黔科合GY字[2008]3035)
Keywords: Deep Web; query interfaces; KNN (K-nearest neighbor algorithm); classification
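
The two-step pipeline described in the abstract (filter pages by query-interface features, then assign a domain with KNN) can be illustrated with a minimal Python sketch. The abstract does not give the paper's feature set, distance measure, or value of k, so everything below is an assumption made for illustration: the form fields (`text_inputs`, `select_lists`, `has_password_field`, `has_submit_button`, `labels`), the filter rule, the bag-of-words vectors with cosine similarity, the default `k=3`, and the toy training set.

```python
import math
from collections import Counter

# Toy labelled query interfaces (hypothetical; not the paper's data set).
TRAINING = [
    (["title", "author", "isbn"], "books"),
    (["author", "publisher", "title"], "books"),
    (["book title", "keyword"], "books"),
    (["departure city", "arrival city", "date"], "airfare"),
    (["from", "to", "departure date", "passengers"], "airfare"),
    (["make", "model", "price"], "autos"),
]

def is_query_interface(form):
    """Step 1 (assumed heuristic): keep forms that look like search interfaces."""
    if form.get("has_password_field"):   # login or registration form
        return False
    if not form.get("has_submit_button"):
        return False
    # Require at least one field the user can fill in or select.
    return form.get("text_inputs", 0) + form.get("select_lists", 0) >= 1

def vectorize(labels):
    """Bag-of-words vector over the visible labels of a form."""
    return Counter(word.lower() for label in labels for word in label.split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(labels, training, k=3):
    """Step 2: majority vote among the k most similar labelled interfaces."""
    query = vectorize(labels)
    ranked = sorted(training,
                    key=lambda example: cosine(query, vectorize(example[0])),
                    reverse=True)
    votes = Counter(domain for _, domain in ranked[:k])
    return votes.most_common(1)[0][0]

def classify_page(form, training=TRAINING, k=3):
    """Return a domain label, or None if the form is filtered out in step 1."""
    if not is_query_interface(form):
        return None
    return knn_classify(form["labels"], training, k)

if __name__ == "__main__":
    search_form = {"text_inputs": 2, "has_submit_button": True,
                   "labels": ["Title", "Author"]}
    print(classify_page(search_form))
```

In this sketch a login form is rejected by the filter before it reaches the classifier, which mirrors the abstract's ordering of discovery before classification. A real system would need a richer feature set and a held-out evaluation to approach the reported accuracy.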
