Abstract
As the Internet has grown, the Web has become increasingly embedded in daily life. Traditional search engines can retrieve only the Surface Web and cannot directly index Deep Web resources. To use Deep Web resources effectively, discovering Deep Web data sources and classifying them by domain has become a pressing problem. This work focuses on Deep Web classification and proposes a new classification model that proceeds in two steps. First, the model extracts features from the query interfaces of Deep Web pages and builds a Deep Web page filter, which identifies whether a Web page is a Deep Web data source. Second, after analyzing the query interface features, it builds a KNN-based classifier, which assigns newly discovered Deep Web data sources to a domain. Experimental results show that the model achieves an average classification accuracy of 86.9%, indicating good classification performance; detailed results are given at the end of the paper.
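The two steps named in the abstract (a query-interface filter followed by a KNN classifier) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the features, the distance measure, or the value of k. Here the filter rule (no password field, at least two text or select fields), the cosine similarity over form label words, `k=3`, and the tiny `TRAINING` set are all assumptions made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re


class FormFeatureParser(HTMLParser):
    """Collects simple, assumed features from every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one feature dict per completed form
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text_inputs": 0, "selects": 0, "password": 0, "words": []}
        elif self._current is not None:
            if tag == "input":
                kind = (attrs.get("type") or "text").lower()
                if kind in ("text", "search"):
                    self._current["text_inputs"] += 1
                elif kind == "password":
                    self._current["password"] += 1
            elif tag == "select":
                self._current["selects"] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Visible text inside the form (labels, option text) becomes the word features.
        if self._current is not None:
            self._current["words"].extend(re.findall(r"[a-z]+", data.lower()))


def query_interface_words(html):
    """Step 1 (filter): return the words of the first form that looks like a
    query interface, or None. The rule below is a hypothetical heuristic."""
    parser = FormFeatureParser()
    parser.feed(html)
    for form in parser.forms:
        fields = form["text_inputs"] + form["selects"]
        if form["password"] == 0 and fields >= 2:  # assumed threshold
            return form["words"]
    return None


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(words, training, k=3):
    """Step 2 (KNN): majority vote among the k most similar labeled interfaces."""
    query = Counter(words)
    ranked = sorted(training, key=lambda item: cosine(query, Counter(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up labeled interfaces, for illustration only.
TRAINING = [
    (["title", "author", "isbn", "publisher"], "books"),
    (["author", "title", "keyword", "format"], "books"),
    (["make", "model", "price", "year"], "cars"),
    (["make", "model", "mileage", "zip"], "cars"),
    (["departure", "destination", "date", "passengers"], "flights"),
]

PAGE = """<html><body><form action="/search">
Title <input type="text" name="t"> Author <input type="text" name="a">
<select name="fmt"><option>Hardcover</option></select>
<input type="submit"></form></body></html>"""

words = query_interface_words(PAGE)
if words is not None:
    print(knn_classify(words, TRAINING))
```

In this sketch a login form is rejected by the filter because it contains a password field, while a page that passes the filter is handed to `knn_classify`. A real system would need a much richer feature set and a labeled corpus of query interfaces.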
Source
Computer Technology and Development (《计算机技术与发展》)
2010, Issue 7, pp. 65-67, 71 (4 pages in total)
Funding
Natural Science Foundation of Guizhou Province (Grant No. 黔科合GY字[2008]3035)