Abstract
To meet the query needs of the Deep Web, this paper proposes a method for automatically classifying Deep Web data sources based on the K-nearest neighbor (KNN) algorithm. The algorithm first extracts form features from Deep Web pages and normalizes the resulting feature vectors. It then assigns each Deep Web page to a target topic by computing distances between feature vectors. Experimental results show that the KNN-based classification algorithm classifies Deep Web data sources fairly effectively and achieves relatively high recall and precision.
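The pipeline in the abstract (form feature extraction, normalization, distance-based KNN decision) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not specify the feature set, the distance metric, or k. The sketch assumes features are counts of form-field label terms over a hypothetical vocabulary `VOCAB`, normalization to unit Euclidean length, Euclidean distance, and a majority vote among the k nearest labelled forms. All names and the "books"/"airfares" training forms are hypothetical.

```python
import math
from collections import Counter

# Hypothetical vocabulary of form-field label terms (illustrative, not from the paper).
VOCAB = ["title", "author", "isbn", "from", "to", "date"]


def form_features(field_labels, vocabulary):
    """Assumed feature extraction: count each vocabulary term among a form's field labels."""
    counts = Counter(label.lower() for label in field_labels)
    return [counts[term] for term in vocabulary]


def normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=3):
    """Return the majority domain among the k training vectors nearest to `query`.

    `training` is a list of (normalized_vector, domain) pairs. Neighbours are
    counted nearest-first, so a tied vote goes to the domain seen first.
    """
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(domain for _, domain in neighbours)
    return votes.most_common(1)[0][0]


def vectorize(field_labels):
    """Feature extraction followed by normalization."""
    return normalize(form_features(field_labels, VOCAB))


# Hypothetical labelled query forms from two domains.
TRAINING = [
    (vectorize(["title", "author", "isbn"]), "books"),
    (vectorize(["title", "author"]), "books"),
    (vectorize(["author", "isbn"]), "books"),
    (vectorize(["from", "to", "date"]), "airfares"),
    (vectorize(["from", "to"]), "airfares"),
    (vectorize(["to", "date"]), "airfares"),
]

print(knn_classify(vectorize(["Title", "ISBN"]), TRAINING, k=3))  # books
```

Normalizing to unit length keeps forms with many fields from dominating the distance purely by size; if the paper uses a different normalization or metric (for example cosine similarity), only `normalize` and `euclidean` would need to change.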
Source
Information Technology (信息技术), 2011, No. 5, pp. 108-111 (4 pages)