Automatic classification of Deep Web data sources based on the K-nearest neighbor (KNN) algorithm


Abstract: To meet the query needs of the Deep Web, this paper proposes a method for automatically classifying Deep Web data sources based on the K-nearest neighbor (KNN) algorithm. The algorithm first extracts features from the query forms on Deep Web pages and normalizes the resulting feature vectors, then assigns each Deep Web page to a target topic based on distance. Experimental results show that KNN-based classification handles Deep Web data sources fairly effectively, achieving relatively high recall and precision.
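The abstract describes three steps: extract form features, normalize the feature vector, and assign a topic by distance. Below is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation. The sample forms and topic names are hypothetical; representing a form as the bag of terms in its field labels, unit-length (L2) normalization, Euclidean distance, and k = 3 are all assumed choices, because the abstract does not specify them. A small helper computes per-topic precision and recall, the two measures the abstract reports.

```python
import math
from collections import Counter

# Hypothetical training data: each Deep Web query form is reduced to the
# terms found in its field labels/names and tagged with its subject topic.
TRAINING_FORMS = [
    (["title", "author", "isbn", "publisher"], "books"),
    (["title", "author", "keyword"], "books"),
    (["from", "to", "departure", "date", "passengers"], "airfares"),
    (["from", "to", "return", "date"], "airfares"),
    (["make", "model", "price", "year"], "cars"),
    (["make", "model", "zip"], "cars"),
]
VOCAB = sorted({t for terms, _ in TRAINING_FORMS for t in terms})


def form_vector(terms, vocabulary):
    """Term-frequency vector over `vocabulary`, scaled to unit L2 length.

    Unit-length scaling is an assumed normalization; terms outside the
    vocabulary are ignored, and an all-zero vector is returned unchanged.
    """
    counts = Counter(t.lower() for t in terms)
    vec = [float(counts[w]) for w in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query_vec, training, k=3):
    """Majority vote among the k training vectors nearest to query_vec."""
    nearest = sorted(training, key=lambda item: euclidean(query_vec, item[0]))[:k]
    votes = Counter(topic for _, topic in nearest)
    # most_common keeps first-encountered order on ties, so a tied vote goes
    # to the topic that appears first in `nearest` (its closest member wins).
    return votes.most_common(1)[0][0]


def precision_recall(actual, predicted, topic):
    """Precision and recall of the predicted labels for one topic."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == topic == p)
    n_pred = sum(1 for p in predicted if p == topic)
    n_actual = sum(1 for a in actual if a == topic)
    return (tp / n_pred if n_pred else 0.0, tp / n_actual if n_actual else 0.0)


TRAINING = [(form_vector(terms, VOCAB), topic) for terms, topic in TRAINING_FORMS]

if __name__ == "__main__":
    queries = [(["author", "title", "isbn"], "books"), (["from", "to", "date"], "airfares")]
    predicted = [knn_classify(form_vector(q, VOCAB), TRAINING, k=3) for q, _ in queries]
    actual = [label for _, label in queries]
    print(predicted)  # ['books', 'airfares']
    for topic in ("books", "airfares"):
        print(topic, precision_recall(actual, predicted, topic))
```

Because every vector is scaled to unit length, Euclidean distance here ranks neighbors in the same order as cosine similarity would; a form with many fields is therefore not penalized merely for being longer than the training forms.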

Authors: Zhang Zhi, Gu Yunhua

Affiliation: School of Computer and Software, Nanjing University of Information Science and Technology

Source: Information Technology (《信息技术》), 2011, No. 5, pp. 108-111 (4 pages)

Keywords: Deep Web; query interface; K-nearest neighbor (KNN) algorithm; web page classification

CLC classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]


