Deep Web Data Source Classification Based on Text VSM of Query Interface

Cited by: 2

Abstract: With the rapid development of Internet technology, the number of Web databases is already large and is still growing quickly. To organise and utilise the information hidden in Web databases effectively, the databases need to be classified and integrated by domain. Because the query interface on a Web page is the only way for users to access a Web database, Deep Web data sources can be classified by classifying their query interfaces. This paper proposes a classification method based on a text VSM (Vector Space Model) of the query interface. First, a vector space model is built from the text of the query interface. Then a typical data mining classification algorithm is used to train a classifier, which assigns each query interface to its domain. Experimental results show that the proposed method achieves good classification performance.
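The two steps named in the abstract (build a vector space model from query-interface text, then train a classifier on it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy interface texts and domain labels, the smoothed TF-IDF weighting, and the choice of a cosine-similarity k-nearest-neighbour classifier as the "typical data mining classification algorithm".

```python
import math
from collections import Counter

# Hypothetical toy data: text taken from query-interface labels, plus a domain label.
TRAIN = [
    ("title author isbn publisher keyword", "books"),
    ("author title subject price format", "books"),
    ("departure city arrival city date passengers", "airfares"),
    ("from to depart return date adults class", "airfares"),
    ("make model year price mileage zip", "autos"),
    ("make model body style price range zip", "autos"),
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(samples):
    """Return the IDF table and the labelled training vectors."""
    docs = [tokenize(text) for text, _ in samples]
    idf = build_idf(docs)
    vectors = [(vectorize(doc, idf), label) for doc, (_, label) in zip(docs, samples)]
    return idf, vectors


def classify(text, idf, vectors, k=1):
    """Majority vote among the k training interfaces most similar to the text."""
    query = vectorize(tokenize(text), idf)
    ranked = sorted(vectors, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


idf, vectors = train(TRAIN)
print(classify("title author publisher", idf, vectors))
print(classify("depart date return date passengers", idf, vectors))
```

Any classifier that accepts fixed-length feature vectors could replace the nearest-neighbour step; the sketch only shows how interface text becomes a weighted term vector before classification.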

Authors: Shi Long (石龙), Qiang Baohua (强保华), Chen Chao (谌超), Wu Chunming (吴春明)

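Affiliations: School of Computer Science and Engineering, Guilin University of Electronic Technology; College of Computer and Information Science, Southwest University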

Source: Computer Applications and Software (《计算机应用与软件》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 8, pp. 54-58 (5 pages)

Funding: National Natural Science Foundation of China (61163057); Guangxi Natural Science Foundation (2012jjAAG0063)

Keywords: Deep Web; data source classification; vector space model; data mining; query interface

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
