Deep Web Data Source Classification Based on the Text VSM of Query Interfaces (cited by: 2)
Abstract: With the rapid development of Internet technology, the number of Web databases is already very large and continues to grow quickly. To organise and use the information hidden in these databases effectively, they need to be classified and integrated by domain. Because the query interface on a Web page is the only way for users to access a Web database, Deep Web data sources can be classified by classifying their query interfaces. This paper proposes a classification method based on a vector space model (VSM) of query interface text. A VSM is first built from the text of each query interface; typical data mining classification algorithms are then used to train classifiers that assign each query interface to its domain. Experimental results show that the proposed method achieves good classification performance.
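The two steps named in the abstract (build a VSM from query interface text, then train a classifier on it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which term weighting or which classification algorithm was used, so TF-IDF weighting and a nearest-centroid classifier with cosine similarity are assumptions made here. The training texts and the domain names (`books`, `airfares`, `autos`) are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the interface text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict) to unit length; empty if it is all zero."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def to_vector(tokens, idf):
    """Map tokens to a unit-length TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: f * idf[t] for t, f in tf.items()})


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(samples):
    """samples: list of (interface_text, domain). Returns (idf, centroids)."""
    docs = [tokenize(text) for text, _ in samples]
    idf = build_idf(docs)
    sums = defaultdict(Counter)
    for tokens, (_, domain) in zip(docs, samples):
        for t, w in to_vector(tokens, idf).items():
            sums[domain][t] += w
    # One unit-length centroid per domain (assumed classifier, see lead-in).
    return idf, {d: normalize(s) for d, s in sums.items()}


def classify(text, idf, centroids):
    """Return the domain whose centroid is most similar to the interface text."""
    vec = to_vector(tokenize(text), idf)
    return max(centroids, key=lambda d: cosine(vec, centroids[d]))


# Hypothetical toy data: labels and field names found on query forms.
TRAINING = [
    ("title author isbn publisher keyword", "books"),
    ("author title subject format price", "books"),
    ("departure city arrival city departure date return date passengers", "airfares"),
    ("from to depart return adults children class", "airfares"),
    ("make model year price zip code mileage", "autos"),
    ("make model body style price range zip", "autos"),
]

idf, centroids = train(TRAINING)
print(classify("search by author or isbn", idf, centroids))
```

Any standard classifier (for example naive Bayes, k-nearest neighbours, or an SVM) could replace the centroid step; the vectors produced by `to_vector` are the part that corresponds to the VSM described in the abstract.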
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2013, No. 8, pp. 54-58 (5 pages)
Funding: National Natural Science Foundation of China (61163057); Guangxi Natural Science Foundation (2012jjAAG0063)
Keywords: Deep Web; data source classification; vector space model; data mining; query interface
