An Improved Method for Deep Web Sources Classification Based on the Theme and Form Attributes (cited by: 2)
Abstract: The Deep Web holds a vast and rapidly growing body of high-quality information. Because Deep Web sources are distributed, heterogeneous, and autonomous, it is very difficult for users to find the information they are interested in efficiently and quickly. Classifying Deep Web data sources by domain is the foundation for addressing this challenge. Based on statistics and analysis of more than 200 data sources from four domains (Airfares, Books, Automobiles, and Real Estate), this paper makes full use of theme information and form attributes to propose a new classification method for Deep Web data sources and an improved similarity measure for query interfaces, enabling automatic classification of Deep Web data sources. The paper also proposes a query interface tagging strategy to reduce the influence of randomly chosen initial centers. Experimental results show that the method achieves high classification accuracy.
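The abstract does not give the similarity formula or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each query interface is reduced to a set of theme terms and a set of form attribute labels, that similarity is a weighted mix of Jaccard overlaps (the weight `w_theme` is hypothetical), and that manually tagged interfaces serve as fixed initial centers in place of randomly chosen ones. All names and sample data are illustrative.

```python
from typing import Dict, List, Set

# Assumed representation: {"theme": set of theme terms, "attrs": set of form attribute labels}
Interface = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: Interface, y: Interface, w_theme: float = 0.5) -> float:
    """Weighted mix of theme overlap and form-attribute overlap (assumed form)."""
    theme_sim = jaccard(x["theme"], y["theme"])
    attr_sim = jaccard(x["attrs"], y["attrs"])
    return w_theme * theme_sim + (1.0 - w_theme) * attr_sim


def classify(interfaces: List[Interface], seeds: Dict[str, Interface]) -> List[str]:
    """One assignment pass: give each interface the domain of its most similar tagged seed."""
    return [max(seeds, key=lambda d: similarity(q, seeds[d])) for q in interfaces]


# Hypothetical tagged interfaces used as initial centers, one per domain.
seeds = {
    "Airfares": {"theme": {"flight", "airline", "ticket"},
                 "attrs": {"from", "to", "depart", "return", "passengers"}},
    "Books": {"theme": {"book", "bookstore"},
              "attrs": {"title", "author", "isbn", "publisher"}},
}

# Hypothetical untagged query interfaces to classify.
queries = [
    {"theme": {"cheap", "flight", "ticket"}, "attrs": {"from", "to", "depart"}},
    {"theme": {"book", "search"}, "attrs": {"title", "author", "keyword"}},
]

labels = classify(queries, seeds)
print(labels)
```

A full clustering procedure would typically repeat the assignment step and update the centers until they stabilize; the sketch stops after one pass to keep the role of the tagged seeds visible.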
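Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, No. 2, pp. 260-266 (7 pages)
Funding: National Natural Science Foundation of China (No. 60973028, No. 61272185); Natural Science Foundation of Heilongjiang Province (No. F201238)
Keywords: form theme and attributes; query interface tagging; deep web; automatic classification of sources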