Clustering of Deep Web Sources Based on the Dirichlet Process (基于Dirichlet过程的Deep Web数据源聚类方法)
Abstract: This paper proposes a method for clustering Deep Web data sources based on the Dirichlet process. The method uses the hierarchical Dirichlet process (HDP) for feature extraction: the high-dimensional, sparse text of each query interface is first represented as topic features, and the number of features is determined automatically. Each text is then treated as a multinomial model and clustered with a Dirichlet process mixture model. The number of clusters does not have to be specified manually in advance; the Dirichlet process infers it from the data, which suits Deep Web data sources because they are numerous and change quickly. Validation experiments on the widely used TEL-8 dataset compare the method with other clustering methods on two metrics, F-measure and entropy, and it achieves good results on both.
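The clustering stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say which inference algorithm the authors use, so the collapsed Gibbs sampler under the Chinese restaurant process view shown here is an assumption, as are the function names `log_predictive` and `dp_mixture_cluster`, the hyperparameters `alpha` and `beta`, and the toy documents. The sketch also skips the HDP feature-extraction stage and clusters raw term counts directly.

```python
import math
import random


def log_predictive(doc, counts, total, beta, vocab_size):
    """Log Dirichlet-multinomial predictive of `doc` under one cluster.

    `counts`/`total` summarise the documents already in the cluster; the
    multinomial coefficient is dropped because it is the same for every cluster.
    """
    doc_len = sum(doc.values())
    score = math.lgamma(total + vocab_size * beta)
    score -= math.lgamma(total + doc_len + vocab_size * beta)
    for term, n in doc.items():
        base = counts.get(term, 0) + beta
        score += math.lgamma(base + n) - math.lgamma(base)
    return score


def dp_mixture_cluster(docs, vocab_size, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of multinomials (CRP view)."""
    rng = random.Random(seed)
    labels = [None] * len(docs)
    clusters = {}  # id -> {"size": docs, "total": tokens, "counts": term -> count}
    next_id = 0
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            # Remove the document from its current cluster, if it has one.
            if labels[d] is not None:
                c = clusters[labels[d]]
                c["size"] -= 1
                c["total"] -= sum(doc.values())
                for term, n in doc.items():
                    c["counts"][term] -= n
                if c["size"] == 0:
                    del clusters[labels[d]]
            # Score each existing cluster, plus one brand-new cluster (weight alpha).
            ids = list(clusters)
            log_w = [
                math.log(clusters[k]["size"])
                + log_predictive(doc, clusters[k]["counts"], clusters[k]["total"],
                                 beta, vocab_size)
                for k in ids
            ]
            log_w.append(math.log(alpha) + log_predictive(doc, {}, 0, beta, vocab_size))
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            pick = rng.choices(range(len(weights)), weights=weights)[0]
            if pick == len(ids):
                k = next_id
                next_id += 1
                clusters[k] = {"size": 0, "total": 0, "counts": {}}
            else:
                k = ids[pick]
            # Add the document to the chosen cluster.
            c = clusters[k]
            c["size"] += 1
            c["total"] += sum(doc.values())
            for term, n in doc.items():
                c["counts"][term] = c["counts"].get(term, 0) + n
            labels[d] = k
    return labels


# Toy query-interface texts as term counts (made-up data, not from TEL-8).
docs = [
    {"title": 2, "author": 1, "isbn": 1},
    {"title": 1, "author": 2},
    {"depart": 2, "arrive": 1, "airline": 1},
    {"depart": 1, "arrive": 2},
]
labels = dp_mixture_cluster(docs, vocab_size=6)
print(labels)
```

The number of clusters is never passed in: a new cluster is opened with weight proportional to `alpha` whenever the sampler picks the extra candidate, which is the property the abstract highlights. In the method as described, the inputs would be HDP topic features rather than raw term counts.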
Source: Microcomputer & Its Applications (《微型机与应用》), 2015, No. 7, pp. 75-78 (4 pages)
Keywords: Deep Web; data integration; feature extraction; Dirichlet process; mixture model