Two-stage Attribute Weighting IB Algorithm for Non Co-occurrence Data
Abstract Non co-occurrence data does not appear as co-occurrences of two variables X and Y governed by a joint probability distribution, but rather as a sample of values of an unknown function Z(X, Y). Once such data has been transformed into co-occurrence form, Shannon entropy can be used to measure information quantitatively and to drive clustering. However, existing algorithms assume that every attribute contributes equally to clustering and assign all features the same weight, ignoring the different effects that representative attributes and irrelevant (redundant) attributes have on the clustering result. This paper therefore proposes a two-stage attribute weighting IB algorithm for non co-occurrence data (TSAW-sIB). At the two stages of the co-occurrence transformation, the data is examined from three viewpoints (non co-occurrence, co-occurrence, and joint), so that representative attributes are highlighted and redundant attributes are suppressed. The result is a data representation that reflects the characteristics of the non co-occurrence data more accurately and is then used for clustering. Experiments show that the TSAW-sIB algorithm outperforms the ROCK, COOLCAT and LIMBO algorithms.
Authors Ji Bo (姬波), Ye Yangdong (叶阳东)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), 2012, No. 10, pp. 2278-2282 (5 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding Supported by the National Natural Science Foundation of China (Grants 60773048 and 61170223)
Keywords non co-occurrence data; feature weighting; two-stage; information bottleneck method; clustering
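
The co-occurrence transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the TSAW-sIB procedure itself, whose weighting scheme is not given on this page. It assumes categorical records, treats every (attribute, value) pair as a value of Y and every record as a value of X, and scales each attribute by a hypothetical `weights` list before normalizing to a joint distribution p(x, y). The function names `to_cooccurrence` and `mutual_information` are illustrative only.

```python
import math
from collections import defaultdict


def to_cooccurrence(rows, weights=None):
    """Turn categorical records into a sparse joint distribution p(x, y).

    x indexes records; y indexes (attribute index, value) pairs.
    `weights` is a hypothetical per-attribute weight list (assumption);
    uniform weights reproduce the unweighted transformation.
    """
    n_attrs = len(rows[0])
    if weights is None:
        weights = [1.0] * n_attrs
    total = sum(weights) * len(rows)  # normalizer so that all entries sum to 1
    joint = []
    for row in rows:
        # Each record co-occurs with one (attribute, value) pair per attribute.
        entry = {(j, v): weights[j] / total
                 for j, v in enumerate(row) if weights[j] > 0}
        joint.append(entry)
    return joint


def mutual_information(joint):
    """I(X; Y) in bits for the sparse joint distribution built above."""
    px = [sum(entry.values()) for entry in joint]
    py = defaultdict(float)
    for entry in joint:
        for y, p in entry.items():
            py[y] += p
    mi = 0.0
    for i, entry in enumerate(joint):
        for y, p in entry.items():
            mi += p * math.log2(p / (px[i] * py[y]))
    return mi


# Toy records with two categorical attributes (made-up data).
records = [("red", "small"), ("red", "large"),
           ("blue", "small"), ("blue", "large")]
uniform = to_cooccurrence(records)
reweighted = to_cooccurrence(records, weights=[2.0, 1.0])
print(mutual_information(uniform), mutual_information(reweighted))
```

Raising an attribute's weight gives its (attribute, value) pairs more probability mass in p(x, y), so an entropy-based clustering criterion such as the IB objective is influenced more by that attribute. How the weights are actually chosen at each of the two stages is specific to the paper and is not reproduced here.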