Two-Stage Attribute Weighting IB Algorithm for Non Co-occurrence Data (非共现数据两阶段加权IB算法)

Abstract: Non co-occurrence data does not appear as co-occurrences of two variables X and Y drawn from a joint probability distribution, but rather as samples of the values of an unknown function Z(X, Y). Once such data has been transformed into co-occurrence form, Shannon entropy can be used to measure information quantitatively and to cluster the data. However, existing algorithms assume that every attribute contributes equally to clustering, and they ignore the different effects of representative attributes and irrelevant (redundant) attributes on the clustering result. This paper therefore proposes a two-stage attribute weighting IB algorithm for non co-occurrence data (TSAW-sIB). In the two stages of the co-occurrence transformation, the algorithm views the data from three perspectives (non co-occurrence, co-occurrence, and joint), highlights representative attributes, and suppresses redundant ones, yielding a data representation that reflects the characteristics of the non co-occurrence data more accurately before clustering. Experiments show that TSAW-sIB outperforms the ROCK, COOLCAT, and LIMBO algorithms.

Authors: Ji Bo (姬波), Ye Yangdong (叶阳东)

Affiliation: Department of Computer Science and Technology, School of Information Engineering, Zhengzhou University

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 10, pp. 2278-2282 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (Grants 60773048 and 61170223)

Keywords: non co-occurrence data; feature weighting; two-stage; information bottleneck method; clustering

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
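
The abstract describes turning non co-occurrence (categorical) records into a co-occurrence form whose information content can be measured with entropy. The sketch below illustrates that general idea only and is not the paper's TSAW-sIB procedure. It assumes a plain one-hot encoding in which every (attribute, value) pair becomes one column; the function names `to_cooccurrence` and `mutual_information` and the per-attribute `weights` argument are hypothetical choices made for this illustration, not the weighting scheme from the paper.

```python
import math


def to_cooccurrence(records, weights=None):
    """One-hot encode categorical records into a joint distribution p(x, y).

    Rows x are records; columns y are (attribute index, value) pairs.
    `weights` is a hypothetical per-attribute weight list (default: all 1.0).
    """
    n_attrs = len(records[0])
    if weights is None:
        weights = [1.0] * n_attrs
    # Every distinct (attribute, value) pair becomes one column.
    columns = sorted({(j, row[j]) for row in records for j in range(n_attrs)})
    col_index = {c: k for k, c in enumerate(columns)}
    table = [[0.0] * len(columns) for _ in records]
    for i, row in enumerate(records):
        for j, value in enumerate(row):
            table[i][col_index[(j, value)]] = weights[j]
    # Normalise so that all entries sum to 1.
    total = sum(map(sum, table))
    return [[v / total for v in r] for r in table], columns


def mutual_information(p_xy):
    """I(X; Y) in bits for a joint distribution given as a list of rows."""
    p_x = [sum(r) for r in p_xy]
    p_y = [sum(c) for c in zip(*p_xy)]
    return sum(
        p * math.log2(p / (p_x[i] * p_y[k]))
        for i, r in enumerate(p_xy)
        for k, p in enumerate(r)
        if p > 0
    )


records = [("red", "small"), ("red", "large"), ("blue", "small")]
p_uniform, columns = to_cooccurrence(records)
p_weighted, _ = to_cooccurrence(records, weights=[3.0, 1.0])
print(mutual_information(p_uniform), mutual_information(p_weighted))
```

Under this assumed encoding, raising an attribute's weight moves probability mass onto that attribute's columns, so it has more influence on any entropy-based clustering criterion computed from p(x, y).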

