
An sIB Algorithm for Categorical Data (cited by: 5)

CD-sIB: A Kind of sIB Algorithm Orient to Categorical Data
Abstract: The sIB algorithm has previously been applied only to co-occurrence data, so it cannot directly analyze categorical data that do not take the form of co-occurrences of two variables X and Y. To address this problem, this paper proposes CD-sIB, an sIB algorithm that automatically analyzes categorical data. Because categorical data are discrete and each attribute takes a finite number of distinct values, CD-sIB extends the attributes of the dataset and binarizes them, then computes the joint distribution of X and Y from the occurrence of attribute values. This makes the sIB algorithm effectively applicable to categorical data. Experimental results show that CD-sIB clearly outperforms GAClust and K-modes, two existing algorithms for clustering categorical data, and that it achieves notably high efficiency and precision on categorical data whose attributes are highly generalized and whose class distribution is relatively balanced.
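Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2009, No. 10, pp. 2165-2172 (8 pages)
Funding: National Natural Science Foundation of China (No. 60773048)
Keywords: IB theory; sIB algorithm; categorical data; generalization; clustering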
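The preprocessing step described in the abstract (attribute extension, binarization, and a joint distribution built from attribute-value occurrences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `binarize` and `joint_distribution`, the toy records, and the choice to normalize the binary matrix by its total count are all assumptions made for illustration. Here X is taken to be the data objects and Y the binarized attribute values.

```python
def binarize(records):
    """Expand each categorical attribute into one binary feature per observed value.

    Assumption: every record is a tuple of string values of equal length.
    """
    # Each binary feature is identified by an (attribute index, value) pair.
    features = sorted({(j, v) for row in records for j, v in enumerate(row)})
    index = {f: k for k, f in enumerate(features)}
    matrix = [[0] * len(features) for _ in records]
    for i, row in enumerate(records):
        for j, v in enumerate(row):
            matrix[i][index[(j, v)]] = 1
    return features, matrix


def joint_distribution(matrix):
    """Normalize the binary occurrence matrix so that all entries sum to 1.

    Assumption: p(x, y) is proportional to the occurrence indicator, which gives
    every record the same marginal p(x) when all records have equally many attributes.
    """
    total = sum(sum(row) for row in matrix)
    return [[cell / total for cell in row] for row in matrix]


# Hypothetical toy dataset: two attributes (color, size), three records.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
features, matrix = binarize(records)
p_xy = joint_distribution(matrix)
```

The resulting `p_xy` has the co-occurrence form that sIB expects as input, which is the point the abstract makes about applying sIB to categorical data.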

