An Improved Sequential IB Algorithm for Document Clustering

Cited by: 2


Abstract: To address the problems of the sequential Information Bottleneck (sIB) algorithm in document clustering, such as its tendency to get trapped in local optima and its relatively low efficiency, an improved sequential document clustering algorithm based on simulated annealing, SA-isIB, is proposed. Following a reasonable annealing sequence, the algorithm randomly selects a certain proportion of documents from the initial clustering produced by the basic sIB algorithm, randomly modifies their cluster labels, and re-optimizes the solution. After the annealing process, it obtains document clustering results with higher accuracy than those of the sIB algorithm. Experimental results on document datasets show that SA-isIB effectively improves the accuracy of the sIB algorithm for document clustering.
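The perturb-and-re-optimize loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not give them: the function names (`mutual_info`, `sib_optimize`, `sa_isib`), the concrete annealing sequence `ratios`, and the rule of keeping a perturbed solution only when it raises I(T;Y). The sIB step is written as a brute-force reassignment of each document to the cluster that maximizes I(T;Y), which selects the same cluster as the usual draw-and-merge step but is far slower.

```python
import math
import random

def mutual_info(joint, labels, k):
    """I(T;Y) in nats for a hard clustering of the rows of the joint p(x, y)."""
    n_y = len(joint[0])
    pty = [[0.0] * n_y for _ in range(k)]      # p(t, y): summed rows per cluster
    for row, t in zip(joint, labels):
        for j, v in enumerate(row):
            pty[t][j] += v
    py = [sum(col) for col in zip(*joint)]     # marginal p(y)
    total = 0.0
    for t in range(k):
        pt = sum(pty[t])                       # marginal p(t)
        for j in range(n_y):
            if pty[t][j] > 0:
                total += pty[t][j] * math.log(pty[t][j] / (pt * py[j]))
    return total

def sib_optimize(joint, labels, k, max_sweeps=20):
    """Sequential sweeps: move each document to the cluster that maximizes I(T;Y)."""
    labels = list(labels)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(joint)):
            old = labels[i]
            best_t, best_score = old, mutual_info(joint, labels, k)
            for t in range(k):
                if t == old:
                    continue
                labels[i] = t
                score = mutual_info(joint, labels, k)
                if score > best_score + 1e-12:  # move only on strict improvement
                    best_t, best_score = t, score
            labels[i] = best_t
            changed = changed or best_t != old
        if not changed:                         # converged to a local optimum
            break
    return labels

def sa_isib(joint, k, ratios=(0.3, 0.2, 0.1, 0.05), seed=0):
    """Perturb a shrinking fraction of labels, re-optimize, keep improvements.

    `ratios` is a hypothetical annealing sequence; the acceptance rule
    (keep only solutions with higher I(T;Y)) is likewise an assumption.
    """
    rng = random.Random(seed)
    n = len(joint)
    best = sib_optimize(joint, [rng.randrange(k) for _ in range(n)], k)
    best_score = mutual_info(joint, best, k)
    for ratio in ratios:
        trial = list(best)
        for i in rng.sample(range(n), max(1, int(ratio * n))):
            trial[i] = rng.randrange(k)         # random label modification
        trial = sib_optimize(joint, trial, k)
        score = mutual_info(joint, trial, k)
        if score > best_score:
            best, best_score = trial, score
    return best, best_score

# Toy usage: six documents over four words, normalized to a joint distribution.
counts = [[4, 3, 0, 0], [3, 4, 0, 0], [5, 2, 0, 0],
          [0, 0, 4, 3], [0, 0, 2, 5], [0, 0, 3, 4]]
total = sum(map(sum, counts))
joint = [[c / total for c in row] for row in counts]
labels, score = sa_isib(joint, k=2)
print(labels, round(score, 4))
```

Because a perturbed solution is accepted only when it improves I(T;Y), the result of this sketch is never worse than the plain sIB starting point under that objective. Clustering accuracy against ground-truth labels, which is what the abstract reports, would have to be measured separately.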

Authors: Ye Yangdong, Zhang Jie, Liu Dong

Affiliation: School of Information Engineering, Zhengzhou University

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, and the Peking University Core Journals list), 2008, No. 3, pp. 417-423 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (Nos. 60674001, 60773048)

Keywords: Document Clustering, Information Bottleneck (IB) Theory, Simulated Annealing, Simulated Annealing-Iterative Sequential Information Bottleneck (SA-isIB) Algorithm

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]



