A Novel Text Clustering Algorithm Based on Feature Weighting Distance and Soft Subspace Learning

Cited by: 22
Abstract: Text data are high-dimensional and sparsely distributed, and the features of different clusters overlap, which makes clustering analysis challenging. To address these characteristics, this paper combines feature weighting with soft subspace learning within a fuzzy clustering framework and proposes a novel soft subspace fuzzy clustering algorithm for high-dimensional text data. First, a new feature weighting distance is presented based on weighted vector norms. Then, by combining this distance with the soft subspace learning framework, a new learning criterion for fuzzy clustering is proposed. An entropy exponent r is introduced into the constraints, which extends the range of admissible values of the fuzzy index m, and a physical explanation is given from the viewpoint of information theory. The global convergence of the algorithm is proved by applying Zangwill's convergence theorem. Finally, experiments on both synthetic and real text data show that the proposed algorithm performs soft subspace learning and clustering analysis simultaneously and obtains better results than some existing related approaches.
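The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general idea of a feature-weighted distance inside a fuzzy clustering update. It assumes the common form d² = Σ_k w_ik^τ (x_k − v_ik)² and the standard fuzzy c-means membership rule. The function names, the exponent `tau`, and the example weights are hypothetical, and the paper's entropy exponent r and its actual objective are not modeled here.

```python
from typing import List, Sequence


def weighted_sq_distance(x: Sequence[float], center: Sequence[float],
                         weights: Sequence[float], tau: float = 2.0) -> float:
    """Squared distance in which feature k contributes weights[k] ** tau.

    Assumed form for illustration; not taken from the paper.
    """
    return sum((w ** tau) * (a - b) ** 2
               for a, b, w in zip(x, center, weights))


def fuzzy_memberships(x: Sequence[float],
                      centers: Sequence[Sequence[float]],
                      weights: Sequence[Sequence[float]],
                      m: float = 2.0, tau: float = 2.0,
                      eps: float = 1e-12) -> List[float]:
    """Standard fuzzy c-means membership update (m > 1), using one
    weight vector per cluster, i.e. a soft subspace per cluster."""
    # eps avoids division by zero when x coincides with a center
    dists = [weighted_sq_distance(x, c, w, tau) + eps
             for c, w in zip(centers, weights)]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


# Cluster 0 ignores the second feature (weight 0), so the point is
# close to it even though the raw second coordinates differ a lot.
x = [1.0, 0.0]
centers = [[1.0, 5.0], [0.0, 0.0]]
weights = [[1.0, 0.0], [0.5, 0.5]]
u = fuzzy_memberships(x, centers, weights)
```

In this sketch the per-cluster weight vectors are fixed inputs. In a soft subspace method they would be learned together with the memberships and cluster centers, which is the simultaneous learning the abstract refers to.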
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2012, No. 8, pp. 1655-1665 (11 pages)
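Funding: National Natural Science Foundation of China (60903100, 60975027, 61170122); Natural Science Foundation of Jiangsu Province (BK2011417); Jiangsu "333 High-Level Talent Training Project" (BRA2011142); Fundamental Research Funds for the Central Universities (JUSRP111A38)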
Keywords: fuzzy clustering; text clustering; soft subspace; feature weighting distance; global convergence
