Abstract
To improve the quality of clustering ensembles for categorical data, this paper proposes a relevant random subspace-based clustering ensemble model. Using rough set theory, the model first decomposes the full set of categorical attributes into a relevant subset and an irrelevant subset. It then randomly generates multiple relevant subspaces from the relevant attribute subset, clusters the categorical data in each subspace, and obtains the final partition by combining multiple good and diverse clustering results. In addition, the model applies the rough-set concept of attribute reduction to determine the number of attributes in each relevant subspace, which effectively avoids the influence of this parameter on the clustering ensemble result. Experiments on selected UCI data sets show that the proposed model achieves better and more robust clustering performance than several representative clustering ensemble models for categorical data, which demonstrates its effectiveness.
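The random-subspace-and-combine step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base clusterer, the consensus function, or how "good and diverse" partitions are selected, so the tiny k-modes routine, the co-association consensus, and every name below (`k_modes`, `subspace_ensemble`, `consensus`) are assumptions chosen for illustration. The relevant attribute indices and the subspace size, which the paper derives from rough sets, are simply passed in as inputs here.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of positions where two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, k, iters=10):
    """Tiny k-modes; the first k distinct rows seed the modes (assumed init)."""
    modes = list(dict.fromkeys(rows))[:k]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to the nearest mode under Hamming distance.
        labels = [min(range(len(modes)), key=lambda c: hamming(r, modes[c]))
                  for r in rows]
        # Recompute each mode as the per-attribute most frequent value.
        for c in range(len(modes)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels

def subspace_ensemble(data, relevant, subspace_size, n_members, k, seed=0):
    """Cluster in random subspaces of the relevant attributes and return
    the co-association matrix (fraction of members grouping i with j)."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_members):
        attrs = rng.sample(relevant, subspace_size)  # one random subspace
        rows = [tuple(r[a] for a in attrs) for r in data]
        labels = k_modes(rows, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_members for c in row] for row in counts]

def consensus(co, threshold=0.5):
    """Final partition: connected components of the graph that links
    objects co-clustered in more than `threshold` of the members."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data: attributes 0-2 carry the grouping, attribute 3 is noise.
data = [("a", "x", "p", "u"), ("a", "x", "p", "v"), ("a", "x", "p", "u"),
        ("b", "y", "q", "v"), ("b", "y", "q", "u"), ("b", "y", "q", "v")]
co = subspace_ensemble(data, relevant=[0, 1, 2], subspace_size=2,
                       n_members=5, k=2)
final = consensus(co)
print(final)  # [0, 0, 0, 1, 1, 1]
```

In this toy case the noisy attribute is excluded up front, so every subspace yields the same two groups and the consensus recovers them. In the paper's setting the relevant set and the subspace size would come from the rough-set analysis rather than being supplied by hand.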
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 4, pp. 1082-1084 (3 pages)
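Funding
National Natural Science Foundation of China (70972062)
Shanghai Philosophy and Social Science Planning Project (2011BGL011)
Shanghai Leading Academic Discipline Project (S30504)
Graduate Research Innovation Fund of Shanghai University of Finance and Economics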
Keywords
categorical data
rough sets
attribute reduction
relevant subspace
clustering ensemble