Abstract
Existing similarity definitions for categorical variables have shortcomings, so a new similarity definition is proposed. Using this definition, a data set is abstracted as an undirected graph, and clustering is reduced to finding the connected components of that graph. On this basis, a clustering algorithm for categorical variables based on connected components is proposed. To evaluate clustering quality quantitatively, a new evaluation index is proposed for data sets whose class labels are known. Experimental results show that the proposed algorithm achieves higher clustering precision and faster execution than several existing algorithms.
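The graph formulation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the paper's similarity definition, so this sketch substitutes a simple matching similarity (the fraction of attributes on which two records agree) and a hypothetical `threshold` parameter to decide when two records are joined by an edge. The function names and the sample records are likewise assumptions made for illustration. Connected components are found with a union-find structure.

```python
from itertools import combinations


def matching_similarity(a, b):
    """Fraction of attributes on which two records agree.

    Assumed stand-in: the paper's own similarity definition is not
    given in the abstract.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_components(records, threshold):
    """Return clusters as sorted lists of record indices.

    Records are vertices; an edge joins two records whose similarity
    is at least `threshold` (a hypothetical parameter). Each connected
    component of the resulting undirected graph is one cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every sufficiently similar pair.
    for i, j in combinations(range(len(records)), 2):
        if matching_similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect vertices by the root of their component.
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sample data: three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "medium", "flat"),
]
clusters = cluster_by_components(records, threshold=0.6)
print(clusters)
```

In this sketch, lowering the threshold adds edges and merges components, so the threshold controls how coarse the clustering is. The pairwise comparison is quadratic in the number of records; the abstract does not say how the paper's algorithm builds the graph.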
Source
Control and Decision (《控制与决策》)
EI
CSCD
Peking University Core Journals
2015, No. 1, pp. 39-45 (7 pages)
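Funding
National Natural Science Foundation of China (61402363, 61272284)
Shaanxi Province Industrial Science and Technology Research Project (2014K05-49)
Natural Science Basic Research Plan of Shaanxi Province (2014JQ8361)
Beilin District Science and Technology Plan Project, Xi'an (GX1405)
Xi'an Science Plan Project (CXY1339(5))
University Characteristic Research Plan Project (116-211302)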
Keywords
clustering
categorical variables
similarity
connected components
clustering precision