Abstract
To support the rigorous use of real-world data, this paper explores clustering methods suited to the increasingly common mixed-type medical data. It analyzes and compares two typical mixed-data clustering methods, K-prototypes and ClustMD, improves the method for selecting their key parameters, and proposes a clustering stability index. Case analysis results indicate that both methods are highly effective and stable, each with its own advantages and disadvantages. K-prototypes is recommended when correlation in the data is strong, when data are severely missing, or when there are relatively many non-continuous variables.
Authors
刘超
姚清华
乐然
Liu Chao; Yao Qinghua; Le Ran (Mathematics and Systems Science Institute, Beijing University of Aeronautics and Astronautics, Beijing 100083, China; LMIB of the Ministry of Education, Beijing University of Aeronautics and Astronautics, Beijing 100083, China; Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China)
Source
《统计与决策》
CSSCI
Peking University Core Journal (北大核心)
2019, No. 11, pp. 64-67 (4 pages)
Statistics & Decision