Comparative research of data discretization based on K-means and ChiMerge algorithm (cited by: 1)
Abstract: Many machine learning algorithms, such as decision trees and Bayesian networks, require discrete variables. When a variable is continuous, it must be discretized first. Discretization directly affects how well the algorithm performs and is therefore of great significance to the whole model. This paper presents two discretization methods. One is an improved K-means (K-means clustering) discretization algorithm, which determines the optimal number of clusters and discretizes without supervision. The other is ChiMerge, a traditional supervised discretization algorithm. Both methods are used to discretize a data set; a Bayesian network is then built on each discretized result and used for predictive analysis, and the two discretization results are compared. Experiments show that ChiMerge discretizes better than the improved K-means algorithm, but its processing efficiency is significantly lower.
Authors: 李浩 (LI Hao); 魏明 (WEI Ming) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430070, China; Wuhan Fiberhome Technology Service Co., Ltd., Wuhan 430074, China)
Source: 《信息技术》 (Information Technology), 2020, No. 11, pp. 121-124, 131 (5 pages)
Keywords: discretization; K-means; ChiMerge; Bayesian network
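
The two discretization strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the improved K-means picks the number of clusters, so here `k` is simply passed in. The ChiMerge part follows the classic bottom-up merge of adjacent intervals by lowest chi-square, but it stops at an assumed `max_intervals` cap instead of a significance threshold. All function names (`kmeans_1d_cuts`, `chimerge_cuts`, `to_bin`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def kmeans_1d_cuts(values, k, iters=100):
    """Unsupervised: Lloyd's k-means on one feature; cut points are the
    midpoints between adjacent cluster centres. k is supplied by the caller."""
    xs = sorted(values)
    # Assumption: deterministic start, centres spread evenly over the sorted data.
    centres = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centre.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    centres = sorted(set(centres))
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def chi2(a, b, classes):
    """Pearson chi-square statistic for the class counts of two adjacent intervals."""
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a[c] + b[c]) / total
            if expected > 0:  # simplification: skip classes absent from both intervals
                stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge_cuts(values, labels, max_intervals):
    """Supervised: start with one interval per distinct value, then repeatedly
    merge the adjacent pair with the lowest chi-square statistic."""
    classes = sorted(set(labels))
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    # Each interval is [lower bound, class counts].
    intervals = [[v, counts[v]] for v in sorted(counts)]
    while len(intervals) > max_intervals:
        scores = [chi2(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))
        intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
        del intervals[i + 1]
    # The lower bound of every interval except the first is a cut point.
    return [lo for lo, _ in intervals[1:]]


def to_bin(x, cuts):
    """Map a continuous value to the index of its interval."""
    return bisect_right(cuts, x)


km_cuts = kmeans_1d_cuts([1.0, 1.2, 1.1, 5.0, 5.2, 5.1, 9.0, 9.1, 9.2], k=3)
cm_cuts = chimerge_cuts([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"], 2)
```

The sketch also hints at the efficiency gap reported in the abstract: ChiMerge recomputes a chi-square statistic for every adjacent pair on each merge and needs class labels, while the K-means pass only looks at the feature values.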
