
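Research on an Unsupervised Discretization Method for Continuous Attributes Based on Normal Distribution Characteristics (times cited: 2)

English title as published: The Unsupervised Discretization Method of Continuous Attributes Study: Based on Normal Distribution Characteristics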
Abstract: Many algorithms used in business intelligence analysis operate on discretized data, but the data encountered in business analysis are of mixed types, so discretizing continuous attributes is one of the most important steps of data preprocessing for business intelligence. By analyzing the distribution characteristics of continuous attributes and how different classes are distributed over the same attribute, this paper proposes an unsupervised discretization method for continuous attributes based on normal distribution characteristics. It then studies, for continuous data preprocessed with this method, how the classification accuracy on test data depends on the chosen number of cut-points, and determines a statistically reasonable number of cut-points with which to discretize the data. Comparative numerical experiments show that the proposed method can improve classification accuracy on the tested datasets to some extent.
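The abstract does not spell out how the cut-points are derived, so the following is only a minimal sketch under an explicit assumption: it fits a normal distribution to an attribute and places the k-1 cut-points at equal-probability quantiles of that fitted normal. It is not the authors' method. It also includes a toy "majority class per interval" classifier to show how one might sweep the number of intervals k and watch test accuracy, mirroring the cut-point study the abstract describes. The function names `normal_cut_points`, `discretize`, and `bin_majority_accuracy` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from statistics import NormalDist


def normal_cut_points(values, k):
    """Hypothetical scheme: return k-1 cut-points splitting a normal
    N(mu, sigma) fitted to `values` into k equal-probability intervals.
    Needs at least two distinct values (sigma must be > 0)."""
    if k < 2:
        return []
    dist = NormalDist.from_samples(values)  # sample mean and sample stdev
    return [dist.inv_cdf(i / k) for i in range(1, k)]


def discretize(values, cuts):
    """Map each value to an interval index 0..len(cuts)."""
    return [bisect_right(cuts, v) for v in values]


def bin_majority_accuracy(train_x, train_y, test_x, test_y, k):
    """Hypothetical evaluation: fit cut-points on training data only,
    label each interval with its majority training class, and return
    the accuracy of that rule on the test data."""
    cuts = normal_cut_points(train_x, k)
    votes = defaultdict(Counter)
    for b, label in zip(discretize(train_x, cuts), train_y):
        votes[b][label] += 1
    # Intervals never seen in training fall back to the overall majority class.
    fallback = Counter(train_y).most_common(1)[0][0]
    hits = 0
    for b, label in zip(discretize(test_x, cuts), test_y):
        pred = votes[b].most_common(1)[0][0] if votes[b] else fallback
        hits += pred == label
    return hits / len(test_y)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = ["a"] * 4 + ["b"] * 4
    for k in range(1, 5):
        print(k, bin_majority_accuracy(xs, ys, [0, 10], ["a", "b"], k))
```

Sweeping k in this way and keeping the value with the best held-out accuracy is one simple way to pick a cut-point count. The paper's own criterion for a "statistically reasonable" count may differ.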
Source: 《科学与管理》 (Science and Management), 2009, No. 6X, pp. 5-8 (4 pages)
Keywords: normal distribution; continuous attribute; discretization method; data mining

References (3)

  • 1. 李刚, 童頫 (Li G., Tong F.). An unsupervised discretization algorithm based on a mixture probability model [in Chinese][J]. 计算机学报 (Chinese Journal of Computers), 2002, 25(2): 158-164.
  • 2. Boullé M. MODL: A Bayes optimal discretization method for continuous attributes[J]. Machine Learning, 2006, (1): 131-165.
  • 3. Bay S D. Multivariate discretization for set mining[J]. Knowledge and Information Systems, 2001, (4): 491-512.
