
Data discretization method based on statistical correlation coefficient (cited by: 5)
Abstract Most data mining and inductive learning methods can handle only discrete attributes, so continuous attributes must be discretized. This paper proposes a data discretization method based on a statistical correlation coefficient. Drawing on statistical correlation theory, the method captures the interdependence between the target class and the attributes in order to select the optimal cut points. In addition, the Variable Precision Rough Set (VPRS) model is incorporated into the discretization to control information loss. The method was applied to breast tumor diagnosis and to data from other fields. The experimental results show that it significantly improves the classification accuracy of the See5 decision tree.
Author XIE Ya-ping (解亚萍)
Source Journal of Computer Applications (《计算机应用》), CSCD, PKU Core, 2011, No. 5, pp. 1409-1412 (4 pages)
Keywords discretization; data mining; Class-Attribute Interdependence (CAI); Variable Precision Rough Set (VPRS); decision tree
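
The abstract does not state the paper's correlation coefficient or its VPRS step, so the following is only a minimal sketch of the general idea of correlation-driven cut-point selection, not the author's algorithm. It assumes Cramér's V as a stand-in association measure, considers a single binary cut per attribute, and uses made-up toy data. The names `cramers_v` and `best_cut_point` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(bins, labels):
    """Cramér's V between two equal-length sequences of discrete values.

    Assumption: used here as a stand-in for the paper's correlation
    coefficient, which the abstract does not specify.
    """
    n = len(bins)
    row = Counter(bins)
    col = Counter(labels)
    joint = Counter(zip(bins, labels))
    chi2 = 0.0
    for b, row_count in row.items():
        for c, col_count in col.items():
            expected = row_count * col_count / n
            observed = joint.get((b, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def best_cut_point(values, labels):
    """Return (cut, score) for the single cut that maximises Cramér's V.

    Candidate cuts are midpoints between adjacent distinct values.
    Returns (None, 0.0) if no cut yields a positive score.
    """
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        bins = [v > cut for v in values]  # binarise the attribute at this cut
        score = cramers_v(bins, labels)
        if score > best[1]:
            best = (cut, score)
    return best


# Toy data invented for illustration only (not from the paper).
values = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labels = [0, 0, 0, 1, 1, 1]
cut, score = best_cut_point(values, labels)
print(cut, score)
```

On this toy input the cut at 4.0 separates the two classes perfectly, so it receives the maximum score of 1.0. A full discretizer would repeat the search to produce several intervals and would need a stopping rule, which is where a tolerance such as the VPRS model mentioned in the abstract could plausibly enter.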