Abstract
This paper analyzes the key problems that classification algorithms must solve and reviews the ideas and characteristics of the main families of classification algorithms. Decision tree algorithms handle noisy data well but are effective only on small training sets. Bayesian classifiers are fast and are credited with high precision and a low error rate, yet their classification results can still be insufficiently accurate. Traditional association-rule-based classification achieves high accuracy but is easily constrained by available hardware memory. Support vector machines offer high accuracy and low complexity but run slowly. Building on the strengths of these algorithms and addressing their weaknesses, the paper then discusses recent algorithms that are faster, more accurate, and achieve better classification results, such as multi-decision-tree combination techniques, a hybrid classification algorithm based on prior information and information gain, and a neural network classification algorithm based on rough sets and genetic algorithms. Finally, it gives an outlook on classification algorithms in data mining and proposes priorities for future research.
Source
《重庆师范大学学报(自然科学版)》
CAS
2011, No. 4, pp. 44-47 (4 pages)
Journal of Chongqing Normal University: Natural Science
Keywords
data mining
classification
review