Abstract
This paper proposes a method for constructing a multi-class decision tree that combines unsupervised and supervised learning. Unsupervised clustering is first used to determine the structure of the tree, in which every internal node has a binary branch. Because clustering can uncover the intrinsic relationships and regularities among the samples to be classified, the resulting structure is the one that best fits the distribution of the classes in feature space. A supervised method, the support vector machine (SVM), then classifies the two groups of samples at each node of the tree. In most cases the groups cannot be separated by a linear hyperplane, so kernel functions are introduced. At the same time, asymmetric constraints on the Lagrangian coefficients (the Lagrange multipliers of the SVM dual problem) are set to offset the imbalance between the two groups of training samples that arises while the tree structure is being determined. Together these measures give the method high computational efficiency and accuracy, and satisfactory results were obtained in the experiments.
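The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used, so the sketch assumes plain 2-means on class centroids, and it assumes the common heuristic of scaling the penalty bound inversely to group size. All names (centroid, two_means, build_tree, C_left, C_right) are hypothetical. SVM training itself is omitted; at each node a kernel SVM would be trained on the left group against the right group, with dual constraints 0 <= alpha_i <= C_left for left samples and 0 <= alpha_i <= C_right for right samples.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def two_means(centers, iters=20):
    """Split {label: center} (two or more labels) into two label groups."""
    labels = sorted(centers)
    # Deterministic seeding: start from the two class centers farthest apart.
    a, b = max(combinations(labels, 2),
               key=lambda pair: math.dist(centers[pair[0]], centers[pair[1]]))
    ca, cb = centers[a], centers[b]
    for _ in range(iters):
        left = [l for l in labels
                if math.dist(centers[l], ca) <= math.dist(centers[l], cb)]
        right = [l for l in labels if l not in left]
        if not left or not right:
            # Degenerate case (e.g. identical centers): arbitrary split.
            left, right = labels[:1], labels[1:]
            break
        new_ca = centroid([centers[l] for l in left])
        new_cb = centroid([centers[l] for l in right])
        if (new_ca, new_cb) == (ca, cb):
            break  # assignments are stable
        ca, cb = new_ca, new_cb
    return left, right


def build_tree(samples, C=1.0):
    """Recursively build the binary tree; a leaf is a single class label.

    samples maps class label -> list of feature tuples.
    C_left / C_right are the asymmetric upper bounds on the multipliers
    (assumed heuristic: the smaller group gets the larger bound).
    """
    labels = sorted(samples)
    if len(labels) == 1:
        return labels[0]
    centers = {l: centroid(samples[l]) for l in labels}
    left, right = two_means(centers)
    n_left = sum(len(samples[l]) for l in left)
    n_right = sum(len(samples[l]) for l in right)
    total = n_left + n_right
    return {
        "left_classes": left,
        "right_classes": right,
        "C_left": C * total / (2 * n_left),
        "C_right": C * total / (2 * n_right),
        "left": build_tree({l: samples[l] for l in left}, C),
        "right": build_tree({l: samples[l] for l in right}, C),
    }


if __name__ == "__main__":
    # Toy data: classes A and B lie close together, as do C and D.
    data = {
        "A": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2)],
        "B": [(1.0, 0.0), (1.2, 0.1)],
        "C": [(10.0, 10.0)],
        "D": [(11.0, 10.0), (11.2, 10.1)],
    }
    root = build_tree(data)
    print(root["left_classes"], root["right_classes"])
    print(root["C_left"], root["C_right"])
```

On this toy data the root node separates classes A and B from classes C and D, and the smaller group (three samples against five) receives the larger bound. With a library SVM the same asymmetry is usually expressed through per-class penalty weights rather than by editing the dual constraints directly.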
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
北大核心 (Peking University Core Journals)
2004, No. 4, pp. 555-559 (5 pages)
Keywords
multi-class decision tree
unsupervised clustering
support vector machine