Abstract
To address the low accuracy of the traditional naive Bayesian classification algorithm on multidimensional continuous data, an improved algorithm based on attribute association is proposed. Multidimensional continuous data sets whose attributes belong to different categories are first discretized by Gaussian segmentation, and the naive Bayesian classification process is then improved with Laplace calibration, attribute association, and attribute weighting. Experimental results show that, compared with improved algorithms based on Laplace calibration or on attribute weighting, the proposed algorithm achieves higher classification accuracy. Within a certain range, the gain grows as the number of attributes increases, which makes the algorithm suitable for classifying multidimensional continuous data.
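The abstract names the main ingredients (discretization of continuous attributes, Laplace calibration, attribute weighting) without formulas. The sketch below shows how these pieces typically fit together in a naive Bayes classifier. It is a minimal illustration under stated assumptions, not the authors' algorithm. "Gaussian segmentation" is assumed here to mean cutting each attribute at mean + z·std for z in {-1, 0, 1}. Laplace calibration is implemented as add-alpha smoothing of the prior and the conditional probabilities. Attribute weights are supplied by hand and act as exponents on each conditional probability. The attribute-association step is not modeled. The names `gaussian_cuts`, `discretize`, and `WeightedNB`, the weights, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def gaussian_cuts(values, zs=(-1.0, 0.0, 1.0)):
    """Hypothetical 'Gaussian segmentation': cut points at mean + z * std."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n) or 1.0
    return [mu + z * sd for z in zs]

def discretize(value, cuts):
    # Interval index 0..len(cuts): number of cut points strictly below value.
    return sum(value > c for c in cuts)

class WeightedNB:
    """Naive Bayes with Laplace (add-alpha) smoothing and per-attribute weights."""

    def __init__(self, weights=None, alpha=1.0):
        self.weights = weights  # assumed hand-set weights; None means all 1.0
        self.alpha = alpha      # Laplace smoothing constant

    def fit(self, X, y):
        n_attr = len(X[0])
        self.classes = sorted(set(y))
        self.class_count = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (attribute j, class c) -> value counts
        self.values = [set() for _ in range(n_attr)]  # observed values per attribute
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, c)][v] += 1
                self.values[j].add(v)
        if self.weights is None:
            self.weights = [1.0] * n_attr
        return self

    def log_score(self, row, c):
        a = self.alpha
        # Smoothed log prior P(c).
        s = math.log((self.class_count[c] + a) / (self.n + a * len(self.classes)))
        for j, v in enumerate(row):
            # Smoothed P(x_j = v | c), raised to the power w_j (weighted NB).
            num = self.counts[(j, c)][v] + a
            den = self.class_count[c] + a * len(self.values[j])
            s += self.weights[j] * math.log(num / den)
        return s

    def predict(self, row):
        return max(self.classes, key=lambda c: self.log_score(row, c))

# Toy usage with made-up data: two continuous attributes, two classes.
raw = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5], [5.0, 3.6], [6.9, 3.1]]
labels = ["a", "a", "b", "b", "a", "b"]
cuts = [gaussian_cuts([r[j] for r in raw]) for j in range(2)]
X = [[discretize(v, cuts[j]) for j, v in enumerate(r)] for r in raw]
model = WeightedNB(weights=[1.5, 0.5]).fit(X, labels)
query = [discretize(5.0, cuts[0]), discretize(3.4, cuts[1])]
print(model.predict(query))
```

Smoothing keeps an unseen attribute value from driving a class score to zero. Weighting in log space means that an attribute with weight w contributes w · log P(x_j | c). In the paper, the weights and the association between attributes would be derived from the data rather than set by hand as they are here.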
Authors
宁可
孙同晶
赵浩强
NING Ke¹, SUN Tongjing¹, ZHAO Haoqiang² (1. College of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; 2. Zhejiang Electronic Information Products Testing Institute, Hangzhou 310007, China)
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2018, No. 6, pp. 18-23 (6 pages)
Funding
Fund of Zhejiang Provincial Key Laboratory of Information Security (KYZ066816004)
Keywords
continuous data
data classification
association rule
naive Bayesian classification algorithm
attribute weighting