Abstract
Discretization of continuous attributes is an important step in data preprocessing for data analysis. For supervised learning, this paper proposes a method for discretizing continuous attributes based on clustering with a density distribution function. The method uses the concept of the consistency level of a decision table from rough set theory: the consistency level of the decision table produced by each clustering partition is computed and fed back to dynamically adjust a clustering parameter, the influence factor, and the procedure continues until the consistency level of the decision table reaches its original level. Because the discretization of all attributes is considered simultaneously, the resulting discretization is more reasonable. To verify the feasibility of the method, experiments were carried out on real data.
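The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the abstract: the consistency level is taken to be the fraction of objects whose condition-attribute values determine a single decision (the usual rough set reading), the density distribution function clustering is replaced by a simple gap-based one-dimensional grouping as a stand-in, and the names `consistency_level`, `gap_clusters`, `discretize`, `factor`, and `shrink` are hypothetical. It is not the paper's algorithm, only one way such a loop might look.

```python
from collections import defaultdict


def consistency_level(rows, decisions):
    """Fraction of objects whose condition tuple maps to exactly one decision."""
    seen = defaultdict(set)
    for row, d in zip(rows, decisions):
        seen[tuple(row)].add(d)
    consistent = sum(1 for row in rows if len(seen[tuple(row)]) == 1)
    return consistent / len(rows)


def gap_clusters(values, radius):
    """Stand-in 1-D clustering (assumption): a new cluster starts
    wherever two neighbouring sorted values are more than `radius` apart."""
    ordered = sorted(set(values))
    label = {ordered[0]: 0}
    for prev, cur in zip(ordered, ordered[1:]):
        label[cur] = label[prev] + (1 if cur - prev > radius else 0)
    return [label[v] for v in values]


def discretize(rows, decisions, factor=1.0, shrink=0.5, max_iter=50):
    """Shrink the influence factor until the discretized table is as
    consistent as the original one; all attributes are coded together."""
    target = consistency_level(rows, decisions)
    columns = list(zip(*rows))
    for _ in range(max_iter):
        coded_cols = [
            gap_clusters(col, factor * (max(col) - min(col))) for col in columns
        ]
        coded = list(zip(*coded_cols))
        if consistency_level(coded, decisions) >= target:
            return coded, factor
        factor *= shrink  # feedback step: finer clusters on the next pass
    raise RuntimeError("consistency level not restored within max_iter")
```

Calling `discretize(rows, decisions)` with a list of numeric tuples and a parallel list of decision labels returns the coded table and the final factor. Starting from a coarse partition and refining only when the consistency check fails keeps the number of intervals small, which is the point of using the consistency level as a stopping criterion.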
Source
《系统仿真学报》
CAS
CSCD
2003, No. 6, pp. 804-806, 813 (4 pages in total)
Journal of System Simulation
Funding
National Natural Science Foundation of China (69975024)