摘要
代价敏感属性选择是数据挖掘的一个重要研究领域,其目的在于通过权衡测试代价和误分类代价,获得总代价最小的属性子集。针对经典回溯算法运行时间较长的缺点,结合分治思想,提出了一种改进的回溯算法。改进算法引入了两个相关参数,根据数据集规模自适应调整参数,并按参数大小拆分数据集,降低问题规模,以提高经典回溯算法的执行效率。针对较大规模数据集的实验结果表明,与经典的回溯算法相比,改进算法在保证效果的同时至少提高20%的运算效率;与启发式算法相比,改进算法在保证效率的同时取得了具有更小总代价的属性集合,可应用于实际问题。
Cost-sensitive feature selection is an important research field in the process of data mining. It aims at obtaining an attribute subset of the lowest total cost, through balancing test cost and misclassification cost. According to the shortcoming of the classical backtracking algorithm with longer running time, combining divide and conquer thought,this paper proposes an improved backtracking algorithm. Introducing two related parameters, this algorithm computes adaptively parameters according to the dataset scale, and splits the dataset with these parameters. It can enhance the efficiency of the classical backtracking algorithm by reducing the problem size. The experiments on the datasets with large scale show that this improved algorithm is effective and meets the need of practical problems. At the same time guaranteeing the effect, this improved algorithm promotes the efficiency of 20% at least than the classical backtracking algorithm. Compared with heuristic algorithm, this improved algorithm obtains an attribute set with a smaller total cost and ensures the efficiency.
出处
《计算机科学与探索》
CSCD
北大核心
2016年第10期1451-1458,共8页
Journal of Frontiers of Computer Science and Technology
基金
国家自然科学基金
Nos.61379049
61379089
61170128
福建省科技计划重点项目
No.2012H0043
漳州市自然科学基金
No.ZZ2016J35~~
关键词
粗糙集
粒计算
代价敏感
属性选择
自适应分治
rough sets
granular computing
cost-sensitive
feature selection
adaptive divide and conquer