Abstract
Traditional data mining algorithms are limited in the volume of data they can handle. Building on rough set theory, this paper uses a class distribution list structure to improve three traditional algorithms: attribute-importance-based data discretization, attribute reduction, and heuristic value reduction. A two-step discretization algorithm based on dynamic clustering is also discussed. Once the algorithms are adapted to big data processing, parallel computing is used to improve their execution efficiency. Test results show that the improved algorithms can effectively process large data volumes, and that parallel computing resolves the efficiency problems that large-scale processing introduces.
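The abstract does not describe the layout of the class distribution list, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a class distribution table is a mapping from each equivalence class (the record's values on a chosen attribute subset) to counts of decision values. It then computes the standard rough-set dependency degree and attribute significance from those counts. All function names (`class_distribution`, `merge`, `dependency`, `significance`) are hypothetical. The `merge` step illustrates why such counts suit parallel processing: tables built on disjoint data chunks combine by simple addition.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: condition attribute values followed by the decision value.
Row = Tuple

def class_distribution(rows: Sequence[Row], attrs: Sequence[int], decision: int) -> Dict[tuple, Counter]:
    """Map each equivalence class (values on `attrs`) to a Counter of decision values."""
    table: Dict[tuple, Counter] = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        table[key][row[decision]] += 1
    return table

def merge(parts: List[Dict[tuple, Counter]]) -> Dict[tuple, Counter]:
    """Combine tables built on disjoint data chunks. Count addition is associative,
    so each chunk could be processed independently (e.g. by a separate worker)."""
    total: Dict[tuple, Counter] = defaultdict(Counter)
    for part in parts:
        for key, counts in part.items():
            total[key].update(counts)
    return total

def dependency(table: Dict[tuple, Counter], n_rows: int) -> float:
    """Dependency degree gamma: fraction of records whose equivalence class
    contains a single decision value (i.e. records in the positive region)."""
    consistent = sum(sum(c.values()) for c in table.values() if len(c) == 1)
    return consistent / n_rows

def significance(rows: Sequence[Row], attrs: Sequence[int], a: int, decision: int) -> float:
    """Significance of attribute `a` within `attrs`: gamma(attrs) - gamma(attrs without a)."""
    n = len(rows)
    full = dependency(class_distribution(rows, attrs, decision), n)
    rest = [x for x in attrs if x != a]
    reduced = dependency(class_distribution(rows, rest, decision), n)
    return full - reduced
```

For example, with `rows = [(0, 0, "no"), (0, 1, "yes"), (1, 0, "yes"), (1, 1, "yes"), (0, 0, "no")]`, both condition attributes together fully determine the decision (dependency 1.0), while either one alone yields 0.4, so `significance(rows, [0, 1], 0, 2)` is about 0.6. In this sketch only a single pass over the data is needed per attribute subset, since every equivalence class is summarized by its count vector.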
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2016, Issue 7, pp. 115-119 (5 pages)
Keywords
data mining
rough set
big data processing
parallel computing