Abstract
Knowledge reduction is one of the important research topics in rough set theory. Classical discernibility-matrix-based knowledge reduction algorithms can handle only small datasets, while existing parallel knowledge reduction algorithms are task-parallel: they assume that the entire dataset can be loaded into main memory at once and merely run reduction tasks concurrently, which is infeasible for massive data. High-dimensional massive data makes attribute reduction a challenging task. To address this problem, the characteristics of discernibility matrix cells are analyzed, and a discernibility matrix suited to data parallelism is designed on the basis of the indiscernibility of attributes (or attribute sets) and the MapReduce programming model from cloud computing. On this basis, a discernibility-matrix-based knowledge reduction algorithm for large-scale data is proposed, which the authors describe as the first of its kind. Experimental results show that the proposed algorithm is effective and feasible, scales well, and can efficiently process large-scale datasets on commodity computers.
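To make the terminology concrete, the following is a minimal single-machine sketch in Python of how a discernibility matrix can be built from equivalence classes in a map/reduce style. It is not the algorithm from the paper, whose details are not given in this abstract. The function names (`map_phase`, `reduce_phase`, `discernibility_cells`, `core_attributes`), the toy decision table, and the assumption of a consistent decision table (objects with identical condition values share one decision) are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(rows, cond_attrs, dec_attr):
    """Emit one (condition-value tuple, decision value) pair per object."""
    for row in rows:
        yield tuple(row[a] for a in cond_attrs), row[dec_attr]


def reduce_phase(pairs):
    """Group objects with identical condition values (indiscernible objects)
    and collect the decision values seen in each group."""
    groups = defaultdict(set)
    for key, decision in pairs:
        groups[key].add(decision)
    return dict(groups)


def discernibility_cells(groups, cond_attrs):
    """Return the distinct non-empty cells of the discernibility matrix.

    Assumes a consistent table, so each group carries exactly one decision.
    A cell lists the condition attributes on which two groups with
    different decisions take different values."""
    cells = set()
    for (k1, d1), (k2, d2) in combinations(groups.items(), 2):
        if d1 != d2:
            cells.add(frozenset(
                a for a, v1, v2 in zip(cond_attrs, k1, k2) if v1 != v2
            ))
    return cells


def core_attributes(cells):
    """The core consists of the attributes that occur as singleton cells."""
    return {next(iter(c)) for c in cells if len(c) == 1}


# Hypothetical toy decision table: condition attributes a, b, c; decision d.
COND = ["a", "b", "c"]
TABLE = [
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},  # indiscernible from the first row
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]

groups = reduce_phase(map_phase(TABLE, COND, "d"))
cells = discernibility_cells(groups, COND)
core = core_attributes(cells)
```

Grouping indiscernible objects before comparing them means the pairwise step runs over equivalence classes rather than over individual objects, which is one plausible reason a data-parallel formulation helps on large datasets.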
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2011, No. 8, pp. 193-196 (4 pages)
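Funding
National Natural Science Foundation of China (60970061, 61075056)
Shanghai Leading Academic Discipline Project (B004)
Natural Science Foundation of Jiangsu Provincial Universities (09KJD520004)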
Keywords
Cloud computing
Discernibility matrix
Knowledge reduction
Rough set