Abstract
Many data mining applications involve predictive modeling of large, complex data sets. Such applications call for innovative algorithms that not only achieve good prediction accuracy but also run efficiently on distributed computing systems and produce results in a reasonable time. This paper focuses on predictive modeling of multi-relational data. It first illustrates, with examples, the application models designed for such data, and then describes a general framework based on simultaneous co-clustering (SCOAL) that lends itself to a divide-and-conquer approach to data analysis. Finally, it shows that the SCOAL-based framework can be parallelized efficiently using Map-Reduce.
Source
《科技通报》
Peking University Core Journal (北大核心)
2013, No. 10, pp. 82-84 (3 pages)
Bulletin of Science and Technology