Abstract
Hadoop is the mainstream platform for big data mining. The scale and speed of data mining are issues that need to be considered. Spark is an excellent framework that integrates machine learning, graph computation, and online learning in a single system, and it is concise, powerful, and efficient. This paper first discusses the composition of Spark, then its task scheduling mechanism, and finally its deployment environment and testing.
Source
Computer Knowledge and Technology (《电脑知识与技术》, back issues)
2014, Issue 12X, pp. 8407-8408 (2 pages)