Abstract
Campus big data analysis is an important area of big data research. For the large volume of student attendance data accumulated over the years, traditional database technology can hardly complete a full-data analysis, and serial computation struggles to produce results in a short time. Big data technology is a good way to address this kind of problem. This paper proposes a decision tree regression analysis method based on the Spark platform. The method can analyze a large amount of attendance data in a relatively short time and generate a decision tree of the factors affecting students' class attendance rate. The decision tree offers a useful reference for monitoring and early warning of class attendance and for teaching management decisions. The paper describes in detail the parallel data processing workflow and the use of the CART algorithm for regression analysis of the class attendance rate. It also compares the efficiency of traditional database technology and the parallel data processing approach for analyzing student attendance data; the parallel approach proved more efficient.
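As a rough illustration of the CART regression step mentioned in the abstract, the sketch below picks the binary split on a single numeric feature that minimizes the summed squared error of the two child nodes. It is plain Python, not the paper's Spark implementation; the feature (class start hour), the toy values, and the names `sse` and `best_split` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) for the split x <= threshold that
    minimizes SSE(left) + SSE(right), the CART regression criterion."""
    pairs = list(zip(xs, ys))
    distinct = sorted(set(xs))
    best = None
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical toy data: class start hour vs. attendance rate.
hours = [8, 8, 10, 14, 16, 19]
rates = [0.70, 0.72, 0.90, 0.92, 0.91, 0.75]
threshold, cost = best_split(hours, rates)
print(threshold, round(cost, 4))
```

A full tree would apply `best_split` recursively over all features and stop at a depth or minimum-node-size limit; a distributed platform such as Spark parallelizes the per-feature statistics across data partitions.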
Authors
FENG Xiao-long (冯晓龙); GAO Jing (高静)
College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018
Source
Journal of Inner Mongolia University of Technology: Natural Science Edition (《内蒙古工业大学学报(自然科学版)》)
2018, No. 2, pp. 130-135 (6 pages)
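Funding
Inner Mongolia Agricultural University Basic Discipline Research Start-up Fund (JC2014006)
Inner Mongolia Agricultural University Experimental Teaching Instrument and Equipment Development Project (2015)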
Keywords
Big data
Decision tree
Regression analysis
Attendance analysis
CART