Abstract
As the pool of graduating high school students shrinks, enrollment scales at colleges and universities expand, admission channels diversify, and competition for students among institutions intensifies, colleges and universities must consider how to use massive heterogeneous enrollment data to accurately target prospective students and carry out early recruitment publicity. Drawing on cloud computing technology, the parallel computing model MapReduce and the in-memory parallel computing framework Spark are used to analyze historical college enrollment data, and a parallelized random forest model is proposed to predict college enrollment strategy. The model shortens prediction time, improves prediction accuracy, and strengthens the capacity to process big data. Experimental results show that the parallelized random forest algorithm outperforms the commonly used decision tree prediction method in multiple respects across different datasets.
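The idea of training a random forest in a map/reduce style can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library instead of MapReduce or Spark, it replaces full decision trees with one-split decision stumps, and a thread pool merely stands in for cluster workers (CPython threads do not give real CPU parallelism). The feature names (exam score, distance to campus in km), the toy records, and all function names are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up toy records: (exam score, distance to campus in km); label 1 = enrolled.
rows = [(620, 30), (640, 50), (610, 20), (630, 60), (650, 45), (600, 35),
        (580, 400), (500, 800), (490, 650), (520, 300), (560, 500), (540, 700)]
labels = [1] * 6 + [0] * 6


def train_stump(task):
    """Map step: fit one decision stump on a bootstrap sample of the data."""
    data, targets, seed = task
    rng = random.Random(seed)
    n = len(data)
    sample = [rng.randrange(n) for _ in range(n)]   # bootstrap: draw with replacement
    feature = rng.randrange(len(data[0]))           # random feature for this tree
    best = None
    for i in sample:                                # try each sampled value as threshold
        threshold = data[i][feature]
        left = [targets[j] for j in sample if data[j][feature] <= threshold]
        right = [targets[j] for j in sample if data[j][feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]                                 # (feature, threshold, left, right)


def train_forest(data, targets, n_trees=25, workers=4):
    """Train the stumps independently; each task is one 'map' unit of work."""
    tasks = [(data, targets, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))


def predict(forest, row):
    """Reduce step: combine the per-tree predictions by majority vote."""
    votes = Counter(
        left_label if row[feature] <= threshold else right_label
        for feature, threshold, left_label, right_label in forest
    )
    return votes.most_common(1)[0][0]


forest = train_forest(rows, labels)
```

Because each tree depends only on its own bootstrap sample and seed, the training tasks share no state, which is what makes this kind of ensemble straightforward to distribute across workers.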
Authors
杨正理
史文
陈海霞
王长鹏
YANG Zhengli; SHI Wen; CHEN Haixia; WANG Changpeng (School of Mechanical and Electrical Engineering, SanJiang University, Nanjing 210012, China)
Source
《智能系统学报》
CSCD
Peking University Core Journals (北大核心)
2019, No. 2, pp. 323-329 (7 pages)
CAAI Transactions on Intelligent Systems
Funding
General Program of Natural Science Research for Higher Education Institutions of Jiangsu Province (17KJB470011)
Keywords
big data
machine learning
deep learning
learning algorithm
college enrollment
strategy prediction
random forest
cloud computing