Abstract
This paper uses the IMDB movie dataset from the official "data world" website. The dataset covers more than 5,000 movies from 66 countries over a span of more than 100 years and contains 28 variables, including running time, director, box office gross, and budget. The variable "imdb_score" is the response variable; the others are predictor variables. After analyzing and preprocessing the data, the paper applies two data mining techniques, the random forest algorithm and the BP (backpropagation) neural network, to predict scores and evaluate model performance. Finally, the best-performing random forest model is used to predict the scores of 15 new movies, with good results. In the IMDB rankings, a higher score (out of a maximum of 10) indicates a better movie that is more worth watching. The aim of this study is to identify movies that are worth watching, saving viewers time and meeting their viewing needs, and to provide practical suggestions for movie recommendation systems.
Author
Tan Jiazhu (谭家柱), School of Mathematics and Statistics, Guangxi Normal University, Guilin 541006
Source
Modern Computer (《现代计算机》), 2021, No. 30, pp. 24-31 (8 pages)