Funding: Supported by ZTE Industry-Academia-Research Cooperation; the National Key Research and Development Program of China under Grant No. 2017YFB1002104; the National Natural Science Foundation of China under Grant Nos. U1836206, U1811461, and 61773361; and the Project of Youth Innovation Promotion Association CAS under Grant No. 2017146.
Abstract: As competition among communication operators intensifies, accurately predicting potential off-grid users becomes increasingly important. Solving this problem requires attention to the effectiveness of the learning algorithms, the efficiency of data processing, and other factors. From a practical application point of view, this paper therefore proposes a Spark-based system for predicting potential off-grid customers, covering data preprocessing, feature selection, model building, and effective display. In addition, within this off-grid prediction system, we use the Spark parallel framework to improve gcForest, a novel decision tree ensemble approach. The resulting parallel gcForest algorithm can be applied to practical problems such as off-grid prediction. Experiments on two real-world datasets demonstrate that the proposed system can handle large-scale data for the off-grid user prediction problem and that the proposed parallel gcForest achieves satisfactory performance.
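The abstract does not say how gcForest is parallelized, so the following is only a minimal sketch of the general cascade idea, under stated assumptions. Each cascade level trains several independent forests, and their class-probability outputs are appended to the original features for the next level. Because the forests within a level do not depend on each other, they are a natural unit to train in parallel. The sketch uses Python threads as a stand-in for Spark tasks, one-split stumps as a stand-in for full decision trees, and omits the cross-validated class vectors of the original gcForest. All function names, parameters, and the toy data are hypothetical, not taken from the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_stump(X, y, rng):
    """Fit a one-split classifier on a bootstrap sample (a stand-in for a full tree)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
    best = None
    for f in range(len(X[0])):
        for i in idx:
            t = X[i][f]
            left = [y[j] for j in idx if X[j][f] <= t]
            right = [y[j] for j in idx if X[j][f] > t]
            p_left = sum(left) / len(left)  # never empty: row i is on the left
            p_right = sum(right) / len(right) if right else p_left
            # Rows misclassified if each side predicts its majority class.
            err = (len(left) * min(p_left, 1 - p_left)
                   + len(right) * min(p_right, 1 - p_right))
            if best is None or err < best[0]:
                best = (err, f, t, p_left, p_right)
    return best[1:]

def fit_forest(X, y, n_trees, seed):
    rng = random.Random(seed)  # one generator per forest keeps runs reproducible
    return [fit_stump(X, y, rng) for _ in range(n_trees)]

def forest_proba(forest, x):
    """Average positive-class probability over the stumps of one forest."""
    votes = [(pl if x[f] <= t else pr) for f, t, pl, pr in forest]
    return sum(votes) / len(votes)

def fit_cascade(X, y, n_levels=2, n_forests=3, n_trees=5):
    levels, feats = [], X
    with ThreadPoolExecutor() as pool:
        for level in range(n_levels):
            seeds = [level * n_forests + k for k in range(n_forests)]
            # Forests in one level are independent, so they are fitted concurrently.
            forests = list(pool.map(lambda s: fit_forest(feats, y, n_trees, s), seeds))
            levels.append(forests)
            # Next level sees the original features plus this level's probabilities.
            feats = [list(x) + [forest_proba(fr, a) for fr in forests]
                     for x, a in zip(X, feats)]
    return levels

def cascade_proba(levels, x):
    feats, probs = list(x), []
    for forests in levels:
        probs = [forest_proba(fr, feats) for fr in forests]
        feats = list(x) + probs
    return sum(probs) / len(probs)  # average the last level's forests

# Synthetic toy data (not real operator data): label 1 marks an "off-grid" user.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(120)]
y = [1 if row[0] > 0.5 else 0 for row in X]

levels = fit_cascade(X, y)
accuracy = sum((cascade_proba(levels, x) > 0.5) == bool(t)
               for x, t in zip(X, y)) / len(X)
```

Threads in pure Python give little real speed-up for CPU-bound work; they are used here only to show which units are independent. In a Spark setting, the same per-forest tasks would presumably be distributed across executors, but how the paper does this is not stated in the abstract.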