Abstract
To address the excessive complexity of customer arrears prediction algorithms caused by the growing dimensionality of telecommunication data, a telecom customer arrears prediction algorithm based on principal component analysis and classification and regression trees is proposed. The raw telecommunication data first undergo missing-value handling, redundancy identification, and structuring, and are then normalized for modeling. Principal component analysis reduces the dimensionality of the modeled data, and the reduced data serve as input to the classification and regression tree algorithm, which classifies customers according to whether they are expected to fall into arrears. Validation on real telecommunication data shows a prediction error rate of 4.49% and a prediction time of 17.05 s. Compared with the classification and regression tree algorithm, the proposed algorithm predicts customer arrears while also improving prediction efficiency.
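The pipeline described in the abstract (missing-value handling, normalization, PCA, then a classification tree) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, whose `DecisionTreeClassifier` is a CART-style tree, and it runs on synthetic data rather than the paper's telecom records. The median imputation, the 95% retained-variance threshold, the tree depth, and all variable names are assumptions chosen for the example, so the figures reported in the abstract are not expected to be reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for telecom records (hypothetical, not the paper's data):
# 30 correlated features generated from 3 latent factors plus small noise.
rng = np.random.default_rng(0)
n_samples, n_latent, n_features = 2000, 3, 30
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Hypothetical label: 1 = customer falls into arrears, 0 = does not.
y = (latent[:, 0] + 0.5 * latent[:, 1] > 0).astype(int)

# Blank out about 2% of the entries to mimic missing values in raw data.
X[rng.random(X.shape) < 0.02] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(
    SimpleImputer(strategy="median"),              # missing-value handling (assumed strategy)
    StandardScaler(),                              # normalization before PCA
    PCA(n_components=0.95, svd_solver="full"),     # keep 95% of variance (assumed threshold)
    DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0),
)
model.fit(X_train, y_train)

error_rate = 1.0 - model.score(X_test, y_test)
n_components = model.named_steps["pca"].n_components_
print(f"features kept after PCA: {n_components} of {n_features}")
print(f"test error rate: {error_rate:.4f}")
```

Because the tree is fitted on a handful of principal components instead of all original features, each split search scans fewer columns, which is the source of the efficiency gain the abstract refers to.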
Source
Journal of Xi’an University of Posts and Telecommunications (《西安邮电大学学报》)
2017, Issue 3, pp. 29-33 (5 pages)
Funding
Supported by the Shaanxi Province Industrial Science and Technology Research Program (2016GY-113, 2015GY-013)
Keywords
customer arrears prediction
telecommunication data
principal component analysis
classification and regression tree