Abstract
Telecom operators have a natural advantage in big data. To uncover the potential value hidden in these massive data sets, this paper proposes a big data processing system that integrates the distributed messaging system Kafka, the distributed stream-processing framework Spark Streaming, and the Hadoop distributed file system, and uses the K-means clustering algorithm to build a classification model of campus student users' phone-bill consumption. The experimental results indicate that the method classifies user consumption types more accurately, which can improve operators' competitiveness and demonstrates the commercial value of the system.
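As a rough illustration of the K-means step named in the abstract, the sketch below clusters a handful of monthly bills into three consumption tiers. It is not the paper's Spark-based implementation: it is plain Python, the `kmeans` helper and its evenly spaced initialisation are assumptions made for reproducibility, and the `bills` values (call charge, data charge, in yuan) are invented purely for demonstration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means over a list of equal-length numeric tuples."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed deterministic initialisation: k points spread evenly
    # through the sorted input, so repeated runs give the same result.
    ordered = sorted(points)
    step = len(ordered) / k
    centroids = [ordered[int(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical monthly bills: (call charge, data charge) in yuan.
bills = [
    (12.0, 8.0), (15.0, 10.0), (14.0, 9.0),
    (48.0, 30.0), (52.0, 35.0), (50.0, 32.0),
    (110.0, 80.0), (120.0, 90.0), (115.0, 85.0),
]
centroids, labels = kmeans(bills, k=3)
for bill, label in zip(bills, labels):
    print(f"bill={bill} -> consumption tier {label}")
```

In a deployment like the one the abstract describes, the same assignment and update steps would run on records delivered through Kafka and processed in Spark rather than on an in-memory list.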
Source
《电子测试》
2016, Issue Z1, pp. 51-54, 56 (5 pages in total)
Electronic Test