Abstract
To address the real-time and accuracy requirements of user profile systems in big data environments, this paper proposes the design and implementation of a real-time profile system based on Structured Streaming. The canal component provides incremental subscription to the user behavior log system, the Kafka message middleware ingests the real-time data stream, and the Structured Streaming real-time computing framework analyzes and processes users' real-time data to characterize their real-time interests. An improved TF-IDF algorithm improves the accuracy and reliability of the text profile system, and the good interoperability between Structured Streaming and static data is used to reduce the real-time computing load and improve system response speed.
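For orientation, the following is a minimal sketch of the textbook TF-IDF weighting that the abstract's improved variant builds on. The abstract does not describe the improvement itself, so none is attempted here; the function name `tf_idf` and the sample token lists are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Baseline formulation (an assumption, not the paper's improved variant):
    tf = term count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical per-user behavior text, already tokenized.
docs = [
    ["spark", "streaming", "kafka"],
    ["spark", "sql", "spark"],
    ["kafka", "canal", "binlog"],
]
weights = tf_idf(docs)
# Highest-weighted term for the second document.
top_term = max(weights[1], key=weights[1].get)
```

In this baseline, a term that is frequent in one document but also common across the corpus can rank below a rarer term, which is the kind of behavior a profile-oriented refinement would typically adjust.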
Source
Industrial Control Computer (《工业控制计算机》)
2022, Issue 11, pp. 114-116, 118 (4 pages in total)
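Funding
National Key Research and Development Program of China (2021YFB2900800)
Science and Technology Commission of Shanghai Municipality projects (20511102400, 20ZR1420900).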