Abstract
User identification is the basis of behavior mining on electronic commerce big data. This paper proposes a new algorithm for electronic commerce user identification. The algorithm introduces user behavior motivation perception and identifies users in two phases: rough matching and precise identification. In the rough-matching phase, the algorithm partitions user data with heuristic rules. In the precise-identification phase, it analyzes each user's access motivation in real time and identifies users according to a dissimilarity matrix of user behaviors. Optimization on the Spark computing framework enables the algorithm to process large-scale data in real time in distributed scenarios. Experimental results show that the algorithm reaches an accuracy of 97.89% and has good identification efficiency.
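The two-phase idea can be illustrated with a small single-machine sketch. It is not the authors' algorithm: the rough-match heuristic (grouping sessions by IP address and user agent), the use of the set of browsed product categories as a proxy for access motivation, the Jaccard dissimilarity, the threshold value, and all function and field names are hypothetical choices made for illustration only. The Spark distribution step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| over browsed-category sets.
    Assumption: category sets stand in for the paper's motivation features."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rough_match(sessions):
    """Phase 1 (hypothetical heuristic): partition sessions by (ip, user_agent)."""
    groups = defaultdict(list)
    for s in sessions:
        groups[(s["ip"], s["user_agent"])].append(s)
    return list(groups.values())


def dissimilarity_matrix(group):
    """Symmetric n x n matrix of pairwise behavior dissimilarities."""
    n = len(group)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_dissimilarity(
            group[i]["categories"], group[j]["categories"])
    return d


def precise_identify(group, threshold):
    """Phase 2: link sessions whose dissimilarity <= threshold and
    return the connected components (via union-find) as users."""
    n = len(group)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    d = dissimilarity_matrix(group)
    for i, j in combinations(range(n), 2):
        if d[i][j] <= threshold:
            parent[find(i)] = find(j)

    users = defaultdict(list)
    for i in range(n):
        users[find(i)].append(group[i]["id"])
    return list(users.values())


def identify_users(sessions, threshold=0.5):
    """Run rough matching, then precise identification inside each partition."""
    result = []
    for group in rough_match(sessions):
        result.extend(precise_identify(group, threshold))
    return result


if __name__ == "__main__":
    sessions = [
        {"id": "s1", "ip": "10.0.0.1", "user_agent": "UA-A", "categories": {"phones", "cases"}},
        {"id": "s2", "ip": "10.0.0.1", "user_agent": "UA-A", "categories": {"phones", "chargers"}},
        {"id": "s3", "ip": "10.0.0.1", "user_agent": "UA-A", "categories": {"books"}},
        {"id": "s4", "ip": "10.0.0.2", "user_agent": "UA-A", "categories": {"phones"}},
    ]
    print(identify_users(sessions, threshold=0.7))
```

With a threshold of 0.7, sessions s1 and s2 share one of three browsed categories (dissimilarity 1 - 1/3 ≈ 0.67) and are attributed to one user, while s3, which only browses books, stays separate, and s4 falls into a different rough-match partition. Taking connected components of the thresholded matrix is a single-linkage choice; a stricter variant would require every pair inside a cluster to stay under the threshold.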
Authors
Zhang Mengfei (张梦菲)
Qiu Qiang (邱强)
Xiao Zhuojian (肖茁建)
Yao Xiao (姚晓)
Fang Jinyun (方金云)
(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190)
Source
Chinese High Technology Letters (《高技术通讯》)
EI
CAS
Peking University Core Journal
2020, No. 3, pp. 259-267 (9 pages)
Funding
Supported by the National Key Research and Development Program of China (2016YFB0502300, 2016YFB0502302).
Keywords
user identification
electronic commerce
Spark
user’s intention
distributed computing