Abstract
Internet big data is typically high-dimensional, high-order, and nonlinear, and existing click-through rate (CTR) prediction methods often struggle with the complex coupling among data features and with sparsity and class imbalance. To address these problems, a high-order deep factorization machine prediction method is proposed. In the design of the high-order factorization machine, because click-through prediction is a binary classification task, the sigmoid function is used to map the input data onto the two output classes, and the partial derivatives of the loss function are used to apply gradient updates to the model variables. To optimize model complexity and multi-order performance, the quadratic term of the mapping is transformed, and the model is extended to a third-order mapping. Finally, a deep network composed of a single BP layer and multiple RBM layers is designed. The sample set is trained according to the symmetry and biases of the RBM, and BP is used to compensate for the unsupervised learning of the RBM. Contrastive divergence is introduced into the gradient calculation to speed up network training, and a dropout mechanism is applied in the BP neural network layer to avoid overfitting. Simulation results show that the high-order deep factorization machine prediction method performs well on the logloss and AUC metrics and can effectively improve both the prediction accuracy and the prediction speed for high-order CTR big data.
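The transformation of the quadratic term mentioned in the abstract is not spelled out in this record. As a rough illustration only, the sketch below assumes it refers to the standard second-order factorization machine identity, which rewrites the pairwise sum over feature pairs as a difference of squared sums, and pairs it with a sigmoid output and a log-loss gradient step. All function and variable names (`pairwise_fast`, `sgd_step`, `V`, `lr`) are hypothetical, and the third-order extension and the RBM/BP network are not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_naive(x, V):
    """Second-order term as an explicit sum over feature pairs: O(k * n^2)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total


def pairwise_fast(x, V):
    """Same term as 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]: O(k * n)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total


def predict(x, w0, w, V):
    """Click probability: sigmoid of the second-order FM score."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(w0 + linear + pairwise_fast(x, V))


def sgd_step(x, y, w0, w, V, lr=0.1):
    """One log-loss gradient step for a label y in {0, 1}; returns new parameters."""
    n, k = len(x), len(V[0])
    err = predict(x, w0, w, V) - y  # derivative of log loss w.r.t. the score
    sums = [sum(V[i][f] * x[i] for i in range(n)) for f in range(k)]
    new_w0 = w0 - lr * err
    new_w = [w[i] - lr * err * x[i] for i in range(n)]
    new_V = [[V[i][f] - lr * err * x[i] * (sums[f] - V[i][f] * x[i])
              for f in range(k)] for i in range(n)]
    return new_w0, new_w, new_V


# Illustrative usage on one made-up sparse sample (assumed data, not from the paper).
random.seed(0)
x = [1.0, 0.0, 1.0, 0.0, 1.0]
V = [[random.gauss(0.0, 0.1) for _ in range(3)] for _ in x]
w0, w = 0.0, [0.0] * len(x)
w0, w, V = sgd_step(x, 1, w0, w, V)
```

The fast form matters for sparse click data because its cost grows linearly with the number of features rather than quadratically, while giving the same value as the explicit pairwise sum.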
Authors
张换梅
董云云
ZHANG Huan-mei; DONG Yun-yun (Department of Computer Science and Technology, Jinzhong University, Jinzhong, Shanxi 030619, China; School of Information and Computer Science, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China)
Source
《计算机仿真》
Peking University Core Journal
2021, No. 3, pp. 456-460 (5 pages)
Computer Simulation
Funding
Jinzhong University "1331 Project" Maker Team Construction Program (jzxyck-td2017018)
Jinzhong University "1331 Project" Maker Team Construction Program (jzxy-cktd2019039)
Keywords
Click-through rate big data
High-order factorization machine model
Gradient calculation
Loss function
Deep network learning