Abstract
This paper explores how to learn high-quality word embeddings from user purchase data so that a model can use them to predict demographic attributes efficiently. We first analyze and encode the purchase data and, on this basis, build an embedding-vector generation model. We then train the model on sample data, implement it as a neural network program, and finally verify its feasibility and efficiency through experiments. The proposed model converts categorical feature data with a large number of modalities (distinct categories) into low-dimensional, high-quality word embeddings. It uses these embeddings to predict demographic attributes efficiently and is broadly applicable. The proposed method both scales to large datasets and applies to datasets from different domains. The learned embeddings can serve a wide range of downstream non-linguistic tasks, such as demographic attribute prediction, sentiment analysis, community detection, and probabilistic inference on social networks, thereby providing a basis for new recommendation engines.
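The abstract outlines the pipeline only at a high level: encode the purchase data, learn embeddings, and predict demographic attributes with a neural network. The sketch below illustrates that general idea. It is not the author's model; it substitutes a generic technique in which an embedding table is trained jointly with a logistic-regression head.

In the sketch:
- Each purchased category is encoded to an integer id, like a word token.
- A user is represented by the mean embedding of their purchases.
- The user vector is used to predict a binary demographic label.

All category names, labels, the embedding dimension, and the learning rate are hypothetical.

```python
import math
import random

# Hypothetical toy purchase histories: each basket lists product-category
# labels bought by one user, treated like the "words" of a sentence.
BASKETS = [
    ["diapers", "baby_formula"],
    ["baby_formula", "toys"],
    ["diapers", "toys", "baby_formula"],
    ["gardening", "wine"],
    ["wine", "travel"],
    ["gardening", "travel", "wine"],
]
# Hypothetical binary demographic attribute (1 = household with young children).
LABELS = [1, 1, 1, 0, 0, 0]


def build_vocab(baskets):
    """Encode each distinct category label as an integer id (like word tokens)."""
    vocab = {}
    for basket in baskets:
        for item in basket:
            vocab.setdefault(item, len(vocab))
    return vocab


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class PurchaseEmbeddingClassifier:
    """Hypothetical sketch: embedding table + logistic head, trained jointly."""

    def __init__(self, vocab_size, dim=4, seed=0):
        rng = random.Random(seed)
        # One dense dim-sized row per category replaces a vocab_size-wide one-hot vector.
        self.emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                    for _ in range(vocab_size)]
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.b = 0.0
        self.dim = dim

    def user_vector(self, ids):
        # Represent a user by the mean embedding of the categories they bought.
        return [sum(self.emb[i][k] for i in ids) / len(ids) for k in range(self.dim)]

    def predict_proba(self, ids):
        u = self.user_vector(ids)
        return sigmoid(sum(wk * uk for wk, uk in zip(self.w, u)) + self.b)

    def loss(self, data):
        # Mean binary cross-entropy over (ids, label) pairs.
        total = 0.0
        for ids, y in data:
            p = min(max(self.predict_proba(ids), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    def train(self, data, epochs=500, lr=0.3):
        # Plain per-example SGD on the cross-entropy loss.
        for _ in range(epochs):
            for ids, y in data:
                u = self.user_vector(ids)
                g = self.predict_proba(ids) - y  # dLoss/dLogit
                # Each occurrence of item i receives gradient g * w / basket size.
                for i in ids:
                    for k in range(self.dim):
                        self.emb[i][k] -= lr * g * self.w[k] / len(ids)
                for k in range(self.dim):
                    self.w[k] -= lr * g * u[k]
                self.b -= lr * g


vocab = build_vocab(BASKETS)
data = [([vocab[item] for item in basket], y) for basket, y in zip(BASKETS, LABELS)]
model = PurchaseEmbeddingClassifier(len(vocab), dim=4, seed=0)
initial_loss = model.loss(data)
model.train(data)
final_loss = model.loss(data)
accuracy = sum((model.predict_proba(ids) > 0.5) == bool(y) for ids, y in data) / len(data)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, training accuracy {accuracy:.2f}")
```

Averaging item embeddings is the simplest way to turn a variable-length purchase history into a fixed-size user vector. The paper's actual architecture, encoding scheme, and training objective may differ. For example, an unsupervised word2vec-style objective over purchase sequences would be a natural alternative to the supervised objective used here.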
Author
GAO Guang-shang (高广尚)
Business School, Guilin University of Technology, Guilin 541004, China; Research Center for Modern Enterprise Management, Guilin University of Technology, Guilin 541004, China
Source
Systems Engineering
Peking University Core Journal
2021, Issue 1, pp. 148-158 (11 pages)
Funding
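National Natural Science Foundation of China (71761008)
Guangxi Science and Technology Program (Guike AD19245122)
Research Start-up Fund of Guilin University of Technology (GUTQDJJ2016020)
Fund of the Key Research Base of Humanities and Social Sciences in Guangxi Universities (19YB001)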
Keywords
Demographic Attribute Prediction
Word Embedding
Neural Network
Purchase Data