Abstract
[Purpose/significance] To address the quality problems of user-generated content (UGC) on today's social platforms, this paper builds individual user personas from users' identified low-quality abnormal behaviors and then derives a persona-based model that predicts UGC quality in advance. [Method/process] The isolation forest algorithm is used to identify users' abnormal behaviors, and the quality attributes of the UGC produced while those behaviors occur are analyzed. Low-quality UGC is identified along two dimensions, emotional quality and content quality; personas are constructed for the users who produce it; and a machine learning model is trained on these personas to obtain the UGC quality prediction model. The model is then applied to a test set to verify the feasibility and effectiveness of the method. [Result/conclusion] The experiments show that the proportion of low-quality UGC that users produce rises significantly when abnormal behaviors occur, so low-quality UGC can be identified by concentrating monitoring on abnormal behaviors. Compared with inspecting all user behaviors, the method is more efficient and consumes fewer resources.
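The abnormal-behavior detection step named in the abstract can be illustrated with a minimal pure-Python sketch of an isolation forest. This is not the authors' implementation: the two behavior features (posts per hour, mean post length), the synthetic data, and all function names and parameter values are assumptions made for illustration only. Points that random splits isolate after few steps receive scores closer to 1 and are treated as abnormal.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to separate on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1) per point; higher means more anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Synthetic, hypothetical behavior records: (posts per hour, mean post length).
data_rng = random.Random(42)
behaviors = [(data_rng.gauss(3, 1), data_rng.gauss(120, 15)) for _ in range(60)]
behaviors += [(40.0, 5.0), (25.0, 400.0)]  # two deliberately unusual records

scores = anomaly_scores(behaviors)
# Flag the two highest-scoring records as abnormal behaviors.
flagged = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(sorted(flagged))
```

In the pipeline the abstract describes, only the UGC produced during flagged behaviors would then go on to emotional-quality and content-quality analysis, which is where the claimed savings over inspecting every behavior come from. A production system would more likely rely on a maintained library implementation, such as scikit-learn's `IsolationForest`, than on a hand-written forest like this one.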
Source
《情报理论与实践》
CSSCI
Peking University Core Journal
2019, Issue 10, pp. 77-83 (7 pages)
Information Studies: Theory & Application
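Funding
An outcome of the National Social Science Fund of China project "Research on Real-Time Prejudgment and Control Mechanisms for UGC Quality Based on User Behavior Mining" (project no. 15BTQ064)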