Abstract
To address the problem that content redundancy in open innovation communities prevents high-quality User-Generated Content (UGC) from realizing its full value, and to mine the deeper value of high-quality UGC, this paper first uses random oversampling, SMOTE, and ADASYN to handle class imbalance in the UGC data. Next, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, and GBDT classifiers are built, and multiple hybrid prediction models are generated from them. Hard-voting, Soft-voting, and Stacking are then used to combine sampling methods with classification models and optimize prediction, and the resulting models are compared to select the best UGC quality prediction model for open innovation communities. The hybrid model that combines random oversampling with Stacking improves Accuracy, F1, and AUC by 3.85%, 28.18%, and 12.30% on average, respectively. The method can accurately identify high-quality UGC in innovation communities, helping enterprises manage their communities along multiple dimensions and improve their innovation capability.
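As a rough illustration of two building blocks named in the abstract, the following minimal sketch shows random oversampling and the difference between Hard-voting and Soft-voting in plain Python. It is not the authors' implementation: the function names (`random_oversample`, `hard_vote`, `soft_vote`), the toy data, and the base-model outputs are all hypothetical, and Stacking, SMOTE, and ADASYN are not shown.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def hard_vote(labels):
    """Hard-voting: majority vote over the labels predicted by the base models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft-voting: average the per-class probabilities of the base models
    and return the index of the class with the highest mean."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Toy imbalanced data (assumed): four low-quality posts (0), one high-quality post (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)

# Assumed outputs of three base classifiers for one post: [P(class 0), P(class 1)].
probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
labels = [max(range(2), key=lambda c: p[c]) for p in probs]  # [0, 1, 1]
hard_result = hard_vote(labels)
soft_result = soft_vote(probs)
```

In this toy case the two voting schemes disagree: two of the three models predict class 1, so Hard-voting returns 1, while the one confident vote for class 0 pulls the averaged probability toward class 0 under Soft-voting. This is one reason it can be worth comparing several combination strategies, as the abstract describes.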
Authors
YANG Wenjing (杨汶静)
WANG Mingyan (汪明艳)
School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2024, No. 5, pp. 179-185 (7 pages)
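Funding
National Social Science Fund of China, General Program (17BGL159)
Science and Technology Commission of Shanghai Municipality, Soft Science Key Project (22692104700)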
Keywords
open innovation community
User-Generated Content
oversampling
machine learning
hybrid model prediction (hybrid algorithm)