
Analysis of Interactive Characteristics of Weibo Based on Random Forest (cited by: 2)
Abstract: Weibo's openness and low barrier to entry have made it one of the most widely used social media platforms, and the value latent in its massive data remains to be studied. Accurately judging how a post will spread, and thereby reducing the impact of harmful posts, has become a major problem. Taking Sina Weibo as the research object, this paper combines the random forest algorithm with data analysis and processing to predict the reposts, comments, and likes a post receives one week after publication. The data features are divided into three categories, and the influence of each category on the prediction results is analyzed. First, the principles of decision trees and the random forest algorithm are briefly described. Second, the Weibo data are analyzed and the extracted features are divided into three categories: user features, time features, and text features. Finally, three groups of comparative experiments verify the feasibility of the random forest algorithm for predicting Weibo interactions and show how the three feature categories affect the results. The experiments show that user features have a relatively large influence on prediction accuracy.
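Authors: YU Shu; CAO Qi; LIU Tao (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
Source: Computer Technology and Development, 2019, No. 10, pp. 51-54 (4 pages)
Funding: National Natural Science Foundation of China, General Program (51774090); Natural Science Foundation of Heilongjiang Province, General Program (F2015020); Heilongjiang Provincial Education and Research Special Guiding Innovation Fund (2017YDL-12); Heilongjiang Provincial Education Planning Major Project (GJ20170006)
Keywords: data mining; random forest; machine learning; data analysis; decision tree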
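
The approach summarized in the abstract (a random forest trained on user, time, and text features, with the feature categories compared against each other) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the column layout in `FEATURE_GROUPS` is hypothetical, and the label is deliberately constructed to depend only on the follower count, so the resulting accuracies are true by design and say nothing about the paper's actual results. The forest itself is a plain textbook version: bootstrap samples, a random feature subset at each split, Gini impurity, and majority voting.

```python
import random
from collections import Counter

# Hypothetical column layout: 0 = user feature, 1 = time feature, 2 = text feature.
FEATURE_GROUPS = {"user": [0], "time": [1], "text": [2]}


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, features, rng, depth=0, max_depth=4):
    """Grow one tree; every node looks at a random subset of the features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    k = max(1, int(len(features) ** 0.5))
    split = best_split(rows, labels, rng.sample(features, k))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       features, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       features, rng, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain integer labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, features, n_trees=15, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def accuracy(forest, rows, labels):
    hits = sum(predict_forest(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def make_posts(n, seed):
    """Synthetic posts: [followers, posting hour, text length] -> 0/1 label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        followers = rng.randint(0, 10000)
        rows.append([followers, rng.randint(0, 23), rng.randint(1, 140)])
        labels.append(1 if followers > 5000 else 0)  # label depends on user feature only
    return rows, labels


if __name__ == "__main__":
    train_x, train_y = make_posts(120, seed=1)
    test_x, test_y = make_posts(200, seed=2)
    for name, cols in FEATURE_GROUPS.items():
        forest = fit_forest(train_x, train_y, cols)
        print(name, round(accuracy(forest, test_x, test_y), 3))
```

Training one forest per feature group and comparing test accuracy is one simple way to run a comparison of this kind. With real Weibo data, the three-column toy rows would be replaced by the extracted features, and the binary label by whatever discretization of repost, comment, and like counts the study uses; the abstract does not specify either.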
