Abstract
Accurate prediction of news click counts helps in tracking online public opinion, standardizing news reporting, and understanding the preferences of user groups. Much of the text on the Internet carries emotional information, which can influence readers and, in turn, how the text spreads among users. Existing prediction studies rarely analyze the emotional factors in text and do not examine how emotion affects popularity. To address this gap, this paper starts from emotion-based content analysis: it builds an emotion vector that represents the emotional features of each news headline and applies similarity-based collaborative filtering to study the relationship between the emotional factors in hot news headlines and their click counts. The click count of a hot news item is treated as the rating given to that item by the user population as a whole; based on emotion-vector similarity and neighbor selection, a collaborative filtering algorithm predicts the hourly average click count of hot news. Experiments on a NetEase hot news dataset show that prediction based on emotion vectors outperforms a word-frequency baseline: once the number of selected neighbors exceeds 10, the prediction error is 3.7% lower on average and the minimum error is 4.3% lower, indicating that public attention to hot news is correlated with its emotional features.
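The prediction step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the similarity measure, the emotion categories, or the aggregation formula, so cosine similarity, the similarity-weighted average, and the names `cosine_similarity` and `predict_hourly_clicks` are all assumptions made here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two emotion vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a headline with no emotional signal matches nothing
    return dot / (norm_a * norm_b)


def predict_hourly_clicks(target, history, k=10):
    """Predict hourly average clicks for a headline's emotion vector.

    history is a list of (emotion_vector, hourly_avg_clicks) pairs for
    already observed headlines. The k most similar headlines act as
    neighbors, and their clicks are averaged with similarity weights
    (an assumed aggregation, not taken from the paper).
    """
    scored = [(cosine_similarity(target, vec), clicks) for vec, clicks in history]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in neighbors if sim > 0)
    if weight_sum == 0:
        # No usable neighbor: fall back to the global mean of observed clicks.
        return sum(clicks for _, clicks in history) / len(history)
    return sum(sim * clicks for sim, clicks in neighbors if sim > 0) / weight_sum


# Hypothetical three-dimensional emotion vectors and click values.
history = [([1, 0, 0], 100.0), ([0, 1, 0], 500.0), ([1, 0, 0], 200.0)]
prediction = predict_hourly_clicks([1, 0, 0], history, k=2)
```

With these made-up values, the two headlines sharing the target's emotion profile are selected as neighbors and the orthogonal one is ignored. The word-frequency baseline mentioned in the abstract could reuse the same routine with term-count vectors in place of emotion vectors.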
Authors
AI Jun (艾均); BI Yang-yang (毕阳阳); SU Zhan (苏湛)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Software Guide (《软件导刊》)
2021, No. 12, pp. 37-42 (6 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (61803264).
Keywords
news prediction
emotion vector
collaborative filtering