Abstract
User-generated content (UGC) on the Internet carries users' subjective opinions, which are valuable for analysing user behaviour and user needs. This paper presents WSAM, a system based on natural language understanding that analyses subjective opinions in Internet UGC text and mines both the objects an opinion targets and its subjective components. We analyse the UGC phenomenon and the reasons for its emergence, and identify four main types of subjective opinion in UGC. Opinion mining is cast as two subtasks: recognising the object targeted by a subjective opinion within a sentence, and determining the subjective components. The algorithm combines lexical, structural, and related features and uses a maximum entropy classifier to mine users' subjective opinions. Experiments show that the algorithm used in WSAM performs well, and that the system can be flexibly extended to related applications such as opinion mining, where it also achieves good results.
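The classification step described in the abstract can be illustrated with a minimal sketch of a maximum entropy classifier (multinomial logistic regression) trained by stochastic gradient ascent. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names, the toy opinion lexicon, the two labels, the English example sentences (WSAM itself targets Internet UGC text), and the hyperparameters.

```python
import math
from collections import defaultdict

LABELS = ["subjective", "objective"]   # hypothetical label set
OPINION_WORDS = {"love", "terrible"}   # hypothetical toy lexicon


def extract_features(tokens, opinion_words=OPINION_WORDS):
    """Build illustrative lexical and structural features for one sentence."""
    feats = {"bias": 1.0}
    for tok in tokens:
        feats["word=" + tok.lower()] = 1.0          # lexical: bag of words
    if any(tok.lower() in opinion_words for tok in tokens):
        feats["has_opinion_word"] = 1.0             # lexical: lexicon hit
    if len(tokens) > 5:
        feats["long_sentence"] = 1.0                # structural placeholder
    return feats


def predict_proba(weights, feats, labels=LABELS):
    """Return P(label | features) under the log-linear (maxent) model."""
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items())
              for y in labels}
    top = max(scores.values())                      # subtract max for stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_maxent(data, labels=LABELS, epochs=200, lr=0.1, l2=0.01):
    """Stochastic gradient ascent on the L2-regularised log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats, labels)
            for y in labels:
                # Gradient term: observed minus expected feature count.
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] += lr * (err * v - l2 * weights[(y, f)])
    return weights


def classify(weights, tokens):
    """Pick the most probable label for a tokenised sentence."""
    probs = predict_proba(weights, extract_features(tokens))
    return max(probs, key=probs.get)


# Toy training data, invented for this sketch.
TRAIN = [
    ("I really love this phone", "subjective"),
    ("the battery is terrible", "subjective"),
    ("the phone was released in May", "objective"),
    ("the battery has two cells", "objective"),
]
weights = train_maxent([(extract_features(s.split()), y) for s, y in TRAIN])
print(classify(weights, "I love the battery".split()))
```

In this sketch the same classifier could be reused for the second subtask (recognising the targeted object) by swapping the label set and the feature function; how WSAM actually organises the two subtasks is not specified in the abstract.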
Source
《计算机应用与软件》
CSCD
Peking University Core Journal
2012, Issue 5, pp. 90-94 (5 pages)
Computer Applications and Software
Funding
Supported by the Shanghai Postdoctoral Program (10R21421900)
Keywords
User-generated content
UGC
Natural language processing
Opinion mining