Abstract: A user profile contains information about a user. Substantial effort has gone into understanding users' behavior by analyzing their profile data. Online social networks provide researchers with an enormous amount of such information. Sina Weibo, a Twitter-like microblogging platform, has achieved great success in China, although studies of it are still at an early stage. This paper explores the relationships among different profile attributes in Sina Weibo. We use association rule mining to identify dependencies among the attributes, and we find that if a user's posts are well received, he or she is more likely to have a large number of followers. Our results show how the relationships among profile attributes are affected by a user's verified type. We also address data transformation and analyze how the statistical properties of the data distribution influence data discretization.
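As an illustration of what an association rule over profile attributes measures, the sketch below computes support, confidence, and lift for a single rule. It is a minimal example under stated assumptions, not the paper's procedure: the five user records, the attribute labels (`reposts=high`, `followers=high`, `verified=yes`), and the helper names `support` and `rule_metrics` are all hypothetical and hand-made for this example.

```python
# Hypothetical, hand-made records (not data from the paper): each user is
# represented as a set of already-discretized profile attribute labels.
users = [
    {"reposts=high", "followers=high", "verified=yes"},
    {"reposts=high", "followers=high", "verified=no"},
    {"reposts=low", "followers=low", "verified=no"},
    {"reposts=high", "followers=low", "verified=no"},
    {"reposts=low", "followers=low", "verified=yes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_both / s_ante if s_ante else 0.0
    lift = confidence / s_cons if s_cons else 0.0
    return s_both, confidence, lift


# Rule under test: "posts are widely reposted" -> "many followers".
sup, conf, lift = rule_metrics({"reposts=high"}, {"followers=high"}, users)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the two attribute values co-occur more often than they would if they were independent, which is the kind of dependency the abstract describes. How continuous attributes such as follower counts are binned into labels like `high` and `low` affects which rules appear, which is why discretization matters.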
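Abstract: Combining typhoon attribute data with a multi-label classification approach, and using BERT-BiLSTM (Bidirectional Encoder Representations from Transformers-Bidirectional Long Short-Term Memory) as the classification model, this paper proposes a method for identifying typhoon disaster impacts from Weibo text with deep learning. The method is applied to the severe typhoons and super typhoons that made landfall in Guangdong Province from 2010 to 2019. A coarse classification first retrieves the Weibo posts related to typhoon disasters; a fine classification then assigns them to five impact categories: traffic, social, power, forestry, and waterlogging. The results show that: 1) the coarse and fine classification stages reach accuracies of 0.907 and 0.814, respectively; 2) the share of each impact category for severe and super typhoons varies with typhoon intensity, track, and the development level of the affected area; 3) before landfall, the impacts are mainly traffic and social impacts caused by typhoon prevention measures. After landfall, the impacts show single-peak and double-peak patterns, reflecting how typhoon disaster impacts change over time.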
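The coarse-then-fine, multi-label structure described above can be sketched as a two-stage pipeline. This is only an illustration of the control flow: simple keyword matching stands in for the BERT-BiLSTM classifiers, and the keyword lists, example posts, and function names are hypothetical, not taken from the paper.

```python
# Hypothetical keyword lists standing in for trained classifiers. The five
# category names follow the abstract; the keywords are made up for this sketch.
COARSE_KEYWORDS = ["typhoon"]
FINE_KEYWORDS = {
    "traffic": ["flight", "train", "road"],
    "social": ["school", "closed", "suspended"],
    "power": ["outage", "blackout"],
    "forestry": ["tree", "trees"],
    "waterlogging": ["flood", "flooded", "waterlogging"],
}


def is_disaster_related(text):
    """Coarse stage: keep only posts that look typhoon-related."""
    words = text.lower().split()
    return any(keyword in words for keyword in COARSE_KEYWORDS)


def fine_labels(text):
    """Fine stage: multi-label, so a post may receive zero or several labels."""
    words = set(text.lower().split())
    return sorted(
        label for label, keywords in FINE_KEYWORDS.items() if words & set(keywords)
    )


def classify(text):
    """Run the coarse filter first, then assign fine-grained impact labels."""
    if not is_disaster_related(text):
        return []
    return fine_labels(text)


posts = [
    "typhoon knocked down trees and caused a blackout",
    "lovely weather for a walk today",
]
results = [classify(post) for post in posts]
print(results)
```

The first post receives two labels at once, which is what distinguishes multi-label from single-label classification; the second is filtered out at the coarse stage and never reaches the fine stage.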
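Abstract: Weibo, one of today's most popular social platforms, contains a large volume of strongly subjective user comment text. To mine the latent information in Weibo comments, and to address the loss of semantics and the heavy reliance on manual annotation in traditional sentiment analysis models, this paper proposes a deep learning sentiment analysis model based on LSTM+Word2vec. The continuous bag of words (CBOW) model in Word2vec uses contextual structure and semantic relationships to map each word into a vector space, yielding denser word vectors; a long short-term memory network then captures the contextual sequence of the text and outputs the classification prediction. The model reaches an accuracy of up to 95.9%; in control experiments, the sentiment lexicon, RNN, and SVM models reach 52.3%, 92.7%, and 85.7%, respectively. The comparison shows that the LSTM+Word2vec model is more accurate and has a degree of robustness and generalization, which is significant for personalized user recommendations and online public opinion monitoring.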
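To make the CBOW idea concrete, the sketch below builds the (context, target) training pairs that a CBOW model learns from: each word is predicted from the words around it. It covers only this data-preparation step, not the embedding training or the LSTM classifier, and the function name `cbow_pairs`, the window size, and the example sentence are assumptions made for illustration.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) pairs for CBOW-style training.

    For each position, the context is up to `window` tokens on each side,
    and the target is the token at that position.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip single-token inputs that have no context
            pairs.append((context, target))
    return pairs


# Hypothetical tokenized comment; real Weibo text would first need
# Chinese word segmentation.
tokens = ["this", "movie", "is", "really", "great"]
pairs = cbow_pairs(tokens, window=1)
for context, target in pairs:
    print(context, "->", target)
```

Because every word is trained against its neighbors, words that appear in similar contexts end up with similar vectors, which is the contextual and semantic information the abstract refers to.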