Abstract
To address the data sparseness caused by the large feature space and the small number of features carried by each microblog short text, a feature-cluster-based method for microblog sentiment classification is proposed. The method builds on a large-scale corpus. First, the word2vec model is used to learn latent semantic relations between words, so that each word is represented as a multidimensional vector. Second, sentiment features are extracted from microblog texts with the help of a sentiment lexicon and, using word similarity computed from the word vectors, merged into feature clusters, from which a low-dimensional text vector is constructed. Finally, a machine learning algorithm is used to build a sentiment classifier for microblog short texts. Experimental results show that the proposed method is feasible and effective for reducing the dimensionality of sentiment features and achieves good sentiment classification performance.
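The feature-cluster step can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so everything below is an assumption made for illustration: the greedy single-pass grouping, the similarity threshold of 0.8, the function names, and the hand-made three-dimensional toy vectors, which stand in for vectors that a trained word2vec model would provide. The classifier itself is omitted; the resulting low-dimensional vector is what would be passed to a machine learning algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_feature_clusters(features, vectors, threshold=0.8):
    """Greedy single pass (an assumed procedure, not taken from the paper).

    Each feature joins the first cluster whose seed word (its first member)
    has cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for word in features:
        for cluster in clusters:
            if cosine(vectors[word], vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([word])
    return clusters


def text_vector(tokens, clusters):
    """Count, per cluster, how many tokens of the text belong to that cluster."""
    cluster_of = {w: i for i, members in enumerate(clusters) for w in members}
    vec = [0] * len(clusters)
    for token in tokens:
        if token in cluster_of:
            vec[cluster_of[token]] += 1
    return vec


# Hypothetical toy data: invented vectors, NOT output of a real word2vec model.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.85, 0.15, 0.05],
    "sad": [-0.8, 0.2, 0.1],
    "unhappy": [-0.75, 0.25, 0.05],
    "angry": [-0.1, 0.9, 0.3],
}
# Hypothetical sentiment features, as if matched against a sentiment lexicon.
SENTIMENT_FEATURES = ["happy", "glad", "sad", "unhappy", "angry"]

clusters = build_feature_clusters(SENTIMENT_FEATURES, TOY_VECTORS, threshold=0.8)
doc_tokens = ["so", "glad", "and", "happy", "not", "angry"]
doc_vector = text_vector(doc_tokens, clusters)
print(clusters)
print(doc_vector)
```

In this toy setting five sentiment features collapse into three clusters, so the text is described by three dimensions instead of five; with real vectors the threshold would need tuning, since it controls how aggressively features are merged.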
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2016, No. 12, pp. 2713-2716 (4 pages)
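Funding
Supported by the National Social Science Fund of China (12BYY045)
Supported by the Guangdong University of Foreign Studies Graduate Research Innovation Project (14GWCXXM-36)
Supported by the Guangdong University of Foreign Studies Innovation and Entrepreneurship Training Program (201511846021)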
Keywords
microblog sentiment
data sparseness
word vector
feature cluster
machine learning