Abstract
This paper studies how to automatically identify microblog sentences in which users express opinions about cars of various brands. Because microblogs carry a large volume of car advertising and product-release announcements, opinion sentences posted by actual car users make up only a small fraction of the data. To cope with this sparsity, the paper proposes an SVM-based classification method trained on a combination of microblog data and car review data. The selected features include words, the number of opinion words, words related to the opinion targets, and microblog-related features such as emoticons and user type. Experimental results show that the opinion-word feature and some of the microblog-related features effectively improve classifier performance, and that a classifier trained on both microblog and car review data outperforms one trained on microblog data alone.
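The abstract lists the ingredients of the method but not how they are implemented. The following is a minimal, dependency-free sketch under loudly stated assumptions: the lexicons, the toy sentences, and the helper names `extract_features`, `train_linear_svm` and `predict` are hypothetical illustrations, not the authors' code; sentences are assumed to be pre-segmented into words; a Pegasos-style subgradient solver stands in for whatever SVM package the paper used; the two corpora are simply concatenated, and car-review sentences are assumed to be labelled as opinion sentences.

```python
import random
from collections import Counter

# Hypothetical toy lexicons: the paper's real opinion lexicon, opinion-target
# list and emoticon set are not given in the abstract.
OPINION_WORDS = {"好", "差", "舒服", "失望", "喜欢"}
TARGET_WORDS = {"油耗", "空间", "动力", "内饰"}
EMOTICONS = {"[哈哈]", "[怒]", "[赞]"}


def extract_features(tokens, user_type=None):
    """Turn a pre-segmented sentence into a sparse feature dict."""
    feats = Counter(f"w={t}" for t in tokens)  # word (bag-of-words) features
    feats["n_opinion"] = sum(t in OPINION_WORDS for t in tokens)  # opinion-word count
    for i, t in enumerate(tokens[:-1]):
        if t in TARGET_WORDS:
            # Assumed target-related feature: the word right after an opinion target.
            feats[f"after_target={tokens[i + 1]}"] += 1
    # Microblog-specific features; car-review sentences have no user type.
    feats["has_emoticon"] = int(any(t in EMOTICONS for t in tokens))
    if user_type is not None:
        feats[f"user={user_type}"] = 1
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Pegasos subgradient descent on the L2-regularized hinge loss.

    `data` is a list of (feature_dict, label) pairs with labels in {+1, -1}.
    No bias term is used, as in the basic Pegasos formulation.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge loss is active for this example
            for k in w:  # regularization step: shrink every weight
                w[k] *= 1.0 - eta * lam
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, tokens, user_type=None):
    return 1 if dot(w, extract_features(tokens, user_type)) > 0 else -1


# Hypothetical toy data: +1 = user opinion sentence, -1 = advertisement/news.
microblog = [
    (["我", "的", "车", "油耗", "低", "很", "喜欢"], "personal", 1),
    (["空间", "大", "开", "着", "舒服", "[赞]"], "personal", 1),
    (["新款", "车型", "上市", "欢迎", "试驾"], "enterprise", -1),
    (["限时", "优惠", "到店", "有礼"], "enterprise", -1),
]
# Car-review sentences, assumed here to be opinion sentences (+1).
reviews = [
    (["动力", "强", "但", "油耗", "高", "有点", "失望"], None, 1),
    (["内饰", "做工", "差"], None, 1),
]

# Train on the concatenation of both corpora.
train = [(extract_features(toks, u), y) for toks, u, y in microblog + reviews]
w = train_linear_svm(train)
print(predict(w, ["油耗", "低", "很", "喜欢"], "personal"))
print(predict(w, ["新款", "上市", "到店", "试驾"], "enterprise"))
```

On this toy data the two calls print 1 (opinion sentence) and -1 (advertisement). The hand-written solver only keeps the example self-contained; in practice a library linear SVM such as scikit-learn's `LinearSVC` over vectorized features would be the usual choice.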
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2014, No. 5, pp. 148-154 (7 pages)
Keywords
microblog
opinion sentence recognition
opinion mining
SVM