Abstract
Focusing on the graded sentiment classification of online reviews, this paper proposes a multi-level sentiment classification method based on a bag-of-opinions model and a set of linguistic rules. By analyzing part-of-speech (POS) collocations in sentences, 12 patterns are designed for extracting feature-opinion pairs, and strategies are given for the problems that arise during extraction. Each review is thereby represented as a series of four-tuples of the form "feature, degree word, opinion word, negation word". Based on characteristics of Chinese word usage and on domain-specific word usage in the automotive domain, a method is proposed for computing the sentiment polarity value of each four-tuple. The extracted four-tuples and their polarities are then used to build a vector representation of each text, together with a term-weighting formula. Finally, cosine similarity between texts is used to assign each review to one of five sentiment polarity levels. Experiments on the car dataset of COAE2012 Task 3 show good classification results compared with the other runs submitted to COAE.
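The abstract only outlines the pipeline. It does not give the 12 extraction patterns, the polarity formula, the weighting formula, or how the cosine similarity is set up. The following is a minimal Python sketch of the last two stages under loudly stated assumptions:

- Four-tuples are assumed to be already extracted.
- A tuple's polarity is assumed to be the opinion word's prior polarity, scaled by a degree-word weight and sign-flipped by a negation word.
- A text vector is assumed to sum tuple polarities per feature.
- Each of the five levels is assumed to be represented by the centroid of labelled training vectors. A review goes to the level whose centroid is most cosine-similar.

The lexicons, `TOY_TRAINING`, and every function name are hypothetical, not the authors' resources.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional

Vector = Dict[str, float]


class FourTuple(NamedTuple):
    """(feature, degree word, opinion word, negation word); degree/negation may be absent."""
    feature: str
    degree: Optional[str]
    opinion: str
    negation: Optional[str]


# Hypothetical toy lexicons (assumption): prior polarity of opinion words
# and intensity weights of degree words. Not the paper's resources.
OPINION_POLARITY = {"good": 1.0, "economical": 1.0, "poor": -1.0, "noisy": -1.0}
DEGREE_WEIGHT = {"very": 1.5, "slightly": 0.5}


def tuple_polarity(t: FourTuple) -> float:
    """Assumed scoring: prior polarity x degree weight, sign flipped by a negation word."""
    score = OPINION_POLARITY.get(t.opinion, 0.0)
    if t.degree is not None:
        score *= DEGREE_WEIGHT.get(t.degree, 1.0)
    if t.negation is not None:
        score = -score
    return score


def text_vector(tuples: List[FourTuple]) -> Vector:
    """Sparse vector: one dimension per feature, value = summed tuple polarity."""
    vec: Dict[str, float] = defaultdict(float)
    for t in tuples:
        vec[t.feature] += tuple_polarity(t)
    return dict(vec)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (missing dimensions count as 0)."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (missing dimensions count as 0)."""
    acc: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            acc[k] += x
    return {k: x / len(vectors) for k, x in acc.items()}


def classify(vec: Vector, prototypes: Dict[int, Vector]) -> int:
    """Return the level (1-5) whose prototype is most cosine-similar to vec."""
    return max(prototypes, key=lambda level: cosine(vec, prototypes[level]))


# Hypothetical labelled vectors for levels 1 (most negative) to 5 (most positive).
TOY_TRAINING: Dict[int, List[Vector]] = {
    1: [{"fuel": -1.0, "space": -1.0, "noise": -1.0}],
    2: [{"fuel": -1.0, "space": 1.0, "noise": -1.0}],
    3: [{"fuel": 1.0, "space": -1.0}],
    4: [{"fuel": 2.0, "space": 1.0, "noise": -1.0},
        {"fuel": 0.0, "space": 1.0, "noise": -1.0}],
    5: [{"fuel": 1.0, "space": 1.0, "noise": 1.0}],
}

if __name__ == "__main__":
    prototypes = {level: centroid(vs) for level, vs in TOY_TRAINING.items()}
    review = [
        FourTuple("fuel", "very", "economical", None),  # +1.5
        FourTuple("space", None, "good", None),         # +1.0
        FourTuple("noise", "slightly", "noisy", None),  # -0.5
    ]
    print(classify(text_vector(review), prototypes))  # prints 4
```

Cosine similarity ignores vector length. In this sketch, two reviews with the same mix of positive and negative features but different overall intensity therefore land on the same level. This is one reason the choice of term weights matters in a real system.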
Source
Journal of Chinese Information Processing (中文信息学报)
CSCD
Peking University Core Journal
2015, No. 3, pp. 113-120 (8 pages)
Funding
National Natural Science Foundation of China (61175067, 61272095)
Natural Science Foundation of Shanxi Province (2010011021-1, 2013011066-4)
Shanxi Province Key Science and Technology Research Project (20110321027-02)
Shanxi Scholarship Council of China (2013-014)
Keywords
sentiment classification
bag of opinions
POS collocation