Abstract
Feature-representation-based approaches to cross-domain sentiment classification typically align the features of source-domain and target-domain data by statistical means, and then build the target-domain classifier from the relations among those features. Starting instead from word-level sentiment computation, this paper proposes a cross-domain sentiment computation method based on domain-specific standard word lists. Unlike traditional word sentiment computation methods, the proposed method takes both part of speech and domain information into account when constructing the standard word lists, and it computes the sentiment of a word using the list that matches the word's current part of speech and domain. Experimental results show that: (1) compared with traditional cross-domain sentiment classification algorithms, the method has no clear advantage in accuracy, but it does not depend on source-domain or target-domain text data; (2) compared with traditional sentiment computation based on a standard word list, it substantially improves classification accuracy.
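The lookup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the scoring formula, so the sketch scores a word as its mean similarity to positive standard words minus its mean similarity to negative ones, and the word lists, the similarity values, and the names `SEED_LISTS`, `similarity`, and `word_orientation` are hypothetical rather than taken from the paper.

```python
from statistics import mean

# Hypothetical standard word lists, keyed by (domain, part of speech).
# The paper's actual lists are not given in the abstract.
SEED_LISTS = {
    ("hotel", "adj"): {"pos": ["spacious", "clean"], "neg": ["cramped", "dirty"]},
    ("phone", "adj"): {"pos": ["compact", "fast"], "neg": ["bulky", "slow"]},
}

# Toy symmetric similarity scores (assumed values, not real data).
TOY_SIMILARITY = {
    frozenset(("small", "cramped")): 0.8,
    frozenset(("small", "compact")): 0.8,
    frozenset(("small", "spacious")): 0.1,
    frozenset(("small", "bulky")): 0.1,
}


def similarity(a: str, b: str) -> float:
    """Return the toy similarity of two words (1.0 if identical, 0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def word_orientation(word: str, domain: str, pos: str) -> float:
    """Score a word against the list selected by its domain and part of speech.

    Assumed scoring rule: mean similarity to positive standard words
    minus mean similarity to negative standard words.
    """
    seeds = SEED_LISTS[(domain, pos)]
    positive = mean([similarity(word, s) for s in seeds["pos"]])
    negative = mean([similarity(word, s) for s in seeds["neg"]])
    return positive - negative


if __name__ == "__main__":
    # The same adjective can receive opposite signs in different domains.
    print(word_orientation("small", "hotel", "adj"))
    print(word_orientation("small", "phone", "adj"))
```

With these toy values, "small" scores negative against the hotel list and positive against the phone list, which is the kind of domain dependence that selecting the list by domain and part of speech is meant to capture.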
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
PKU Core
2016, No. 7, pp. 59-65 (7 pages)
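Funding
Beijing Young Top-notch Talent Support Program (13031821005)
General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM20121001700613031821005)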
Keywords
Chinese information processing
cross-domain sentiment classification
word orientation
standard word list