Abstract
Identifying opinion sentences is an important part of opinion mining for Chinese text. The goal is to find the subjective sentences in a document that express an opinion on some topic. Because the orientation of a Chinese sentence depends not only on its sentiment words but also on syntactic and semantic factors, opinion sentences cannot be identified simply from word-level statistics such as the TF-IDF scores of sentiment words. This paper proposes a method for identifying Chinese opinion sentences based on an N-gram hyperkernel function. The method constructs the N-gram hyperkernel from syntactic and semantic features of the sentence, and then applies a support vector machine (SVM) classifier built on this hyperkernel to identify opinion sentences. Experiments show that the method is effective and, to a certain extent, outperforms methods based on single kernels such as the polynomial, radial, and N-gram kernels.
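The abstract does not give the form of the hyperkernel, so the following is only a rough sketch of the general idea of scoring sentence similarity with n-gram kernels and feeding the resulting kernel matrix to a kernel classifier such as an SVM. It assumes sentences are already word-segmented into token lists, and it substitutes a plain non-negative weighted sum of normalised n-gram kernels for the paper's hyperkernel. The function names and the `weights` parameter are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n):
    """Count the contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(a, b, n):
    """Normalised n-gram kernel: cosine of the two n-gram count vectors."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(count * cb[gram] for gram, count in ca.items())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    # Sequences shorter than n have no n-grams; treat their similarity as 0.
    return dot / norm if norm else 0.0


def combined_kernel(a, b, weights):
    """Weighted sum of n-gram kernels (an assumed stand-in, not the paper's hyperkernel).

    `weights` maps an n-gram order n to a non-negative weight.
    """
    return sum(w * ngram_kernel(a, b, n) for n, w in weights.items())


def gram_matrix(sentences, weights):
    """Kernel matrix over all sentence pairs, usable as a precomputed kernel."""
    return [[combined_kernel(s, t, weights) for t in sentences] for s in sentences]
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why this simple combination can be passed to any classifier that accepts a precomputed kernel matrix. Syntactic or semantic features could be folded in by running the same kernels over part-of-speech tags or other label sequences instead of words, though whether the paper does so in this way is not stated in the abstract.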
Source
《中文信息学报》
CSCD
PKU Core Journal
2011, No. 5, pp. 89-93, 100 (6 pages in total)
Journal of Chinese Information Processing
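Funding
Natural Science Foundation of Fujian Province (2010J05133)
Fujian Province Science and Technology Innovation Platform Program (2009J1007)
Fuzhou University Science and Technology Development Fund (2010-XQ-22)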
Keywords
identification of Chinese opinion sentences
N-gram hyperkernel function
opinion mining