Abstract
Subjective sentence identification aims to detect opinionated sentences in a text collection. Building on probabilistic topic models, this paper proposes a subjective sentence identification model that incorporates topics. By taking topic factors into account, the model identifies sentence subjectivity while simultaneously mining the latent subjective topics in the collection. The proposed model is a weakly supervised generative model: it does not require a large labeled corpus for training and needs only a small domain-independent subjectivity lexicon to modify the model's priors. Experiments show that the model effectively improves the recall and F-value of sentence identification, and that the extracted subjective topics are semantically informative.
Source
《计算机与现代化》
2012, Issue 12, pp. 127-130, 135 (5 pages in total)
Computer and Modernization
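Funding
Natural Science Foundation of Fujian Province (Grant No. 2010J05133)
Fuzhou University Science and Technology Development Fund (Grant No. 2010-XQ-22)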
Keywords
subjectivity sentence identification
opinion mining
probabilistic topic model
weakly-supervised