Abstract
Comparison is a common form of evaluation in online reviews. Consumers use comparisons to highlight what they prefer, so comparative opinions can serve as a proxy for product competitiveness analysis. Identifying reviews that contain comparison relations in massive volumes of online reviews, and visualizing those relations, is therefore an active topic in text mining. Drawing on findings from linguistic studies, this paper proposes a method that identifies comparative sentences and determines their type according to subcategories of comparative sentences. The method combines rules with statistics: a manually built pattern base is merged with rules obtained by the class sequence rule (CSR) method to form a mixed rule pool. On this basis, an entity dictionary is used for a second pass over the names of the compared entities, which re-checks the identification result and supports accurate identification and type classification of comparative sentences. Restaurant reviews collected from Dianping.com serve as the experimental corpus. The results show that the proposed method outperforms baseline methods in identification precision while maintaining recall. Building on the identified sentences, product features and comparative opinions are extracted and sentiment scores are computed for the comparison relations, yielding a visualized competitiveness analysis of restaurants.
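The two-stage idea described in the abstract (pattern matching against a rule pool, followed by an entity dictionary re-check) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the patterns in `MANUAL_PATTERNS`, the subcategory labels, and the names in `ENTITY_DICT` are all hypothetical, and the CSR-mined rules, which would normally be learned from tagged sequences, are omitted entirely.

```python
import re
from typing import Optional, Set

# Hypothetical hand-written patterns, one per assumed subcategory.
# These are illustrative only and are not taken from the paper's rule pool.
MANUAL_PATTERNS = {
    "unequal": re.compile(r"比.{1,15}(好|差|贵|便宜|强|多|少)"),
    "equal": re.compile(r"(和|跟|与).{1,15}(一样|差不多|相同)"),
    "superlative": re.compile(r"最(好|差|贵|便宜)"),
}

# Hypothetical entity dictionary of restaurant names (an assumption).
ENTITY_DICT = {"海底捞", "小肥羊", "外婆家"}


def match_subcategory(sentence: str) -> Optional[str]:
    """Return the label of the first pattern that matches, else None."""
    for label, pattern in MANUAL_PATTERNS.items():
        if pattern.search(sentence):
            return label
    return None


def find_entities(sentence: str) -> Set[str]:
    """Return the dictionary entities mentioned in the sentence."""
    return {name for name in ENTITY_DICT if name in sentence}


def identify(sentence: str) -> Optional[str]:
    """Stage 1: pattern match. Stage 2: entity dictionary re-check.

    A sentence is kept as comparative only if a pattern matches AND at
    least one known entity name appears in it.
    """
    label = match_subcategory(sentence)
    if label is None:
        return None
    if not find_entities(sentence):
        return None
    return label
```

In this sketch the re-check acts purely as a filter: a sentence such as "今天比昨天好" matches a pattern but names no known restaurant, so it is rejected. That is one plausible way a dictionary pass could raise precision; the paper's actual re-check may work differently.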
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2015, No. 12, pp. 1259-1269 (11 pages)
Journal of the China Society for Scientific and Technical Information
Funding
National Natural Science Foundation of China (71371144)
Shanghai Philosophy and Social Science Planning Project, General Program (2013BGL004)
Keywords
comparative relations
online reviews
pattern matching
class sequence rule (CSR)
competitiveness analysis
restaurant industry