Abstract
Quantitative analysis and mining of user-posted product reviews benefit both manufacturers and consumers. In fine-grained product review mining, identifying feature-opinion pairs is one of the core tasks. To support this task, this paper studies techniques for constructing a fine-grained product review corpus, introducing the construction method and the initial annotation work carried out so far. The annotated data can be stored in a database, which converts unstructured review text into structured records and makes processing such as automatic querying far more convenient. Experimental results show that, although the annotation method is illustrated with mobile phone reviews, it is readily portable and can be applied to fine-grained corpus construction for reviews of other products. Such a corpus is important for applying high-performance machine learning methods, improving the performance of feature-opinion pair identification algorithms, and enabling automatic evaluation.
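The abstract notes that annotated data can be stored in a database so that it can be queried automatically. The following is a minimal sketch of what such storage might look like, not the authors' actual schema: the table name `annotation`, its columns (including the `polarity` field), the helper functions, and the sample review sentences are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample annotations (not from the paper's corpus):
# (review_id, sentence, feature, opinion, polarity)
SAMPLE_ROWS = [
    (1, "The screen is bright but the battery drains quickly.", "screen", "bright", 1),
    (1, "The screen is bright but the battery drains quickly.", "battery", "drains quickly", -1),
    (2, "Battery life is excellent.", "battery", "excellent", 1),
]


def build_corpus(rows):
    """Create an in-memory table holding one feature-opinion pair per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE annotation (
               id        INTEGER PRIMARY KEY,
               review_id INTEGER NOT NULL,
               sentence  TEXT    NOT NULL,
               feature   TEXT    NOT NULL,
               opinion   TEXT    NOT NULL,
               polarity  INTEGER NOT NULL CHECK (polarity IN (-1, 0, 1))
           )"""
    )
    conn.executemany(
        "INSERT INTO annotation (review_id, sentence, feature, opinion, polarity) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def opinions_for(conn, feature):
    """Return (opinion, polarity) tuples annotated for a product feature."""
    cur = conn.execute(
        "SELECT opinion, polarity FROM annotation WHERE feature = ? ORDER BY id",
        (feature,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_corpus(SAMPLE_ROWS)
    print(opinions_for(conn, "battery"))
```

Storing one feature-opinion pair per row means a sentence that mentions several features simply yields several rows, and a query by feature becomes a single `SELECT`.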
Source
Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2012, No. 3, pp. 64-68 (5 pages)
Funding
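Humanities and Social Sciences Youth Foundation of the Ministry of Education of China (10YJCZH099)
Fundamental Research Funds for the Central Universities (HIT.NSRIF.2009065)
Open Fund of the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech (HIT.KLOF.2009022)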
Keywords
product review mining
fine-granularity corpus construction
corpus annotation