Abstract
Given the massive volume of online reviews, identifying the features that make a review helpful enables consumers to select high-quality reviews and supports sound decision making. Based on the information adoption model, this paper extracts four categories of features affecting review quality, seventeen features in total, from digital camera and mobile phone datasets. Using logistic ridge regression and a basic decision tree as baseline models, combined with recursive feature elimination (RFE) for dimensionality reduction, the paper compares and evaluates the GBDT model on review quality classification and feature reduction. The analysis reveals the contribution of each feature to the classification result, which serves as the basis for identifying key features. The experimental results show that the GBDT model classifies review quality well, and that review timeliness (posting time), reviewer ranking, the number of key product features, and review word count are the key features influencing review quality, forming the optimized feature set for the GBDT model.
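The comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption rather than the authors' setup: scikit-learn is used as the toolkit, the data are synthetic (17 placeholder features standing in for the review features), L2-regularized logistic regression stands in for logistic ridge regression, RFE is wrapped around that baseline, and scikit-learn's impurity-based `feature_importances_` stands in for the paper's feature "contribution". The target of four retained features simply mirrors the four key features the abstract reports.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the review data (assumption): 17 features, binary quality label.
X, y = make_classification(
    n_samples=400, n_features=17, n_informative=4, n_redundant=2, random_state=0
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names

# Two baselines and GBDT, compared by 5-fold cross-validated accuracy.
models = {
    # LogisticRegression applies L2 (ridge-style) regularization by default.
    "logistic_ridge": LogisticRegression(C=1.0, max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "gbdt": GradientBoostingClassifier(n_estimators=50, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}

# Feature reduction for the baseline: recursive feature elimination down to 4 features.
rfe = RFE(LogisticRegression(C=1.0, max_iter=1000), n_features_to_select=4).fit(X, y)
rfe_selected = [n for n, keep in zip(feature_names, rfe.support_) if keep]

# Feature reduction for GBDT: rank features by the fitted model's importances.
gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
contribution = sorted(
    zip(feature_names, gbdt.feature_importances_), key=lambda p: p[1], reverse=True
)
gbdt_top4 = [name for name, _ in contribution[:4]]

print(scores)
print("RFE (logistic baseline):", rfe_selected)
print("GBDT top contributions:", contribution[:4])
```

On real data, `X` and `feature_names` would be replaced by the extracted review features; the ranking in `contribution` is what would point to candidate key features.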
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2017, Issue 3, pp. 109-117 (9 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (71371144, 71601082)
Keywords
GBDT
review quality
feature contribution
information adoption model
recursive feature elimination