Abstract
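Using consumers' online text reviews of hotels on Ctrip (携程网) as its data, this paper clusters the words in those reviews to uncover the hotel evaluation dimensions that consumers care about most, which are implicit in the text. To improve word clustering, a reference corpus is introduced as a contrast document set. The final evaluation dimensions are obtained through word segmentation, feature-term representation, feature-word code tagging, semantic similarity calculation, and DBSCAN-based text clustering. A worked example details the methods and steps used at each stage.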
Drawing on a corpus and on consumers' online feedback, this paper explores the issues consumers consider most important, as implied in their text comments. The dimensions of online hotel reputation are generated through collecting text-comment data, word segmentation, building the feature set, code marking, semantic similarity calculation, and machine clustering. An example then illustrates the detailed procedures and methods. The paper offers a new perspective on online feedback systems and, in practice, provides decision support for hotel managers and online hotel booking websites.
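The last two stages of the pipeline (semantic similarity calculation and DBSCAN clustering) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each candidate feature word has already been segmented, selected, and assigned a hierarchical semantic code, and that similarity grows with the length of the shared code prefix. The words, the codes in `CODES`, the prefix-based `similarity` measure, and the parameters `eps=0.25` and `min_pts=2` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical hierarchical semantic codes for a few review words.
# Assumption: a longer shared prefix means a closer meaning.
CODES: Dict[str, str] = {
    "room": "Bn01A",
    "bed": "Bn01B",
    "bathroom": "Bn01C",
    "staff": "Ah05A",
    "receptionist": "Ah05B",
    "price": "Dj02A",
}


def similarity(a: str, b: str) -> float:
    """Hypothetical measure: length of the common code prefix over the longer code."""
    ca, cb = CODES[a], CODES[b]
    shared = 0
    for x, y in zip(ca, cb):
        if x != y:
            break
        shared += 1
    return shared / max(len(ca), len(cb))


def dbscan(words: List[str], eps: float, min_pts: int) -> Dict[str, int]:
    """Plain DBSCAN using distance = 1 - similarity; label -1 marks noise."""

    def neighbors(w: str) -> List[str]:
        # The eps-neighbourhood includes w itself (distance 0).
        return [v for v in words if 1.0 - similarity(w, v) <= eps]

    labels: Dict[str, int] = {}
    cluster = -1
    for w in words:
        if w in labels:
            continue
        seeds = neighbors(w)
        if len(seeds) < min_pts:
            labels[w] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # w is a core point: start a new cluster
        labels[w] = cluster
        queue = [v for v in seeds if v != w]
        while queue:
            v = queue.pop()
            if labels.get(v) == -1:
                labels[v] = cluster  # former noise joins as a border point
            if v in labels:
                continue  # already assigned (or just claimed as border)
            labels[v] = cluster
            v_neighbors = neighbors(v)
            if len(v_neighbors) >= min_pts:
                queue.extend(v_neighbors)  # v is also a core point: keep expanding
    return labels


if __name__ == "__main__":
    print(dbscan(list(CODES), eps=0.25, min_pts=2))
```

With these toy codes, "room", "bed", and "bathroom" form one cluster (a candidate "room" dimension), "staff" and "receptionist" form another (a candidate "service" dimension), and "price" has no close neighbour, so it is labelled as noise. DBSCAN fits this task because it does not require the number of dimensions to be fixed in advance and it sets aside isolated words rather than forcing them into a cluster.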
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal (北大核心)
2012, Issue 12, pp. 124-129 (6 pages)
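Funding
One of the research outcomes of the National Natural Science Foundation of China project "Research on an Online Retailer Reputation Evaluation Model Based on Text Mining" (Grant No. 70871048)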
Keywords
reputation dimensions
word clustering
text comments
semantic similarity
corpus