Abstract
To address the inability of existing general-purpose sentiment dictionaries to accurately capture sentiment words specific to rural tourism, this paper proposes a method for automatically constructing a rural-tourism sentiment dictionary from online reviews. Taking Wuyuan, a well-known rural tourist destination in Jiangxi, as the study area, a custom web crawler was used to collect and screen 16,000 Weibo comments. An N-Gram language model and TF-IDF word-frequency statistics, with a set threshold, are used to select a candidate word set x. Using the traditional HowNet sentiment lexicon as a reference, y seed sentiment words that are both frequent and most strongly emotional are manually selected from x (x > y). The similarity between each candidate word in x and each seed word in y is then computed, and a threshold is applied to obtain the sentiment lexicon for the rural tourism domain. Validation shows that the dictionary achieves good results.
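The pipeline described in the abstract (N-gram candidates scored by TF-IDF, then threshold filtering by similarity to seed words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the n-gram sizes, the TF-IDF variant, the similarity measure, or the threshold values, so character 2- and 3-grams, a smoothed IDF, and cosine similarity over word vectors are all assumptions. The function names and the `vectors` mapping (word to embedding) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Return all contiguous character n-grams of the given sizes (sizes are assumed)."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


def select_candidates(docs, threshold):
    """Keep n-grams whose best per-document TF-IDF score reaches the threshold."""
    counts = [Counter(char_ngrams(d)) for d in docs]
    doc_freq = Counter(g for c in counts for g in c)  # documents containing each n-gram
    best = {}
    for c in counts:
        total = sum(c.values())
        for gram, tf in c.items():
            # Smoothed IDF (an assumption) so n-grams present in every document stay nonzero.
            idf = math.log((1 + len(docs)) / (1 + doc_freq[gram])) + 1
            best[gram] = max(best.get(gram, 0.0), tf / total * idf)
    return {g for g, score in best.items() if score >= threshold}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(candidates, seeds, vectors, threshold):
    """Keep each candidate whose highest similarity to any seed word reaches the threshold."""
    lexicon = {}
    for word in candidates:
        if word not in vectors:
            continue  # no embedding available for this candidate
        sims = [cosine(vectors[word], vectors[s]) for s in seeds if s in vectors]
        if sims and max(sims) >= threshold:
            lexicon[word] = max(sims)
    return lexicon
```

In practice the candidate set would come from `select_candidates` run over the crawled comments, the seed set would be the manually chosen words, and `vectors` would be supplied by whatever embedding or similarity resource is available; both thresholds would need tuning on held-out data.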
Authors
宗宇
方朝阳
吴波
ZONG Yu; FANG Chaoyang; WU Bo (College of Geography and Environment, Jiangxi Normal University, Nanchang 330000; Key Laboratory of Poyang Lake Wetland and Watershed Research, Ministry of Education, Nanchang 330000)
Source
《现代计算机》
2021, No. 18, pp. 79-84 (6 pages)
Modern Computer
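Funding
Major Project in Art Studies of the National Social Science Fund of China: Research on Practical Experience and Institutional Innovation in the Protection and Utilization of Revolutionary Cultural Relics (No. 19ZD27)
Culture, Arts and Tourism Research Project, Special Program for Informatization Development: An Integrated Cloud Platform for Online Recommendation, Customization, Experience, and Trading of Ceramic Artworks Based on VR/AR and Intelligent Matching (No. xxhfzzx201907).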