Study on a word frequency statistics algorithm for scenic spot reviews
Abstract: People who read online reviews of a scenic spot before a trip find it difficult to form an overall evaluation of that spot. To address this problem, a word frequency statistics algorithm suitable for massive data, TF-CT, is proposed. The algorithm uses cosine similarity to classify the massive text data, so that items expressing the same attitude are grouped into one class. It then uses the TextRank algorithm to extract keywords from one item in each class. Finally, it uses an improved TFIDF algorithm to compute word frequency statistics over the extracted keywords and thereby obtain the attitude expressed by the text data. Experimental results show that, compared with the TFIDF algorithm, the TF-CT algorithm has an advantage in both result accuracy and time complexity.
Authors: HUANG Min; REN Zonghua; ZHU Haodong (College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China)
Source: Journal of Light Industry (《轻工学报》), CAS, 2018, No. 3, pp. 51-56 (6 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (61201447)
Keywords: word frequency; text data; scenic evaluation; TF-CT algorithm; TFIDF algorithm
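
The abstract names three building blocks: cosine similarity for grouping reviews, TextRank for keyword extraction, and an improved TFIDF for word frequency statistics. The sketch below illustrates only the two standard pieces that can be stated without the full paper, namely cosine-similarity grouping and classic TF-IDF. It is not the authors' TF-CT implementation: the greedy grouping rule, the 0.5 threshold, the function names, and the sample reviews are all assumptions made for illustration, and the TextRank step and the paper's TFIDF improvement are omitted because the abstract does not specify them.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(docs, threshold=0.5):
    """Greedy grouping (an assumed rule, not taken from the paper):
    a document joins the first group whose first member is similar
    enough, otherwise it starts a new group."""
    vectors = [Counter(d) for d in docs]
    groups = []  # each group is a list of indices into docs
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def tf_idf(docs):
    """Classic TF-IDF: tf = count / len(doc), idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        counts = Counter(d)
        scores.append(
            {t: (c / len(d)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical, already-tokenized reviews (real Chinese reviews would
# first need word segmentation, which is outside this sketch).
reviews = [
    ["scenery", "beautiful", "worth", "visit"],
    ["beautiful", "scenery", "great", "visit"],
    ["ticket", "expensive", "crowded"],
    ["crowded", "expensive", "ticket", "queue"],
]

groups = group_by_similarity(reviews)
scores = tf_idf(reviews)
print(groups)
print(sorted(scores[0], key=scores[0].get, reverse=True))
```

With these sample reviews the two positive and the two negative reviews fall into separate groups, and terms that appear in only one review receive a higher TF-IDF weight than terms shared across reviews.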