Big Data Sentiment Analysis Based on Fusion of Peak Density and Local Features (cited by: 3)
Abstract: Existing big data sentiment analysis methods commonly suffer from inaccurate classification, low efficiency, and one-sided results. To address this, a classification method that fuses density peak clustering with local feature extraction is proposed. Considering the parameter complexity of sentiment analysis in big data scenarios, a locally optimized density peak clustering is designed to partition the original data into clusters: region segmentation and independent per-region clustering enable parallel computation, and the outputs of all regions are then merged into the global clustering result. For clusters that overlap as a result of region segmentation, boundary extension is applied, a Gaussian kernel is used to optimize the density calculation, and the threshold is adjusted in real time according to the product of density and distance, so that cluster centers adapt automatically. Based on the clustering results, a locally optimized text feature extraction is designed: BiLSTM-CNN extracts word-level and sentence-level features, which are fused, and the local semantic features extracted by the CNN are then used to correct the result, so that the extracted text features are as close as possible to the contextual semantics. Simulations on the COAE2014 dataset, evaluated in terms of ARI, Precision, Recall, F1-measure, and running time, verify that the proposed method yields clustering results that agree more closely with the actual data, significantly improves the accuracy and comprehensiveness of big data sentiment analysis, and effectively improves real-time performance in big data application scenarios.
Authors: MENG Xiang-guang (孟祥光); GUO Dong-wei (郭东伟) (College of Software, Jilin University, Changchun, Jilin 130021, China)
Source: Computer Simulation (《计算机仿真》), PKU Core Journal, 2021, No. 6, pp. 238-241, 414 (5 pages)
Funding: Ministry of Education Industry-University Cooperation Collaborative Education Project (201702001013).
Keywords: text big data; sentiment analysis; density peak; region segmentation; local feature extraction
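
The abstract describes a density peak clustering step in which density is computed with a Gaussian kernel and cluster centers are chosen from the product of density and distance. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `density_peaks`, the cutoff `d_c`, and the fixed `n_centers` are assumptions made for illustration. In particular, the paper's adaptive threshold, region segmentation, and boundary extension are not reproduced here; a fixed number of centers stands in for the adaptive rule.

```python
import numpy as np

def density_peaks(points, d_c, n_centers):
    """Basic density peak clustering with a Gaussian-kernel density.

    points    : (n, d) array of feature vectors
    d_c       : kernel cutoff distance (hypothetical parameter name)
    n_centers : number of centers to keep (simplification of the
                adaptive threshold mentioned in the abstract)
    """
    n = len(points)
    # Pairwise Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Gaussian-kernel local density; subtract 1 to drop the self term exp(0).
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # delta: distance to the nearest point of higher density.
    order = np.argsort(-rho)              # indices in descending density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Points with a large density-distance product are center candidates.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_centers]

    # Each remaining point inherits the label of its nearest denser neighbor.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, gamma
```

Calling `density_peaks(X, d_c=1.0, n_centers=2)` on an `(n, d)` array `X` returns a label per row, the indices of the chosen centers, and the `gamma` scores. In a region-parallel variant such as the one the abstract outlines, a function like this would presumably be run on each region independently before merging, but how the merge and boundary handling work is not specified in the abstract.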