Deep Embedding Clustering and Its Application in Analysis of Complaint Text
Abstract To handle the very large volume of ultra-short, user-generated Internet texts that involve electric power complaints, this paper proposes a clustering model based on deep embedding for identifying the topics of such texts. First, an improved algorithm is used for word embedding, which enriches the semantics of the text features and reduces the dimensionality of the dataset. Then, building on the word embeddings, Sentence-BERT is used to compute sentence similarity, which enables short-text clustering. Finally, the proposed method is applied to a self-crawled dataset of Internet user messages involving power complaints to cluster the complaint texts by topic. Its results are compared with those of several existing topic-identification algorithms on the same dataset, and the comparison demonstrates the effectiveness of the proposed model.
Source Computer Science and Application, 2023, Issue 4, pp. 853-864 (12 pages)
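
The similarity-then-clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the short vectors below are hypothetical stand-ins for sentence embeddings (a real pipeline would obtain them from a sentence encoder such as Sentence-BERT), and the greedy threshold rule, the function names, and the threshold value of 0.8 are all assumptions chosen only to show how cosine similarity between sentence vectors can drive short-text grouping.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def greedy_cluster(embeddings, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is at
    least `threshold` similar; otherwise start a new cluster.

    Illustrative rule only (an assumption), not the paper's algorithm.
    """
    seeds = []   # first vector seen for each cluster
    labels = []  # cluster id for each input vector
    for vec in embeddings:
        for cluster_id, seed in enumerate(seeds):
            if cosine_similarity(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical 3-dimensional stand-ins for sentence embeddings.
toy_embeddings = [
    [0.9, 0.1, 0.0],  # e.g. a complaint about a power outage
    [0.8, 0.2, 0.1],  # another outage complaint
    [0.0, 0.1, 0.9],  # e.g. a complaint about billing
    [0.1, 0.0, 0.8],  # another billing complaint
]
labels = greedy_cluster(toy_embeddings)
print(labels)
```

With these toy vectors, the two outage-like vectors share one label and the two billing-like vectors share another. The result of a greedy rule like this depends on input order and on the threshold, so it is only a way to see the idea, not a substitute for the clustering model the paper evaluates.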