Disease Text Clustering Algorithm Based on LDA and GloVe Model (cited by: 1)
Abstract: To address the loss of semantic information when the latent Dirichlet allocation (LDA) model is used for feature extraction, a disease text clustering algorithm, LG&K-Medoide, is proposed that combines LDA with the GloVe (Global Vectors for Word Representation) model. First, LDA is used to model the disease text data, and the Jensen-Shannon (JS) distance is used to compute text similarity. Second, GloVe is used to model the disease text data to obtain word vectors; the word vectors are weighted according to the contribution of each part of speech in the disease text, and the cosine distance is used to compute the weighted GloVe-based text similarity. Finally, the two similarities are combined into an improved distance formula, which is used for K-Medoide clustering. Experimental results show that LG&K-Medoide achieves higher accuracy than clustering algorithms based on LDA, LDA+TF-IDF, and LDA+Word2Vec models.
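The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the improved distance formula, so everything specific here is an assumption: the linear mix controlled by a hypothetical parameter `lam`, the use of the base-2 JS divergence as the "JS distance", the toy topic distributions, word vectors, and part-of-speech weights, and the simple alternating k-medoids loop. It is not the authors' implementation.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two topic distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def weighted_doc_vector(tokens, vectors, pos_weights):
    """Weighted average of word vectors; tokens are (word, pos_tag) pairs."""
    dim = len(next(iter(vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for word, pos in tokens:
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        w = pos_weights.get(pos, 1.0)
        total = [t + w * v for t, v in zip(total, vectors[word])]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def combined_distance(theta_a, theta_b, vec_a, vec_b, lam=0.5):
    """Assumed linear mix of the two distances; the paper's formula may differ."""
    return lam * js_divergence(theta_a, theta_b) + (1 - lam) * cosine_distance(vec_a, vec_b)

def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids over a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive deterministic initialisation (assumption)
    labels = [0] * n
    for _ in range(max_iter):
        # Assign each document to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total in-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            best = min(members, key=lambda i: sum(dist[i][j] for j in members)) if members else medoids[c]
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy data (entirely hypothetical): LDA topic distributions, word vectors, POS weights.
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
vectors = {"fever": [1.0, 0.0], "cough": [0.9, 0.1], "rash": [0.0, 1.0], "itchy": [0.1, 0.9]}
pos_weights = {"n": 1.0, "adj": 0.5}
docs = [[("fever", "n"), ("cough", "n")], [("cough", "n")],
        [("rash", "n"), ("itchy", "adj")], [("itchy", "adj")]]

doc_vecs = [weighted_doc_vector(d, vectors, pos_weights) for d in docs]
dist = [[0.0 if i == j else combined_distance(thetas[i], thetas[j], doc_vecs[i], doc_vecs[j])
         for j in range(len(docs))] for i in range(len(docs))]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

In this sketch both component distances lie in a comparable range for non-negative vectors, which is what makes a simple weighted sum plausible; in practice the topic distributions would come from a fitted LDA model and the word vectors from a trained GloVe model.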
Authors: WU Di (吴迪), ZHAO Yufeng (赵玉凤) (School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China)
Source: Journal of Hebei University of Engineering: Natural Science Edition (《河北工程大学学报(自然科学版)》), CAS, 2022, No. 1, pp. 92-98 (7 pages)
Funding: Natural Science Foundation of Hebei Province (F2020402003, F2019402428).
Keywords: disease text; LDA; GloVe; similarity combination; clustering