Study on Intelligent Classification of Improved TFIDF 95598 Work Order Based on LDA
Cited by: 9
Abstract To improve the accuracy of intelligent classification of 95598 work orders, an improved TFIDF algorithm based on latent Dirichlet allocation (LDA) is proposed. Feature words are first extracted from the text and then clustered with the K-means algorithm. An LDA model is constructed to obtain the probability distributions θ and φ, from which the semantic influence (SI) is computed and used as the weight of each feature word. The improved algorithm is denoted SI-TFIDF (semantic influence-term frequency inverse document frequency). SI-TFIDF and the traditional TFIDF algorithm are each used to extract feature words from the Sogou database, and the extracted words are clustered with K-means. The comparison shows that feature words extracted by SI-TFIDF cluster better than those extracted by TFIDF, which verifies the reliability of the proposed method. In simulation experiments on 95598 complaint work orders, SI-TFIDF achieves higher clustering accuracy than the traditional TFIDF algorithm, indicating that SI-TFIDF is better suited to classifying complaint work orders.
Authors WU Guanghua; LI Hongyu; LIU Ergang; LIU Changfa; LI Qian (Research Institute, State Grid Hebei Electric Power, Shijiazhuang 050000)
Source Microcomputer Applications, 2020, No. 3, pp. 87-90 (4 pages)
Keywords 95598; complaint work orders; latent Dirichlet allocation; term frequency inverse document frequency
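
The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SI formula, so the definition used here, SI(w, d) = Σ_k θ[d][k] · φ[k][w], is an assumption made only for illustration and may differ from the paper's. The function names, the toy documents, and the hand-written θ and φ values are likewise hypothetical; no LDA model is fitted and no clustering is run.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def semantic_influence(theta_d, phi, term):
    """ASSUMED definition (not from the paper):
    SI(term, d) = sum over topics k of theta[d][k] * phi[k][term]."""
    return sum(t_k * phi_k.get(term, 0.0) for t_k, phi_k in zip(theta_d, phi))


def si_tfidf(docs, theta, phi):
    """Scale each tf-idf weight by the assumed SI score of that term."""
    base = tfidf(docs)
    return [
        {term: w * semantic_influence(theta[d], phi, term)
         for term, w in doc_weights.items()}
        for d, doc_weights in enumerate(base)
    ]


# Hypothetical toy inputs: two tokenized work orders and two topics.
docs = [
    ["outage", "complaint", "outage"],
    ["billing", "complaint", "fee"],
]
# theta[d][k]: document-topic distribution (illustrative values, not fitted).
theta = [[0.9, 0.1], [0.2, 0.8]]
# phi[k][w]: topic-word distribution (illustrative values, not fitted).
phi = [
    {"outage": 0.7, "complaint": 0.3},
    {"billing": 0.4, "fee": 0.4, "complaint": 0.2},
]

weights = si_tfidf(docs, theta, phi)
for d, doc_weights in enumerate(weights):
    print(d, sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

In a fuller pipeline, θ and φ would come from a fitted LDA model, and the resulting weight vectors would be passed to K-means, as the abstract describes. Both steps are left out here to keep the sketch self-contained.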