期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark 被引量:7
1
作者 Zhuo Zhou Jiaohua Qin +3 位作者 Xuyu Xiang Yun Tan Qiang Liu Neal N.Xiong 《Computers, Materials & Continua》 SCIE EI 2020年第1期217-231,共15页
Due to the slow processing speed of text topic clustering in stand-alone architecture under the background of big data,this paper takes news text as the research object and proposes LDA text topic clustering algorithm... Due to the slow processing speed of text topic clustering in stand-alone architecture under the background of big data,this paper takes news text as the research object and proposes LDA text topic clustering algorithm based on Spark big data platform.Since the TF-IDF(term frequency-inverse document frequency)algorithm under Spark is irreversible to word mapping,the mapped words indexes cannot be traced back to the original words.In this paper,an optimized method is proposed that TF-IDF under Spark to ensure the text words can be restored.Firstly,the text feature is extracted by the TF-IDF algorithm combined CountVectorizer proposed in this paper,and then the features are inputted to the LDA(Latent Dirichlet Allocation)topic model for training.Finally,the text topic clustering is obtained.Experimental results show that for large data samples,the processing speed of LDA topic model clustering has been improved based Spark.At the same time,compared with the LDA topic model based on word frequency input,the model proposed in this paper has a reduction of perplexity. 展开更多
关键词 News text topic clustering spark platform countvectorizer algorithm TF-IDF algorithm latent dirichlet allocation model
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部