Funding: This work is supported by the Science Research Projects of Hunan Provincial Education Department (Nos. 18A174, 18C0262), the National Natural Science Foundation of China (No. 61772561), the Key Research & Development Plan of Hunan Province (Nos. 2018NK2012, 2019SK2022), the Degree & Postgraduate Education Reform Project of Hunan Province (No. 209), and the Postgraduate Education and Teaching Reform Project of Central South Forestry University (No. 2019JG013).
Abstract: Because text topic clustering is slow on a stand-alone architecture in the big-data setting, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm under Spark is irreversible with respect to word mapping, the mapped word indexes cannot be traced back to the original words. This paper therefore proposes an optimized TF-IDF method under Spark that ensures the text words can be restored. First, text features are extracted by the proposed TF-IDF algorithm combined with CountVectorizer, and the features are then input to the LDA (Latent Dirichlet Allocation) topic model for training. Finally, the text topic clusters are obtained. Experimental results show that, for large data samples, running on Spark improves the processing speed of LDA topic model clustering. In addition, compared with an LDA topic model that takes word-frequency input, the proposed model achieves lower perplexity.
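The reversibility issue described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation and does not call Spark; the function names, the toy hash, and the sample documents are all assumptions made for illustration. A hashing-style term frequency keeps only bucket indexes, so different words can collide and the index cannot be mapped back, whereas a vocabulary-based count vector keeps an index-to-word list that survives the TF-IDF weighting.

```python
import math
from collections import Counter


def hashed_counts(tokens, num_features=16):
    """Hashing-style term counts: each word maps to a bucket index and
    no index-to-word table is kept, so the mapping cannot be inverted."""
    counts = Counter()
    for tok in tokens:
        # Toy deterministic hash (sum of UTF-8 byte values); an assumption
        # for illustration, not the hash any real library uses.
        counts[sum(tok.encode("utf-8")) % num_features] += 1
    return dict(counts)


def fit_vocabulary(docs):
    """Vocabulary-based alternative: the position of a word in this sorted
    list is its feature index, so every index can be traced back."""
    return sorted({tok for doc in docs for tok in doc})


def count_vectors(docs, vocab):
    """Sparse term-count vectors keyed by vocabulary index."""
    index = {word: i for i, word in enumerate(vocab)}
    return [dict(Counter(index[tok] for tok in doc)) for doc in docs]


def tfidf(vectors, num_terms):
    """Weight counts by a smoothed IDF, log((N + 1) / (df + 1))."""
    n_docs = len(vectors)
    doc_freq = Counter(i for vec in vectors for i in vec)
    idf = {i: math.log((n_docs + 1) / (doc_freq[i] + 1)) for i in range(num_terms)}
    return [{i: count * idf[i] for i, count in vec.items()} for vec in vectors]


# Hypothetical tokenized news snippets.
docs = [
    ["spark", "lda", "topic"],
    ["news", "topic", "cluster"],
    ["spark", "news", "text"],
]

vocab = fit_vocabulary(docs)
weighted = tfidf(count_vectors(docs, vocab), len(vocab))

# The highest-weighted feature of the first document can be restored to a word.
top_index = max(weighted[0], key=weighted[0].get)
top_word = vocab[top_index]

# Two different words collapse into one bucket under the toy hash.
collision = hashed_counts(["ab", "ba"])
```

Here `top_word` is recovered through the vocabulary list, while `collision` holds a single bucket with count 2 and no way to tell which words produced it. Weighted vectors of this kind, together with the vocabulary, are what a topic model would need in order to report topics as words rather than as bare indexes.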
Funding: This work was funded by the Science Research Project of Hebei Education Department (No. BJK2023088).
Abstract: As critical conduits for the dissemination of online public opinion, social media platforms offer a timely and effective means for managing emergencies during major disasters such as earthquakes. This study analyzes online public opinion following the Maduo M7.4 earthquake in Qinghai Province and the Yangbi M6.4 earthquake in Yunnan Province. After collecting, cleaning, and organizing post-earthquake Sina Weibo (hereafter Weibo) data, we employed the Latent Dirichlet Allocation (LDA) model to extract information pertinent to public opinion on these earthquakes. The analysis compared the nature and temporal evolution of online public opinion for the two events. An emotion analysis based on an emotion dictionary categorized the emotional content of post-earthquake Weibo posts, enabling a comparative study of the characteristics and temporal trends of online public emotions following the earthquakes. The findings were visualized using Geographic Information System (GIS) techniques. The analysis revealed certain commonalities in online public opinion following both earthquakes. Notably, online engagement peaked within the first 24 hours after each earthquake and declined rapidly between 24 and 48 hours afterward. The variation in the popularity of online public opinion was linked to aftershock occurrences. Adjusted for population, online engagement in areas surrounding the earthquake sites and in Sichuan Province was significantly high. Public sentiment was initially dominated by “fear” and “surprise” and shifted towards a more positive outlook with the onset of rescue operations. However, differences in the online public response to each earthquake were also noted. Following the Yangbi earthquake, Yunnan Province reported the highest number of Weibo posts nationwide; in contrast, Qinghai Province ranked third after the Maduo earthquake, which is attributable to its smaller population and extensive damage to communication infrastructure. This research offers a methodological approach for analyzing online public opinion related to earthquakes and provides insights for improving post-disaster emergency management and public mental health support.
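Dictionary-based emotion categorization of the kind mentioned in the abstract can be sketched as follows. This is only an illustrative sketch: the lexicon entries, the English tokens, the "positive" and "neutral" labels, and the function name are hypothetical, and the study's actual emotion dictionary and category set are not given in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion categories.
# The dictionary used in the study is not specified in the abstract.
EMOTION_LEXICON = {
    "afraid": "fear",
    "scared": "fear",
    "terrifying": "fear",
    "shocked": "surprise",
    "sudden": "surprise",
    "safe": "positive",
    "grateful": "positive",
    "rescued": "positive",
}


def classify_emotion(tokens, lexicon=EMOTION_LEXICON):
    """Return the emotion category with the most lexicon hits in a post,
    or "neutral" when no token appears in the lexicon."""
    hits = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    if not hits:
        return "neutral"
    return hits.most_common(1)[0][0]


# Hypothetical tokenized posts.
early_post = ["so", "scared", "and", "afraid", "but", "safe"]
later_post = ["grateful", "everyone", "was", "rescued"]
labels = [classify_emotion(early_post), classify_emotion(later_post)]
```

Counting such labels per time window is one simple way to trace a shift in dominant emotion over time, under the assumption that each post is assigned its single most frequent category.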