Analyzing COVID-19 Discourse on Twitter: Text Clustering and Classification Models for Public Health Surveillance

Abstract: Social media has revolutionized the dissemination of real-life information, serving as a robust platform for sharing life events. Twitter, characterized by its brevity and continuous flow of posts, has emerged as a crucial source for public health surveillance, offering valuable insights into public reactions during the COVID-19 pandemic. This study applies a range of machine learning techniques to extract key themes and to perform text classification on a dataset of tweets related to the COVID-19 outbreak. Several topic modeling approaches were used to extract relevant themes, which then formed the dataset for training text classification models. In identifying significant themes within the dataset, the Gibbs Sampling Dirichlet Mixture Model (GSDMM), using trigram and bag-of-words (BOW) feature extraction, outperformed Non-negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and a hybrid strategy combining Bidirectional Encoder Representations from Transformers (BERT) with LDA and K-means, as assessed by coherence metrics. Among the text clustering models assessed, those using LDA, either as the clustering model itself or as a feature extractor combined with BERT for K-means clustering, yielded higher coherence scores that were consistent with human ratings, indicating their efficacy. LDA performed particularly well in conjunction with trigram representation and BOW. This underscores LDA's suitability for topic modeling, given its ability to capture intricate textual relationships. In text classification, Linear Support Vector Classification (LSVC), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), a Convolutional Neural Network combined with BiLSTM (CNN-BiLSTM), and BERT performed strongly, achieving accuracy and weighted F1-scores above 80%. These results substantially surpassed those of other models, such as Multinomial Naive Bayes (MNB), Linear Support Vector Machine (LSVM), and Logistic Regression (LR), which scored in the range of 60% to 70%.
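The abstract compares classifiers by accuracy and weighted F1-score. As a minimal, standard-library sketch of how these two metrics are computed, the code below implements both from scratch. The helper names `accuracy` and `weighted_f1`, the three topic labels, and the predictions are hypothetical toy values, not the study's data or code. Weighted F1 averages each class's F1 score, weighting it by that class's support (its number of true instances), so frequent topics count more than rare ones.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n_true / total  # weight by this class's support
    return score


# Hypothetical toy labels standing in for topic-derived classes.
y_true = ["vaccine", "vaccine", "lockdown", "lockdown", "symptoms", "vaccine"]
y_pred = ["vaccine", "lockdown", "lockdown", "lockdown", "symptoms", "vaccine"]
print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

On this toy data both metrics print 0.833 (5/6). With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` should give the same weighted F1.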

Authors: Pakorn Santakij, Samai Srisuay, Pongporn Punpeng

Affiliations: Department of Information Technology; Department of Computer Science

Source: Computer Systems Science & Engineering, 2024, No. 3, pp. 665-689 (25 pages)

Keywords: topic modeling; text classification; Twitter; feature extraction; social media

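Classification (CLC): TP39 [Automation and Computer Technology / Computer Application Technology]; R563.1 [Medicine and Health / Respiratory System]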
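The abstract attributes the best coherence results to trigram and bag-of-words (BOW) features and evaluates topic models by coherence. The stdlib-only sketch below illustrates both ideas under stated assumptions: whitespace tokenization, n-grams up to trigrams joined with underscores, and the UMass coherence measure (the abstract does not say which coherence metric the study used). The helper names, the four tweets, and the two candidate topics are hypothetical toy inputs, not the study's data or pipeline.

```python
import math
from collections import Counter
from itertools import combinations


def ngram_tokens(text, max_n=3):
    """Lower-cased whitespace tokens plus contiguous n-grams up to max_n."""
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            # Join multi-word n-grams with "_" so each is a single BOW feature.
            tokens.append("_".join(words[i:i + n]))
    return tokens


def bag_of_words(texts, max_n=3):
    """One Counter of n-gram counts per document (a sparse BOW matrix)."""
    return [Counter(ngram_tokens(t, max_n)) for t in texts]


def umass_coherence(top_words, bows, eps=1.0):
    """UMass coherence of a topic's top words, ordered most to least probable.

    Assumed metric for illustration. Sums log((D(w_i, w_j) + eps) / D(w_j))
    over pairs where w_j ranks above w_i; D counts documents containing
    every given term.
    """
    doc_sets = [set(bow) for bow in bows]

    def doc_freq(*terms):
        return sum(all(t in d for t in terms) for d in doc_sets)

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip unseen words instead of dividing by zero
        score += math.log((doc_freq(w_i, w_j) + eps) / d_j)
    return score


# Hypothetical toy tweets and candidate topics (not the paper's dataset).
tweets = [
    "covid vaccine rollout starts today",
    "vaccine rollout delayed again",
    "new covid cases rise after lockdown",
    "lockdown extended as covid cases rise",
]
bows = bag_of_words(tweets, max_n=3)
coherent = ["vaccine", "rollout", "vaccine_rollout"]
mixed = ["covid", "vaccine", "lockdown"]
print(f"coherent topic: {umass_coherence(coherent, bows):.3f}")
print(f"mixed topic:    {umass_coherence(mixed, bows):.3f}")
```

Here the topic built from co-occurring terms scores 1.216 and the mixed topic -1.099; higher UMass values indicate more coherent topics. If a library implementation is preferred, gensim's `CoherenceModel` offers `u_mass` and `c_v`, among other measures, though its defaults (for example, smoothing) may differ from this sketch.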