Millions of people connect and exchange information on social media platforms, where interpersonal interactions are constantly shared. However, because of inaccurate or misleading information about the COVID-19 pandemic, these platforms became the scene of tense debates between believers and doubters. Healthcare professionals and public health agencies also use social media to inform the public about COVID-19 news and updates, but they occasionally struggle to manage the mass of pandemic-related rumors and frauds. One reason is that people share and engage with content regardless of its source, assuming it is unquestionably true. On Twitter, some users use words and phrases literally to convey their views or opinions. Others choose idioms or proverbs, which are implicit and indirect, to make a stronger impression on the audience or to catch its attention. Idioms and proverbs are figurative expressions whose meaning, taken as a thematically coherent whole, cannot be understood literally. Although more than 10% of tweets contain idioms or slang, most sentiment analysis research focuses on improving the accuracy of various classification algorithms, and little attention has been paid to deciphering the hidden sentiments of idioms expressed in tweets. This paper proposes a novel data expansion strategy for categorizing tweets concerning COVID-19. The suggested method offers the following benefits: 1) no transformer fine-tuning is necessary; 2) it solves the fundamental challenge of manual data labeling by automating the construction and annotation of the sentiment lexicon; and 3) it minimizes the error rate in annotating the lexicon and drastically improves the accuracy of tweet sentiment classification.
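Because an idiom's sentiment cannot be recovered word by word, a purely literal lexicon tends to miss it. The minimal Python sketch below illustrates that point only. The two toy lexicons and the helper names (`word_score`, `idiom_aware_score`, `label`) are hypothetical. This is not the authors' data expansion or automatic lexicon-construction method, which the abstract does not describe.

```python
import re

# Hypothetical toy lexicons for illustration; the paper's automatically
# built and annotated sentiment lexicon is NOT reproduced here.
WORD_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "sick": -1.0, "tired": -1.0}
IDIOM_LEXICON = {"under the weather": -1.0, "silver lining": 1.0, "a piece of cake": 1.0}


def tokenize(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_score(text):
    """Literal baseline: sum the polarity of individual words only."""
    return sum(WORD_LEXICON.get(tok, 0.0) for tok in tokenize(text))


def idiom_aware_score(text):
    """Score matched idioms as whole units, then score leftover words literally."""
    normalized = " ".join(tokenize(text))
    score = 0.0
    # Try longer idioms first so a longer phrase wins over any shorter overlap.
    for idiom in sorted(IDIOM_LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(idiom) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            score += hits * IDIOM_LEXICON[idiom]
            # Remove the idiom so its words are not scored a second time.
            normalized = re.sub(pattern, " ", normalized)
    return score + word_score(normalized)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "Feeling under the weather after my booster shot"
    print(label(word_score(tweet)), label(idiom_aware_score(tweet)))
```

On the example tweet, the literal baseline finds no polar words and returns "neutral", while the idiom-aware variant matches "under the weather" and returns "negative". Matching idioms before single words is one simple design choice. A lexicon that is larger or built automatically would plug into the same lookup step.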
Funding: This work was supported in part by the UTAR Research Fund (IPSR/RMC/UTARRF/2020-C1/R01).