Abstract: The rapid growth of machine learning (ML) across fields has intensified the challenge of selecting the right algorithm for a specific task, known as the Algorithm Selection Problem (ASP). Traditional trial-and-error methods have become impractical due to their resource demands. Automated Machine Learning (AutoML) systems automate this process, but often neglect the group structures and sparsity in meta-features, leading to inefficiencies in algorithm recommendations for classification tasks. This paper proposes a meta-learning approach using Multivariate Sparse Group Lasso (MSGL) to address these limitations. Our method models both within-group and across-group sparsity among meta-features to manage high-dimensional data and reduce multicollinearity across eight meta-feature groups. The Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with adaptive restart efficiently solves the non-smooth optimization problem. Empirical validation on 145 classification datasets with 17 classification algorithms shows that our meta-learning method outperforms four state-of-the-art approaches, achieving 77.18% classification accuracy, 86.07% recommendation accuracy, and 88.83% normalized discounted cumulative gain.
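The optimization step named in the abstract above can be illustrated with a minimal sketch. The abstract does not state the exact objective, so everything here is an assumption for illustration only: a least-squares loss over a coefficient matrix B (meta-features by algorithms), groups defined as sets of rows of B, group weights equal to the square root of the block size, a fixed step size of 1/L, and the gradient-based restart rule. The function names (`prox_sparse_group`, `objective`, `fista_restart`) and the parameters `lam1` and `lam2` are hypothetical and do not come from the paper.

```python
import numpy as np


def prox_sparse_group(B, groups, thresh_l1, thresh_group):
    """Proximal step for an L1 penalty plus a weighted group (Frobenius) penalty."""
    # Elementwise soft-thresholding: produces sparsity within groups.
    S = np.sign(B) * np.maximum(np.abs(B) - thresh_l1, 0.0)
    # Blockwise shrinkage: can set an entire group of rows to zero.
    for rows in groups:
        block = S[rows]
        norm = np.linalg.norm(block)  # Frobenius norm of the block
        weight = thresh_group * np.sqrt(block.size)  # assumed group weight
        scale = max(0.0, 1.0 - weight / norm) if norm > 0 else 0.0
        S[rows] = scale * block
    return S


def objective(X, Y, B, groups, lam1, lam2):
    """Assumed objective: least-squares loss plus the sparse group penalty."""
    resid = X @ B - Y
    penalty = lam1 * np.abs(B).sum()
    penalty += lam2 * sum(
        np.sqrt(B[rows].size) * np.linalg.norm(B[rows]) for rows in groups
    )
    return 0.5 * np.sum(resid ** 2) + penalty


def fista_restart(X, Y, groups, lam1, lam2, n_iter=500):
    """FISTA with a gradient-based adaptive restart (illustrative sketch)."""
    # Fixed step 1/L, where L is the squared spectral norm of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    B = np.zeros((X.shape[1], Y.shape[1]))
    Z, t = B.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Z - Y)  # gradient of the smooth loss at Z
        B_new = prox_sparse_group(Z - step * grad, groups, step * lam1, step * lam2)
        if np.sum((Z - B_new) * (B_new - B)) > 0:
            # Momentum points away from descent: restart it.
            t_new, Z_new = 1.0, B_new.copy()
        else:
            # Standard FISTA momentum update.
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            Z_new = B_new + ((t - 1.0) / t_new) * (B_new - B)
        B, Z, t = B_new, Z_new, t_new
    return B
```

In this sketch, `lam1` controls within-group sparsity and `lam2` controls how readily whole meta-feature groups are dropped. With X holding one row of meta-features per dataset and Y one column of performance scores per algorithm, `fista_restart(X, Y, groups, lam1, lam2)` returns a coefficient matrix whose zero rows and zero groups indicate meta-features that the fitted model ignores.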
Funding: This work was supported in part by the UTAR Research Fund (IPSR/RMC/UTARRF/2020-C1/R01).
Abstract: Millions of people connect and exchange information on social media platforms, where interpersonal interactions are constantly being shared. However, due to inaccurate or misleading information about the COVID-19 pandemic, social media platforms became the scene of tense debates between believers and doubters. Healthcare professionals and public health agencies also use social media to inform the public about COVID-19 news and updates. However, they occasionally have trouble managing the massive volume of pandemic-related rumors and frauds. One reason is that people share and engage with content regardless of its source, assuming it is unquestionably true. On Twitter, some users use words and phrases literally to convey their views or opinions. Other users, however, choose idioms or proverbs, which are implicit and indirect, to make a stronger impression on the audience or to catch their attention. Idioms and proverbs are figurative expressions with a thematically coherent totality that cannot be understood literally. Although more than 10% of tweets contain idioms or slang, most sentiment analysis research focuses on improving the accuracy of various classification algorithms, and little attention has been paid to deciphering the hidden sentiments of idioms expressed in tweets. This paper proposes a novel data expansion strategy for categorizing tweets concerning COVID-19. The benefits of the proposed method are as follows: 1) no transformer fine-tuning is necessary; 2) the technique solves the fundamental challenge of manual data labeling by automating the construction and annotation of the sentiment lexicon; and 3) the method minimizes the error rate in annotating the lexicon and drastically improves the accuracy of tweet sentiment classification.