Cyberbullying is a form of harassment or bullying that takes place online or through digital devices such as smartphones, computers, or tablets. It can occur through various channels, such as social media, text messages, online forums, or gaming platforms. Cyberbullying involves using technology to intentionally harm, harass, or intimidate others and may take different forms, including exclusion, doxing, impersonation, harassment, and cyberstalking. Unfortunately, due to the rapid growth in the number of malicious internet users, this social phenomenon is becoming more frequent, and there is a pressing need to address it. Therefore, the main goal of the research presented in this manuscript is to tackle this emerging challenge. For this purpose, a natural language processing (NLP) dataset of sexist harassment on Twitter is used, containing tweets about the harassment of people on a sexual basis. Two algorithms are used to transform the text into meaningful numerical representations for machine learning (ML) input: term frequency-inverse document frequency (TF-IDF) and bidirectional encoder representations from transformers (BERT). The well-known eXtreme gradient boosting (XGBoost) ML model is employed to classify whether a given tweet falls into the category of sexual-based harassment. Additionally, to achieve better performance, several XGBoost models were devised by tuning their hyperparameters with metaheuristics. For this purpose, the recently emerged Coyote optimization algorithm (COA) was modified and adapted to optimize the XGBoost model. Other cutting-edge metaheuristic approaches were also implemented for this challenge, and a rigorous comparative analysis of the captured classification metrics (accuracy, Cohen's kappa score, precision, recall, and F1-score) was performed. Finally, the best-generated model was interpreted with Shapley additive explanations (SHAP), yielding useful insights into the behavioral patterns of people who engage in social harassment.
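The metaheuristic tuning step can be illustrated with a minimal sketch of the basic Coyote optimization algorithm (COA) in pure Python. This is not the modified COA proposed in the manuscript, and it simplifies the basic COA. It keeps the following parts:

- the pack structure;
- the social-condition update driven by each pack's alpha (best coyote) and cultural tendency (per-dimension median);
- greedy selection;
- a pack-exchange step.

It omits pup birth, death, and aging. The function name `coa_minimize`, the hyperparameter bounds, and `toy_objective` are hypothetical. `toy_objective` is a cheap stand-in for the XGBoost validation error the real search would minimize.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def coa_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_packs: int = 5,
    n_coyotes: int = 6,
    n_iter: int = 100,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Simplified basic COA (hypothetical helper): social-condition update,
    greedy selection, and pack exchange; birth, death, and aging omitted."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: Vector) -> Vector:
        # Keep every decision variable inside its search bounds.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Each pack is a list of (social_condition, fitness) pairs.
    packs: List[List[Tuple[Vector, float]]] = []
    for _ in range(n_packs):
        pack = []
        for _ in range(n_coyotes):
            x = [rng.uniform(lo, hi) for lo, hi in bounds]
            pack.append((x, objective(x)))
        packs.append(pack)

    # Pack-exchange probability, as commonly set for the basic COA.
    p_exchange = min(1.0, 0.005 * n_coyotes ** 2)

    for _ in range(n_iter):
        for pack in packs:
            # Alpha and cultural tendency are computed once per pack per iteration.
            alpha = min(pack, key=lambda c: c[1])[0]
            cult = [statistics.median(c[0][j] for c in pack) for j in range(dim)]
            for i in range(n_coyotes):
                soc = pack[i][0]
                cr1, cr2 = rng.sample(range(n_coyotes), 2)  # two random pack members
                r1, r2 = rng.random(), rng.random()
                new = clip([
                    soc[j]
                    + r1 * (alpha[j] - pack[cr1][0][j])
                    + r2 * (cult[j] - pack[cr2][0][j])
                    for j in range(dim)
                ])
                new_fit = objective(new)
                if new_fit < pack[i][1]:  # greedy selection: keep only improvements
                    pack[i] = (new, new_fit)
        # Pack exchange: swap two coyotes between random packs (sizes stay fixed).
        if n_packs > 1 and rng.random() < p_exchange:
            p1, p2 = rng.sample(range(n_packs), 2)
            i1, i2 = rng.randrange(n_coyotes), rng.randrange(n_coyotes)
            packs[p1][i1], packs[p2][i2] = packs[p2][i2], packs[p1][i1]

    best = min((c for pack in packs for c in pack), key=lambda c: c[1])
    return best[0], best[1]


def toy_objective(x: Sequence[float]) -> float:
    # Hypothetical stand-in for validation error over (learning_rate, subsample).
    return (x[0] - 0.1) ** 2 + (x[1] - 0.8) ** 2


if __name__ == "__main__":
    best_x, best_f = coa_minimize(toy_objective, [(0.01, 0.5), (0.5, 1.0)])
    print(best_x, best_f)
```

A real tuning run would need a different objective, and the following is an assumption rather than the authors' setup. The objective would decode the vector into XGBoost hyperparameters, rounding integer ones such as tree depth. It would then fit the model on the TF-IDF or BERT features and return a validation loss such as 1 - accuracy.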
Funding: supported by the Science Fund of the Republic of Serbia, Grant No. 7373, Characterizing Crises-Caused Air Pollution Alternations Using an Artificial Intelligence-Based Framework (crAIRsis), and Grant No. 7502, Intelligent Multi-Agent Control and Optimization applied to Green Buildings and Environmental Monitoring Drone Swarms (ECOSwarm).