Cyberbullying is the act of threatening or humiliating another person online by sending unpleasant messages or comments. Bangla text has recently become much more common on social media, where people communicate through messages and comments. Bullies therefore use social media as a rich environment for bullying others, especially on political issues. Fights involving cyberbullying on political and social media posts are common today, and they often cause considerable damage. However, little work has been done on monitoring Bangla text on social media, and none yet on detecting bullying Bangla text on political issues, owing to the lack of annotated corpora and morphological analyzers. In this work, we used several machine learning classifiers and a model to help detect Bangla bullying texts on social media. We collected 11,000 Bangla texts from the comment sections of political Facebook posts to build a new dataset and labelled each text as either bullied or not. This dataset was used to train the machine learning classifiers. The results indicate that Random Forest achieves the best accuracy, 91.08%.
The development of applications based on social network text is in full swing. Studying text features and classification is of great value for extracting important information. This paper introduces the common feature selection algorithms and feature representation methods, the basic principles, advantages, and disadvantages of SVM and KNN, and the evaluation indexes of classification algorithms. For the mutual information feature selection function, it describes the processing flow, shortcomings, and optimization improvements. To address its weakness in not balancing positively and negatively correlated features, a balance weight attribute factor and a feature difference factor are introduced. The experimental stage describes the specific process: word segmentation, stop-word removal, feature selection with various algorithms, including the optimized mutual information, and weighting with TF-IDF. Under the two classification algorithms, SVM and KNN, we compare the merits and demerits of all the feature selection algorithms according to the evaluation indexes. Experiments show that the optimized mutual information feature selection performs well, and that it performs better under SVM than under KNN, which demonstrates its validity.
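As a rough illustration of the mutual information feature selection that the abstract above refers to, the following minimal sketch scores terms by the standard pointwise mutual information, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. It is not the paper's optimized variant: the abstract does not define the balance weight attribute factor or the feature difference factor, so neither is modelled here. The function name `mutual_information` and the toy documents are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def mutual_information(docs, labels, target):
    """Score terms by pointwise mutual information with class `target`.

    docs   : list of token lists (already segmented, stop words removed)
    labels : list of class labels, one per document
    Terms that never occur in `target` are omitted (their score is -inf).
    """
    n_docs = len(docs)
    n_class = sum(1 for y in labels if y == target)

    df = Counter()        # number of documents containing the term
    df_class = Counter()  # number of `target` documents containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df[term] += 1
            if y == target:
                df_class[term] += 1

    # MI(t, c) = log( P(t, c) / (P(t) * P(c)) )
    #          = log( df_class * n_docs / (df * n_class) )
    return {
        term: math.log(a * n_docs / (df[term] * n_class))
        for term, a in df_class.items()
    }


# Hypothetical toy corpus, for illustration only.
docs = [
    ["good", "phone"],
    ["good", "service"],
    ["bad", "phone"],
    ["bad", "service", "slow"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = mutual_information(docs, labels, "pos")
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "good" occurs only in the positive class and ranks first, while "phone" and "service" are spread evenly across both classes and score zero. Terms that occur only in the other class are dropped rather than given a negative score, which is one way to see the imbalance between positive and negative correlation that the abstract mentions.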
The increasing prevalence of technology in society has an impact on young people’s language use and development. Greeklish is the writing of Greek texts using the Latin instead of the Greek alphabet, a practice known as Latinization that is also employed for many other languages with non-Latin alphabets. The primary aim of this research is to evaluate the effect of Greeklish on reading time. A sample of 732 young Greeks were asked about their habits when communicating with their friends through e-mail and social media. They then participated in an experiment in which they were asked to read and understand two short texts, one written in Greek and the other in Greeklish. The findings show that nearly one third of the participants use Greeklish. The results of the experiment reveal that understanding is not affected by the alphabet used, but that reading Greeklish is significantly more time-consuming than reading Greek, independently of the participants' sex and familiarity with Greeklish. The findings suggest that supplementing social and communication media with software utilities related to Latinization, such as language identifiers and converters, may reduce reading time and thus facilitate written communication among users.