Abstract: With the rapid growth of web-based social networking technologies in recent years, author identification and analysis have proven increasingly useful. Authorship analysis provides information about a document's author, often including the author's gender. Men and women are known to write in distinctly different ways, and these differences can be used to predict gender. Making use of these distinctions between male and female authors, this study demonstrates the use of a simple stream-based neural network to automatically discriminate gender on manually labeled tweets from the Twitter social network. This neural network, the Modified Balanced Winnow, was employed in two ways: the effectiveness of data stream mining was first examined with an extensive list of n-gram features, and feature selection techniques were then evaluated by drastically reducing the feature list using WEKA's attribute selection algorithms. This study demonstrates the effectiveness of the stream mining approach, achieving an accuracy of 82.48%, 20.81 percentage points above the baseline prediction. Using feature selection methods improved the results by an additional 16.03 percentage points, to an accuracy of 98.51%.
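The stream-based, mistake-driven learning described in this abstract can be illustrated with a minimal sketch. It implements the plain Balanced Winnow update over sparse binary features, not the Modified Balanced Winnow used in the study. The class name, the parameter values (`alpha`, `beta`, `threshold`) and the toy feature names are all assumptions made for illustration. Each example is evaluated test-then-train, as is common in stream mining.

```python
from collections import defaultdict


class BalancedWinnow:
    """Plain Balanced Winnow over sparse binary features (illustrative only)."""

    def __init__(self, alpha=1.5, beta=0.5, threshold=1.0, init=1.0):
        self.alpha = alpha          # promotion factor, must be > 1
        self.beta = beta            # demotion factor, must be in (0, 1)
        self.threshold = threshold  # decision threshold
        self.pos = defaultdict(lambda: init)  # "positive" weight per feature
        self.neg = defaultdict(lambda: init)  # "negative" weight per feature

    def score(self, features):
        # Effective weight of a feature is pos - neg; only active features count.
        return sum(self.pos[f] - self.neg[f] for f in features) - self.threshold

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def learn_one(self, features, label):
        """Predict first, then update on a mistake. Returns the prediction."""
        guess = self.predict(features)
        if guess != label:
            # Multiplicative update on active features only.
            if label == 1:
                up, down = self.alpha, self.beta
            else:
                up, down = self.beta, self.alpha
            for f in features:
                self.pos[f] *= up
                self.neg[f] *= down
        return guess


# Toy stream with hypothetical n-gram features; labels are +1 / -1.
stream = [(["tok_a", "tok_b"], 1), (["tok_c", "tok_d"], -1)] * 5
model = BalancedWinnow()
correct = 0
for feats, label in stream:
    correct += model.learn_one(feats, label) == label
accuracy = correct / len(stream)
```

Because the weights change only when a prediction is wrong, the per-example cost stays proportional to the number of active features, which is what makes this family of learners attractive for high-volume tweet streams.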
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4310373DSR55.
Abstract: Gender analysis of Twitter could reveal significant socio-cultural differences between female and male users. Efforts have been made to analyze and automatically infer gender from content in more commonly spoken languages, but limited work has been undertaken for Arabic. Most research has been done for English, with far less effort devoted to non-English languages. Studies of Arabic demographic inference, such as gender, remain relatively uncommon for social networking users, especially on Twitter. Therefore, this study aims to design an optimal marginalized stacked denoising autoencoder for gender identification on Arabic Twitter (OMSDAE-GIAT) model. The presented OMSDAE-GIAT technique mainly concentrates on the identification and classification of gender in Twitter data. To attain this, the OMSDAE-GIAT model first performs data pre-processing and word embedding. Next, the MSDAE model is used to classify gender into two classes, namely male and female. In the final stage, the OMSDAE-GIAT technique uses an enhanced bat optimization algorithm (EBOA) for parameter tuning, which is the novelty of this work. The performance of the OMSDAE-GIAT model is validated on an Arabic corpus dataset, and the results are measured under distinct metrics. The comparison study reported enhanced performance of the OMSDAE-GIAT model over other recent approaches.
Abstract: The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of a text through subtle variations in writing style that occur between gender, age and social groups. Such information has a variety of applications, including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to identify the gender of users on Twitter using Perceptron and Naive Bayes with selected 1 through 5-gram features from tweet text. Stream applications of these algorithms were employed for gender prediction to handle the speed and volume of tweet traffic. Because informal text, such as tweets, cannot be easily evaluated using traditional dictionary methods, n-gram features were used in this study to represent streaming tweets. The large number of 1 through 5-grams requires that only a subset of them be used in gender classification; for this reason, informative n-gram features were chosen using multiple selection algorithms. In the best case, the Naive Bayes and Perceptron algorithms produced accuracy, balanced accuracy, and F-measure above 99%.
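As a rough illustration of how tweets might be represented by 1 through 5-grams and fed to a stream learner, the sketch below extracts word n-grams and trains a plain online perceptron one example at a time. The abstract does not say whether the n-grams are word-level or character-level, so whitespace tokenization is an assumption, as are the function and class names and the toy tweets. No feature selection step is shown.

```python
from collections import defaultdict


def ngrams(text, n_min=1, n_max=5):
    """Return the set of word n-grams of length n_min..n_max in `text`."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    feats = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


class StreamPerceptron:
    """Mistake-driven perceptron over sparse binary n-gram features."""

    def __init__(self):
        self.w = defaultdict(float)  # one weight per observed n-gram
        self.bias = 0.0

    def predict(self, feats):
        s = self.bias + sum(self.w[f] for f in feats)
        return 1 if s > 0 else -1

    def learn_one(self, feats, label):
        """Predict first, then update on a mistake. Returns the prediction."""
        guess = self.predict(feats)
        if guess != label:
            for f in feats:
                self.w[f] += label
            self.bias += label
        return guess


# Hypothetical labeled stream: (tweet text, label in {+1, -1}).
tweets = [("alpha beta", 1), ("gamma delta", -1)] * 3
clf = StreamPerceptron()
for text, label in tweets:
    clf.learn_one(ngrams(text), label)
```

A tweet of k tokens yields at most 5k n-grams under this scheme, yet the vocabulary across a stream grows quickly, which is why the study restricts classification to a selected subset of informative n-grams.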