Abstract: Sentiment analysis is becoming increasingly important in today's digital age, with social media being a significant source of user-generated content. Developing sentiment lexicons that support languages other than English is challenging, especially for performing sentiment analysis on social media reviews. Most existing sentiment analysis systems focus on English, leaving a significant research gap for other languages, which have limited resources and tools. This research aims to address this gap by building a sentiment lexicon for local languages, which is then used with a machine learning algorithm for efficient sentiment analysis. In the first step, a lexicon covering five languages is developed: Urdu, Roman Urdu, Pashto, Roman Pashto, and English. Each word in the lexicon is associated with its sentiment scores from SentiWordNet to produce an effective sentiment score. In the second step, a naive Bayes algorithm is applied to the developed lexicon for efficient sentiment analysis of Roman Pashto. Both the sentiment lexicon and the sentiment analysis step were evaluated using information retrieval metrics, yielding an accuracy of 0.89 for the sentiment lexicon and 0.83 for the sentiment analysis. The results show the potential for improving software engineering tasks related to user feedback analysis and product development.
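The two steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon entries, the SentiWordNet-style scores, the "positive minus negative" scoring rule, the normalization of tokens to their English gloss, and the choice of multinomial naive Bayes with add-one smoothing are all assumptions. The names `LEXICON`, `SWN`, `word_score`, `lexicon_polarity`, `normalize`, `NaiveBayes`, and `precision_recall_f1`, as well as the toy Roman Pashto sentences, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: surface form in any of the five languages -> English gloss.
# These entries are illustrative only, not taken from the paper's lexicon.
LEXICON = {
    "good": "good", "acha": "good", "اچھا": "good", "kha": "good", "ښه": "good",
    "bad": "bad", "bura": "bad", "برا": "bad", "kharab": "bad",
}

# Assumed SentiWordNet-style (positive, negative) scores for each English gloss.
SWN = {"good": (0.75, 0.0), "bad": (0.0, 0.625)}


def word_score(token):
    """Effective score = positive - negative; 0.0 if the word is not in the lexicon."""
    gloss = LEXICON.get(token.lower())
    if gloss is None or gloss not in SWN:
        return 0.0
    pos, neg = SWN[gloss]
    return pos - neg


def lexicon_polarity(text):
    """Step 1: label a text by the sign of its summed lexicon scores."""
    total = sum(word_score(tok) for tok in text.split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def normalize(text):
    # Assumption: lexicon words are replaced by their English gloss so that
    # spelling variants share one feature; other tokens are only lowercased.
    return [LEXICON.get(tok.lower(), tok.lower()) for tok in text.split()]


class NaiveBayes:
    """Step 2: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = normalize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n / n_docs)  # log prior
            for tok in normalize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Information retrieval metrics for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy Roman Pashto sentences (hypothetical, for illustration only).
    train = ["da kha de", "der kha film", "da kharab de", "der kharab film"]
    labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayes().fit(train, labels)
    test, gold = ["kha de", "kharab film"], ["positive", "negative"]
    preds = [model.predict(t) for t in test]
    print(preds, accuracy(gold, preds), precision_recall_f1(gold, preds))
    print([lexicon_polarity(t) for t in test])
```

Summing "positive minus negative" per word is only one way to collapse SentiWordNet scores into a single value; averaging over word senses or weighting by frequency are equally plausible, and the abstract does not state which rule the authors used.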
Funding: Researchers Supporting Project Number (RSPD2024R576), King Saud University, Riyadh, Saudi Arabia.