This study aims to review the latest contributions in Arabic Optical Character Recognition (OCR) during the last decade, which helps interested researchers know the existing techniques and extend or adapt them accordingly. The study describes the characteristics of the Arabic language, the different types of OCR systems, the different stages of an Arabic OCR system, the researchers' contributions in each stage, and the evaluation metrics for OCR. The study reviews the existing datasets for Arabic OCR and their characteristics. Additionally, this study implemented some preprocessing and segmentation stages of Arabic OCR. The study compares the performance of the existing methods in terms of recognition accuracy. In addition to researchers' OCR methods, commercial and open-source systems are included in the comparison. The Arabic language is morphologically rich and written cursively, with dots and diacritics above and under the characters. Most of the existing approaches in the literature were evaluated on isolated characters or isolated words under a controlled environment, and few approaches were tested on page-level scripts. Some comparative studies show that the accuracy of existing commercial Arabic OCR systems is low, under 75% for printed text, and further improvement is needed. Moreover, most of the current approaches are offline OCR systems, and there is no remarkable contribution to online OCR systems.
Recognizing handwritten characters remains a critical and formidable challenge within the realm of computer vision. Although considerable strides have been made in enhancing English handwritten character recognition through various techniques, deciphering Arabic handwritten characters is particularly intricate. This complexity arises from the diverse array of writing styles among individuals, coupled with the various shapes that a single character can take when positioned differently within document images, rendering the task more perplexing. In this study, a novel segmentation method for Arabic handwritten scripts is suggested. This work aims to locate the local minima of the vertical and diagonal word-image densities to precisely identify the segmentation points between the cursive letters. The proposed method starts with pre-processing the word image without affecting its main features, then calculates the directional pixel density of the word image by scanning it vertically and at angles from 30° to 90°, counting the pixel density from all directions to address the problem of overlapping letters, which is common in Arabic texts written by many people. Local minima and thresholds are also determined to identify the ideal segmentation area. The proposed technique is tested on samples obtained from two datasets: a self-curated image dataset and the IFN/ENIT dataset. The results demonstrate that the proposed method achieves a significant improvement in cursive segmentation, with rates of 92.96% on our dataset and 89.37% on the IFN/ENIT dataset.
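To illustrate the density-profile idea behind this kind of segmentation, the sketch below computes a vertical projection profile of a binary word image and flags low-density local minima as candidate cut points. This is a minimal, single-angle simplification of the multi-angle (30° to 90°) scan described above; the toy image, the threshold, and the function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def vertical_density(img):
    """Column-wise ink-pixel counts of a binary word image (1 = ink)."""
    return img.sum(axis=0)

def segmentation_points(density, threshold=1):
    """Columns whose density is a local minimum at or below the threshold."""
    pts = []
    for x in range(1, len(density) - 1):
        if (density[x] <= threshold
                and density[x] <= density[x - 1]
                and density[x] <= density[x + 1]):
            pts.append(x)
    return pts

# toy image: two strokes separated by a near-empty column (column 2)
img = np.array([
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
])
print(segmentation_points(vertical_density(img)))  # column 2 is the cut point
```

A real system would repeat this scan along each diagonal direction and merge the resulting candidate columns before thresholding.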
With the rapid growth of internet usage, a new situation has been created that enables the practice of bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying, and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap Aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.
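The evaluation measures named above (precision, recall, F1-score) reduce to simple counts over the confusion matrix; the sketch below computes them from scratch on hypothetical binary labels (1 = cyberbullying), purely to make the metric definitions concrete.

```python
def prf1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for binary label lists (toy illustration)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# hypothetical gold labels vs. classifier predictions
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = prf1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Accuracy alone can be misleading on imbalanced bullying datasets, which is why the study reports all four measures.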
Handwritten character recognition is considered challenging compared with machine-printed character recognition due to the different human writing styles. Arabic is morphologically rich, and its characters have a high similarity. The Arabic language includes 28 characters, and each character has up to four shapes according to its location in the word (at the beginning, in the middle, at the end, and isolated). This paper proposed 12 CNN architectures for recognizing handwritten Arabic characters. The proposed architectures were derived from popular CNN architectures, such as VGG, ResNet, and Inception, to make them applicable to recognizing character-size images. The experimental results on three well-known datasets showed that the proposed architectures significantly enhanced the recognition rate compared to the baseline models. The experiments showed that data augmentation improved the models' accuracies on all tested datasets. The proposed model outperformed most of the existing approaches. The best achieved results were 93.05%, 98.30%, and 96.88% on the HIJJA, AHCD, and AIA9K datasets, respectively.
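The CNN building blocks that such architectures stack (convolution, ReLU, max pooling) can be sketched in plain NumPy for a character-size grayscale image. This is a didactic illustration of the operations only, not any of the 12 proposed architectures, and the edge-detecting kernel is a hypothetical example.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def relu(x):
    return np.maximum(x, 0)  # element-wise non-linearity

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # stand-in for a character image
kernel = np.array([[-1., 0., 1.]] * 3)          # hypothetical vertical-edge filter
fmap = max_pool(relu(conv2d(img, kernel)))      # one conv-relu-pool stage
print(fmap.shape)
```

Each conv-relu-pool stage halves the spatial resolution, which is why character-size inputs need shallower variants of VGG- or ResNet-style stacks.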
Handwritten character recognition has become one of the challenging research matters. Many studies have been presented for recognizing the letters of various languages, yet the availability of Arabic handwritten character databases remains confined. Almost a quarter of a billion people worldwide write and speak Arabic, and many historical books and files of Arab nations are written in Arabic, constituting a vital dataset. Recently, Arabic handwritten character recognition (AHCR) has grabbed attention and has become a difficult topic for pattern recognition and computer vision (CV). Therefore, this study develops a fireworks optimization with deep learning-based AHCR (FWODL-AHCR) technique. The major intention of the FWODL-AHCR technique is to recognize the distinct handwritten characters of the Arabic language. It initially pre-processes the handwritten images to improve their quality. Then, the RetinaNet-based deep convolutional neural network is applied as a feature extractor to produce feature vectors. Next, the deep echo state network (DESN) model is utilized to classify the handwritten characters. Finally, the FWO algorithm is exploited as a hyperparameter tuning strategy to boost recognition performance. A series of simulations was performed to exhibit the enhanced performance of the FWODL-AHCR technique. The comparison study portrayed the supremacy of the FWODL-AHCR technique over other approaches, with accuracies of 99.91% and 98.94% on the Hijja and AHCD datasets, respectively.
Spices are defined as any aromatic condiment of plant origin used to alter the flavor and aroma of foods. Besides flavor and aroma, many spices have antioxidant activity, mainly related to the presence in cloves of phenolic compounds, such as flavonoids, terpenoids, and eugenol. In turn, the most common uses of gum arabic are in the form of powder for addition to soft drink syrups, cuisine, and baked goods, specifically to stabilize the texture of products, increase the viscosity of liquids, and promote the leavening of baked products (e.g., cakes). Both eugenol, extracted from cloves, and gum arabic, extracted from the hardened sap of two species of the Acacia tree, are dietary constituents routinely consumed virtually throughout the world. Both of them are also widely used medicinally to inhibit oxidative stress and genotoxicity. The prevention arm of the study included groups Ia, IIa, IIIa, IVa, V, VI, VII, and VIII. Once a week for 20 weeks, the controls received saline s.c. while the experimental groups received DMH at 20 mg/kg s.c. During the same period and for an additional 9 weeks, the animals received either water, 10% GA, EUG, or 10% GA + EUG by gavage. The treatment arm of the study included groups Ib, IIb, IIIb, IVb, IX, X, XI, and XII. Once a week for 20 weeks, the controls received saline s.c. while the experimental groups received DMH at 20 mg/kg s.c. During the subsequent 9 weeks, the animals received either water, 10% GA, EUG, or 10% GA + EUG by gavage. The novelty of this study is the investigation of their use alone and together for the prevention and treatment of experimental colorectal carcinogenesis induced by dimethylhydrazine.
Our results show that the combined use of 10% gum arabic and eugenol was effective, with antioxidant action in the colon, as well as reducing oxidative stress in all colon segments and preventing and treating genotoxicity in all colon segments. Furthermore, their joint administration reduced the number of aberrant crypts and the number of aberrant crypt foci (ACF) in the distal segment and entire colon, as well as the number of ACF with at least 5 crypts in the entire colon. Thus, our results also demonstrate the synergistic effects of 10% gum arabic together with eugenol (from cloves), with antioxidant, antigenotoxic and anticarcinogenic actions (prevention and treatment) at the doses and durations studied, in the colon of rats submitted to colorectal carcinogenesis induced by dimethylhydrazine.
Gum Arabic (GA) from Acacia senegal var. kerensis has been approved as an emulsifier, stabilizer, thickener, and encapsulator in the food processing industry. Chia mucilage, on the other hand, has been approved to be used as a fat and egg yolk mimic. However, both chia mucilage and gum Arabic are underutilized locally in Kenya; thus, marginal reports have been published despite their potential to alter functional properties in food products. In this study, the potential use of chia mucilage and gum Arabic was evaluated in the development of an eggless fat-reduced mayonnaise (FRM). The mayonnaise substitute was prepared by replacing eggs and partially substituting sunflower oil with chia mucilage at 15%, 30%, 45%, and 60% levels and gum Arabic at 3%, while reducing the oil levels to 15%, 30%, 45%, and 60%. The effects of different concentrations of oil and chia mucilage on the physicochemical properties (for example, pH, emulsion stability, moisture content, protein, carbohydrate, fats, calories, ash, and titratable acidity, using AOAC methods) and on the sensory properties, covering both consumer acceptability and quantitative descriptive analysis of mayonnaise, were evaluated and compared to the control with eggs and 75% sunflower oil. The results indicated that all fat-reduced mayonnaises had significantly lower energy (down to 493 kcal/100 g) and fat content (20%) but higher moisture content (0.74) than the control (784 kcal/100 g, 77% fat, and 0.39 moisture). These differences increased with increasing substitution levels of chia mucilage, which also affected pH, carbohydrate, and protein. There was no significant difference in ash content between the fat-reduced mayonnaise and the control.
Sensory evaluation demonstrated that mayonnaises substituted with chia seed mucilage and gum Arabic were accepted. All the parameters were positively correlated with overall acceptability, with flavor having the strongest correlation of r = 0.78. Loadings from principal component analysis (PCA) of 16 sensory attributes of mayonnaise showed that approximately over 66% of the variation in sensory attributes was explained by the first six principal components. This study shows good potential for chia mucilage and gum Arabic to be used as fat and egg mimetics and stabilizers, respectively, in mayonnaise with functional properties.
Dough improvers are substances with functional characteristics used in the baking industry to enhance dough properties. Currently, the baking industry is faced with increasing demand for natural ingredients owing to increasing consumer awareness, thus contributing to the rising demand for natural hydrocolloids. Gum Arabic from Acacia senegal var. kerensis is a natural gum exhibiting excellent water-binding and emulsification capacity. However, very little is reported on how it affects the rheological properties of wheat dough. The aim of this study was, therefore, to determine the rheological properties of wheat dough with partial additions of gum Arabic as an improver. Six treatments were analyzed, comprising flour-gum blends prepared by adding gum Arabic to wheat flour at different levels (1%, 2%, and 3%), plain wheat flour (negative control), and commercial bread flour and commercial chapati flour (positive controls). The rheological properties were determined using a Brabender Farinograph, Brabender Extensograph, and Brabender Viscograph. Results showed that addition of gum Arabic significantly (p chapati. These findings support the need to utilize gum Arabic from Acacia senegal var. kerensis as a dough improver.
In recent years, the usage of social networking sites has considerably increased in the Arab world. It has empowered individuals to express their opinions, especially in politics. Furthermore, various organizations that operate in Arab countries have embraced social media in their day-to-day business activities at different scales. This is attributed to business owners' understanding of social media's importance for business development. However, Arabic morphology is complicated to process, with nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns. Hate speech over online social networking sites has turned out to be a worldwide issue that reduces the cohesion of civil societies. In this background, the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The presented CEHOML-HSD model majorly concentrates on identifying and categorising Arabic text as hate speech or normal. To attain this, the CEHOML-HSD model follows different sub-processes, as discussed herewith. At the initial stage, the CEHOML-HSD model undergoes data pre-processing with the help of the TF-IDF vectorizer. Secondly, the Support Vector Machine (SVM) model is utilized to detect and classify hate speech texts written in the Arabic language. Lastly, the CEHO approach is employed for fine-tuning the parameters involved in the SVM. This CEHO approach is developed by combining chaotic functions with the classical EHO algorithm; the design of the CEHO algorithm for parameter tuning shows the novelty of the work. A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach. The comparative study outcomes established the supremacy of the proposed CEHOML-HSD model over other approaches.
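The TF-IDF-plus-SVM core of such a pipeline (without the chaotic EHO tuning step) can be sketched with scikit-learn; the toy English sentences and labels below are placeholders for an annotated Arabic corpus, and the whole fragment is an illustrative sketch rather than the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# placeholder corpus; a real run would use labelled Arabic posts
texts = ["you are awful and stupid", "have a wonderful day friend",
         "awful hateful words", "wonderful kind friend"]
labels = ["hate", "normal", "hate", "normal"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())  # TF-IDF features -> linear SVM
clf.fit(texts, labels)
print(clf.predict(["stupid awful person"]))
```

In the paper, the SVM hyperparameters are what the CEHO search would tune; here they stay at scikit-learn defaults.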
The COVID-19 pandemic caused significant disruptions in the field of education worldwide, including in the United Arab Emirates. Teachers and students had to adapt to remote learning and virtual classrooms, leading to various challenges in maintaining educational standards. The sudden transition to remote teaching could have a negative impact on students' reading abilities, especially in the Arabic language. To gain insight into the unique challenges encountered by Arabic language teachers in the UAE, a survey was conducted to explore their assessment of teaching quality, student-teacher interaction, and learning outcomes amidst the COVID-19 pandemic. The results of the survey revealed a significant decline in students' reading abilities and identified several major issues in online Arabic language teaching. These issues included limited interaction between students and teachers, challenges in monitoring students' class participation and performance, and challenges in effectively assessing students' reading skills. The results also demonstrated some other challenges faced by Arabic language teachers, including a lack of preparedness, a lack of subscriptions to relevant platforms, and a lack of resources for online learning. Several solutions to these challenges are proposed, including reevaluating the balance between depth and breadth in the curriculum, integrating language skills into the curriculum more effectively, providing more comprehensive teacher professional development, implementing student grouping strategies, utilizing retired and expert teachers in specific content areas, allocating time for interventions, and improving support from both teachers and parents to ensure the quality of online learning.
Aspect-based sentiment analysis (ABSA) is a fine-grained process. Its fundamental subtasks are aspect term extraction (ATE) and aspect polarity classification (APC), and these subtasks are dependent and closely related. However, most existing works on Arabic ABSA address them separately, assume that aspect terms are pre-identified, or use a pipeline model. Pipeline solutions design different models for each task, and the output from the ATE model is used as the input to the APC model, which may result in error propagation among the different steps because APC is affected by ATE errors. These methods are impractical for real-world scenarios, where the ATE task is the base task for APC and its result impacts the accuracy of APC. Thus, in this study, we focused on a multi-task learning model for Arabic ATE and APC in which the model is jointly trained on the two subtasks simultaneously in a single model. This paper integrates the multi-task model, namely Local Context Focus-Aspect Term Extraction and Polarity Classification (LCF-ATEPC), and the Arabic Bidirectional Encoder Representations from Transformers (AraBERT) model as a shared layer for Arabic contextual text representation. The LCF-ATEPC model is based on multi-head self-attention and a local context focus (LCF) mechanism to capture the interactive information between an aspect and its context. Moreover, data augmentation techniques are proposed based on state-of-the-art augmentation techniques (word embedding substitution with constraints and contextual embedding (AraBERT)) to increase the diversity of the training dataset. This paper examined the effect of data augmentation on the multi-task model for Arabic ABSA. Extensive experiments were conducted on the original and combined datasets (merging the original and augmented datasets). Experimental results demonstrate that the proposed multi-task model outperformed existing APC techniques. Superior results were obtained by AraBERT and LCF-ATEPC with a fusion layer (AR-LCF-ATEPC-Fusion) and the proposed data augmentation word embedding-based method (FastText) on the combined dataset.
Aspect-Based Sentiment Analysis (ABSA) on Arabic corpora has become an active research topic in recent days. ABSA refers to a fine-grained Sentiment Analysis (SA) task that focuses on the extraction of the conferred aspects and the identification of the respective sentiment polarity from the provided text. Most of the prevailing Arabic ABSA techniques heavily depend upon laborious feature engineering and pre-processing tasks and utilize external sources such as lexicons. In the literature on Arabic text analysis, authors have made use of regular Machine Learning (ML) techniques that rely on a group of rare sources and tools for processing and analyzing Arabic language content, such as lexicons. However, an important challenge in this domain is the unavailability of sufficient and reliable resources. In this background, the current study introduces a new Battle Royale Optimization with Fuzzy Deep Learning for Arabic Aspect-Based Sentiment Classification (BROFDL-AASC) technique. The aim of the presented BROFDL-AASC model is to detect and classify the sentiments in the Arabic language. In the presented BROFDL-AASC model, data pre-processing is performed at first to convert the input data into a useful format. Besides, the BROFDL-AASC model includes a Discriminative Fuzzy-based Restricted Boltzmann Machine (DFRBM) model for the identification and categorization of sentiments. Furthermore, the BRO algorithm is exploited for optimal fine-tuning of the hyperparameters related to the DFRBM model. This scenario establishes the novelty of the current study. The performance of the proposed BROFDL-AASC model was validated, and the outcomes demonstrate the supremacy of the BROFDL-AASC model over other existing models.
Sentiment analysis (SA) of the Arabic language has become important despite scarce annotated corpora and confined sources. Arabic affect analysis has become an active research zone nowadays, but the Arabic language still lacks adequate language resources for enabling SA tasks. Thus, Arabic still faces challenges in natural language processing (NLP) tasks because of its structural complexities, history, and distinct cultures, and it has received less effort than other languages. This paper develops a Multi-Verse Optimization with Deep Reinforcement Learning Enabled Affect Analysis (MVODRL-AA) model on an Arabic corpus. The presented MVODRL-AA model mainly concentrates on identifying and classifying affects or emotions that occur in the Arabic corpus. Firstly, the MVODRL-AA model performs data pre-processing and word embedding, where an n-gram model is utilized to generate the word embeddings. A deep Q-learning network (DQLN) model is then exploited to identify and classify the affect in the Arabic corpus. At last, the MVO algorithm is used as a hyperparameter tuning approach to adjust the hyperparameters related to the DQLN model, showing the novelty of the work. A series of simulations was carried out to exhibit the promising performance of the MVODRL-AA model. The simulation outcomes illustrate the betterment of the MVODRL-AA method over the other approaches, with an accuracy of 99.27%.
Nowadays, the usage of social media platforms is rapidly increasing, and rumours or false information are also rising, especially among Arab nations. This false information is harmful to society and individuals, so blocking and detecting the spread of fake news in Arabic becomes critical. Several artificial intelligence (AI) methods, including contemporary transformer techniques such as BERT, have been used to detect fake news; thus, fake news in Arabic is identified by utilizing AI approaches. This article develops a new hunter-prey optimization with hybrid deep learning-based fake news detection (HPOHDL-FND) model on an Arabic corpus. The HPOHDL-FND technique undergoes extensive data pre-processing steps to transform the input data into a useful format. Besides, the HPOHDL-FND technique utilizes a long short-term memory recurrent neural network (LSTM-RNN) model for fake news detection and classification. Finally, the hunter-prey optimization (HPO) algorithm is exploited for optimal modification of the hyperparameters related to the LSTM-RNN model. The performance of the HPOHDL-FND technique was validated using two Arabic datasets. The outcomes exemplified better performance over the other existing techniques, with maximum accuracies of 96.57% and 93.53% on the Covid19Fakes and satirical datasets, respectively.
Text classification is an essential task for many applications related to the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic Text Classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them to English using translation tools. Consequently, the semantic features are automatically extracted from the translated documents using an existing database of English topics. Besides, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. Therefore, we obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison study of nine well-known Machine Learning models to evaluate our approach on a standard Arabic corpus. The obtained results show that the Neural Network outperforms the other models and provides good performance using the hybrid features (F1-score = 0.88).
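The stylistic side of such hybrid features (word and character frequencies, punctuation) can be sketched as a simple extractor; the particular feature set and the sample sentence below are hypothetical examples, not the paper's exact feature list.

```python
import string

def stylistic_features(text):
    """A few surface-level stylistic features of a document."""
    words = text.split()
    n_chars = len(text)
    return {
        "n_words": len(words),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "punct_count": sum(c in string.punctuation for c in text),
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
    }

feats = stylistic_features("Prices rose 3% this quarter, analysts said.")
print(feats)
```

Vectors like this are then concatenated with the topic-based semantic features to form the hybrid representation fed to the classifiers.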
Despite the extensive effort to improve intelligent educational tools for smart learning environments, automatic Arabic essay scoring remains a big research challenge. The nature of the writing style of the Arabic language makes the problem even more complicated. This study designs, implements, and evaluates an automatic Arabic essay scoring system. The proposed system starts with pre-processing the student answer and model answer dataset using data cleaning and natural language processing tasks. Then, it comprises two main components: the grading engine and the adaptive fusion engine. The grading engine employs string-based and corpus-based similarity algorithms separately. After that, the adaptive fusion engine prepares students' scores to be delivered to different feature selection algorithms, such as Recursive Feature Elimination and Boruta. Then, some machine learning algorithms, such as Decision Tree, Random Forest, AdaBoost, Lasso, Bagging, and K-Nearest Neighbor, are employed to improve the suggested system's efficiency. The experimental results in the grading engine showed that Extracting DIStributionally similar words using the CO-occurrences similarity measure achieved the best correlation values. Furthermore, in the adaptive fusion engine, the Random Forest algorithm outperforms all other machine learning algorithms using the (80%-20%) splitting method on the original dataset. It achieves 91.30%, 94.20%, 0.023, 0.106, and 0.153 in terms of Pearson's Correlation Coefficient, Willmott's Index of Agreement, Mean Square Error, Mean Absolute Error, and Root Mean Square Error metrics, respectively.
The news ticker is a common feature of many different news networks that displays headlines and other information. News ticker recognition applications are highly valuable in e-business and news surveillance for media regulatory authorities. In this paper, we focus on an automatic Arabic ticker recognition system for the Al-Ekhbariya news channel. The primary emphasis of this research is on ticker recognition methods and storage schemes. To that end, the research is aimed at character-wise explicit segmentation using a semantic segmentation technique and a word identification method. The proposed learning architecture considers the grouping of homogeneous-shaped classes. This incorporates linguistic taxonomy in a unified manner to address the imbalance in data distribution, which leads to individual biases. Furthermore, experiments were conducted with a novel Arabic News Ticker (Al-ENT) dataset that provides accurate character-level and character component-level labeling to evaluate the effectiveness of the suggested approach. The proposed method attains 96.5% accuracy, outperforming the current state-of-the-art technique by 8.5%. The study reveals that our strategy improves the performance of low-representation correlated character classes.
The recognition of Arabic characters is a crucial task in the computer vision and Natural Language Processing fields. Some major complications in recognizing handwritten texts include distortion and pattern variabilities, so the feature extraction process is a significant task in NLP models. If the features are selected automatically, adequate data might not be available for accurately forecasting the character classes, while many features usually create difficulties due to high-dimensionality issues. Against this background, the current study develops a Sailfish Optimizer with Deep Transfer Learning-Enabled Arabic Handwriting Character Recognition (SFODTL-AHCR) model. The projected SFODTL-AHCR model primarily focuses on identifying the handwritten Arabic characters in the input image. To attain this objective, the proposed SFODTL-AHCR model pre-processes the input image by following the Histogram Equalization approach. The Inception with ResNet-v2 model examines the pre-processed image to produce the feature vectors. The Deep Wavelet Neural Network (DWNN) model is utilized to recognize the handwritten Arabic characters. At last, the SFO algorithm is utilized for fine-tuning the parameters involved in the DWNN model to attain better performance. The performance of the proposed SFODTL-AHCR model was validated using a series of images, and extensive comparative analyses were conducted. The proposed method achieved a maximum accuracy of 99.73%. The outcomes inferred the supremacy of the proposed SFODTL-AHCR model over other approaches.
Abstract: Text classification, or categorization, is the procedure of automatically tagging a textual document with its most related labels or classes. When the number of labels is limited to one, the task becomes single-label text categorization. Like English texts, Arabic texts contain unstructured information; for machine learning (ML) techniques to process it, the text is transformed into and represented by numerical values. In recent times, the dominant methods for natural language processing (NLP) tasks have been recurrent neural networks (RNN), in particular long short-term memory (LSTM), and convolutional neural networks (CNN). Deep learning (DL) models are currently used to derive a massive number of deep text features for optimal performance in distinct domains such as text detection, medical image analysis, and so on. This paper introduces a Modified Dragonfly Optimization with Extreme Learning Machine for Text Representation and Recognition (MDFO-EMTRR) model for an Arabic corpus. The presented MDFO-EMTRR technique mainly concentrates on the recognition and classification of Arabic text. To achieve this, the MDFO-EMTRR technique encompasses data pre-processing to transform the input data into a compatible format. Next, the ELM model is utilized for the representation and recognition of the Arabic text. Finally, the MDFO algorithm is exploited for optimal tuning of the parameters of the ELM method, thereby accomplishing enhanced classifier results. The experimental analysis of the MDFO-EMTRR system was performed on benchmark datasets, and it attained a maximum accuracy of 99.74%.
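The Extreme Learning Machine at the core of MDFO-EMTRR is not detailed in the abstract; as a hedged illustration, the classic ELM recipe (a fixed random hidden layer plus a closed-form ridge-regression readout) can be sketched on toy data as follows. The dimensions, data, and hyperparameters here are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)

def train_elm(X, Y, hidden=64, ridge=1e-3):
    """Extreme Learning Machine: random fixed hidden layer, output
    weights solved in closed form by ridge-regularized least squares."""
    n_in = X.shape[1]
    W = rng.normal(size=(n_in, hidden))     # input weights, never trained
    b = rng.normal(size=hidden)             # hidden biases, never trained
    H = np.tanh(X @ W + b)                  # hidden activations
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class data: points around (-1,-1) vs (1,1), one-hot targets.
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
Y = np.vstack([np.tile([1.0, 0.0], (50, 1)), np.tile([0.0, 1.0], (50, 1))])
W, b, beta = train_elm(X, Y)
acc = (predict_elm(X, W, b, beta).argmax(1) == Y.argmax(1)).mean()
print(f"training accuracy: {acc:.2f}")
```

Because only the readout is solved (no backpropagation), training is a single linear solve, which is what makes ELMs attractive for metaheuristic wrappers such as MDFO that repeatedly retrain while tuning parameters.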
Abstract: Sentiment Analysis (SA), a Machine Learning (ML) technique, is often applied in the literature, specifically to data collected from social media sites. Earlier research on the SA of tweets was mostly aimed at automating the feature extraction process. In this background, the current study introduces a novel method called Quantum Particle Swarm Optimization with Deep Learning-Based Sentiment Analysis on Arabic Tweets (QPSODL-SAAT). The presented QPSODL-SAAT model determines and classifies the sentiments of tweets written in Arabic. Initially, data pre-processing is performed to convert the raw tweets into a useful format. Then, the word2vec model is applied to generate the feature vectors. The Bidirectional Gated Recurrent Unit (BiGRU) classifier is utilized to identify and classify the sentiments. Finally, the QPSO algorithm is exploited for the optimal fine-tuning of the hyperparameters of the BiGRU model. The proposed QPSODL-SAAT model was experimentally validated using standard datasets. An extensive comparative analysis was conducted, and the proposed model achieved a maximum accuracy of 98.35%. The outcomes confirmed the superiority of the proposed QPSODL-SAAT model over approaches such as Surface Features (SF), Generic Embeddings (GE), Arabic Sentiment Embeddings constructed using the Hybrid (ASEH) model, and the Bidirectional Encoder Representations from Transformers (BERT) model.
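The abstract does not specify which QPSO variant tunes the BiGRU hyperparameters; the sketch below illustrates the common quantum-behaved PSO update (an attractor drawn between personal and global bests, with a log-uniform jump scaled by distance to the mean of personal bests) on a toy sphere function. All parameter values and the objective are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def qpso(f, dim=2, particles=20, iters=200, beta=0.75):
    """Quantum-behaved PSO: each particle jumps around an attractor
    between its personal best and the global best; jump size scales
    with the distance to mbest (mean of personal bests)."""
    X = rng.uniform(-5, 5, (particles, dim))
    pbest = X.copy()
    pval = np.array([f(x) for x in X])
    for _ in range(iters):
        gbest = pbest[pval.argmin()]
        mbest = pbest.mean(axis=0)
        phi = rng.random((particles, dim))
        attractor = phi * pbest + (1 - phi) * gbest
        u = rng.random((particles, dim))
        sign = np.where(rng.random((particles, dim)) < 0.5, 1, -1)
        X = attractor + sign * beta * np.abs(mbest - X) * np.log(1 / u)
        vals = np.array([f(x) for x in X])
        improved = vals < pval
        pbest[improved], pval[improved] = X[improved], vals[improved]
    return pbest[pval.argmin()], pval.min()

best_x, best_f = qpso(lambda x: float(np.sum(x ** 2)))
print(best_x, best_f)
```

In a hyperparameter-tuning setting, `f` would instead train a BiGRU with the candidate settings and return a validation loss; the sphere function stands in only to show the optimizer converging.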
Abstract: This study aims to review the latest contributions in Arabic Optical Character Recognition (OCR) during the last decade, which helps interested researchers know the existing techniques and extend or adapt them accordingly. The study describes the characteristics of the Arabic language, different types of OCR systems, the different stages of an Arabic OCR system, researchers' contributions in each step, and the evaluation metrics for OCR. The study reviews the existing datasets for Arabic OCR and their characteristics. Additionally, this study implemented some preprocessing and segmentation stages of Arabic OCR. The study compares the performance of the existing methods in terms of recognition accuracy; in addition to researchers' OCR methods, commercial and open-source systems are included in the comparison. The Arabic language is morphologically rich and written cursively with dots and diacritics above and under the characters. Most of the existing approaches in the literature were evaluated on isolated characters or isolated words under a controlled environment, and few approaches were tested on page-level scripts. Some comparative studies show that the accuracy of the existing Arabic OCR commercial systems is low, under 75% for printed text, and further improvement is needed. Moreover, most of the current approaches are offline OCR systems, and there is no remarkable contribution to online OCR systems.
Abstract: Recognizing handwritten characters remains a critical and formidable challenge within the realm of computer vision. Although considerable strides have been made in English handwritten character recognition through various techniques, deciphering Arabic handwritten characters is particularly intricate. This complexity arises from the diverse array of writing styles among individuals, coupled with the various shapes that a single character can take when positioned differently within document images, rendering the task more perplexing. In this study, a novel segmentation method for Arabic handwritten scripts is suggested. This work aims to locate the local minima of the vertical and diagonal word-image densities to precisely identify the segmentation points between the cursive letters. The proposed method starts by pre-processing the word image without affecting its main features, then calculates the directional pixel density of the word image by scanning it vertically and at angles from 30° to 90°. Counting the pixel density from all these directions addresses the problem of overlapping letters, a common occurrence in Arabic handwriting. Local minima and thresholds are also determined to identify the ideal segmentation area. The proposed technique is tested on samples obtained from two datasets: a self-curated image dataset and the IFN/ENIT dataset. The results demonstrate that the proposed method achieves a significant improvement in the proportion of correct cursive segmentation: 92.96% on our dataset and 89.37% on the IFN/ENIT dataset.
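The full method above also scans diagonals from 30° to 90°; the sketch below illustrates only the simplest ingredient, a vertical projection profile whose local minima suggest cut points between cursive letters, applied to a synthetic binary "word". The minimum-gap rule and the toy image are assumptions for illustration, not the authors' exact algorithm:

```python
import numpy as np

def vertical_density(binary_word: np.ndarray) -> np.ndarray:
    """Count ink pixels in each column of a binarized word image
    (1 = ink, 0 = background)."""
    return binary_word.sum(axis=0)

def segmentation_points(density, min_gap=2):
    """Candidate cut columns: local minima of the column density,
    kept at least `min_gap` columns apart."""
    points = []
    for c in range(1, len(density) - 1):
        if density[c] <= density[c - 1] and density[c] < density[c + 1]:
            if not points or c - points[-1] >= min_gap:
                points.append(c)
    return points

# Toy "word": two dense letter bodies joined by a thin baseline ligature.
word = np.zeros((10, 15), dtype=int)
word[2:9, 1:6] = 1      # first letter body
word[5, 6:9] = 1        # thin connecting stroke (ligature)
word[2:9, 9:14] = 1     # second letter body
cuts = segmentation_points(vertical_density(word))
print(cuts)
```

The density drops from 7 ink pixels per column inside the letter bodies to 1 along the ligature, so the detected minimum falls in the connecting stroke, exactly where a cut between cursive letters belongs.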
Abstract: With the rapid growth of internet usage, a new situation has been created that enables bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background on cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, from which it was deduced that there is significant room for improvement for the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness on similar problems. Finally, the scheme was evaluated using well-known performance measures: accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.
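Before classifiers such as XGBoost can be applied, the tweets must be vectorized; the abstract does not name the representation used, but a common choice is TF-IDF, sketched here from scratch for clarity. The toy English documents stand in for tokenized Arabic tweets, and a real pipeline would use a library vectorizer with Arabic-aware tokenization:

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency-inverse document frequency vectors.
    tf = raw count / doc length; idf = log(N / df)."""
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(N / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["abusive tweet with insult",
        "friendly tweet with greeting",
        "insult again"]
vecs = tfidf(docs)
print(vecs[0])
```

Terms that appear in few documents ("abusive") get a higher weight than terms spread across the corpus ("tweet", "with"), which is why TF-IDF features help tree ensembles separate bullying from benign posts.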
Abstract: Handwritten character recognition is considered challenging compared with machine-printed characters due to different human writing styles. Arabic is morphologically rich, and its characters have high similarity. The Arabic language includes 28 characters, and each character has up to four shapes according to its location in the word (at the beginning, middle, end, or isolated). This paper proposes 12 CNN architectures for recognizing handwritten Arabic characters. The proposed architectures were derived from popular CNN architectures, such as VGG, ResNet, and Inception, to make them applicable to recognizing character-size images. The experimental results on three well-known datasets showed that the proposed architectures significantly enhanced the recognition rate compared to the baseline models, and that data augmentation improved the models' accuracies on all tested datasets. The proposed model outperformed most of the existing approaches, with best results of 93.05%, 98.30%, and 96.88% on the HIJJA, AHCD, and AIA9K datasets, respectively.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and the Deanship of Scientific Research at Umm Al-Qura University, Grant Code: 22UQU4340237DSR39.
Abstract: Handwritten character recognition has become a challenging research matter. Many studies have been presented for recognizing the letters of various languages, yet the availability of Arabic handwritten character databases remains confined. Almost a quarter of a billion people worldwide write and speak Arabic, and many historical books and files, a vital dataset for many Arab nations, are written in Arabic. Recently, Arabic handwritten character recognition (AHCR) has grabbed attention and become a difficult topic in pattern recognition and computer vision (CV). Therefore, this study develops a fireworks optimization with deep learning-based AHCR (FWODL-AHCR) technique. The major intention of the FWODL-AHCR technique is to recognize the distinct handwritten characters of the Arabic language. It initially pre-processes the handwritten images to improve their quality. Then, the RetinaNet-based deep convolutional neural network is applied as a feature extractor to produce feature vectors. Next, the deep echo state network (DESN) model is utilized to classify handwritten characters. Finally, the FWO algorithm is exploited as a hyperparameter tuning strategy to boost recognition performance. A series of simulations was performed to exhibit the enhanced performance of the FWODL-AHCR technique. The comparison study portrayed the supremacy of the FWODL-AHCR technique over other approaches, with 99.91% and 98.94% on the Hijja and AHCD datasets, respectively.
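The deep echo state network classifier is not specified beyond its name; as a hedged illustration of the echo-state idea (a fixed random recurrent reservoir where only the linear readout is trained), here is a minimal single-reservoir ESN on a toy one-step-delay task. The reservoir size, sparsity, spectral radius, and task are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

class EchoStateNetwork:
    """Minimal ESN: fixed random sparse reservoir; only the linear
    readout is trained, via ridge regression."""
    def __init__(self, n_in, n_res=100, spectral_radius=0.9, ridge=1e-6):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= rng.random((n_res, n_res)) < 0.1          # ~10% connectivity
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W, self.ridge = W, ridge

    def states(self, U):
        x = np.zeros(self.W.shape[0])
        out = []
        for u in U:
            x = np.tanh(self.W_in @ u + self.W @ x)
            out.append(x.copy())
        return np.array(out)

    def fit(self, U, Y):
        X = self.states(U)
        n = X.shape[1]
        self.W_out = np.linalg.solve(X.T @ X + self.ridge * np.eye(n), X.T @ Y)

    def predict(self, U):
        return self.states(U) @ self.W_out

# Teach the ESN to reproduce a slightly delayed sine (needs memory).
t = np.linspace(0, 20, 400)
u = np.sin(t).reshape(-1, 1)
y = np.sin(t - 0.05).reshape(-1, 1)
esn = EchoStateNetwork(n_in=1)
esn.fit(u, y)
err = np.abs(esn.predict(u)[50:] - y[50:]).mean()   # skip washout
print(f"mean absolute error: {err:.4f}")
```

As with the ELM, training reduces to one linear solve, so an outer metaheuristic such as fireworks optimization can cheaply evaluate many reservoir hyperparameter settings.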
Abstract: Spices are defined as any aromatic condiment of plant origin used to alter the flavor and aroma of foods. Besides flavor and aroma, many spices have antioxidant activity, mainly related to the presence in cloves of phenolic compounds such as flavonoids, terpenoids, and eugenol. In turn, the most common uses of gum arabic are in the form of powder for addition to soft-drink syrups, cuisine, and baked goods, specifically to stabilize the texture of products, increase the viscosity of liquids, and promote the leavening of baked products (e.g., cakes). Both eugenol, extracted from cloves, and gum arabic, extracted from the hardened sap of two species of the Acacia tree, are dietary constituents routinely consumed virtually throughout the world. Both are also widely used medicinally to inhibit oxidative stress and genotoxicity. The prevention arm of the study included groups Ia, IIa, IIIa, IVa, V, VI, VII, and VIII. Once a week for 20 weeks, the controls received saline s.c. while the experimental groups received DMH at 20 mg/kg s.c. During the same period and for an additional 9 weeks, the animals received either water, 10% GA, EUG, or 10% GA + EUG by gavage. The treatment arm of the study included groups Ib, IIb, IIIb, IVb, IX, X, XI, and XII. Once a week for 20 weeks, the controls received saline s.c. while the experimental groups received DMH at 20 mg/kg s.c. During the subsequent 9 weeks, the animals received either water, 10% GA, EUG, or 10% GA + EUG by gavage. The novelty of this study is the investigation of their use alone and together for the prevention and treatment of experimental colorectal carcinogenesis induced by dimethylhydrazine. Our results show that the combined use of 10% gum arabic and eugenol was effective, with antioxidant action in the colon, reducing oxidative stress in all colon segments and preventing and treating genotoxicity in all colon segments.
Furthermore, their joint administration reduced the number of aberrant crypts and the number of aberrant crypt foci (ACF) in the distal segment and entire colon, as well as the number of ACF with at least 5 crypts in the entire colon. Thus, our results also demonstrate the synergistic effects of 10% gum arabic together with eugenol (from cloves), with antioxidant, antigenotoxic and anticarcinogenic actions (prevention and treatment) at the doses and durations studied, in the colon of rats submitted to colorectal carcinogenesis induced by dimethylhydrazine.
Abstract: Gum Arabic (GA) from Acacia senegal var. kerensis has been approved as an emulsifier, stabilizer, thickener, and encapsulator in the food processing industry. Chia mucilage, on the other hand, has been approved for use as a fat and egg-yolk mimic. However, both chia mucilage and gum Arabic are underutilized locally in Kenya; thus, marginal reports have been published despite their potential to alter functional properties in food products. In this study, the potential use of chia mucilage and gum Arabic was evaluated in the development of an eggless fat-reduced mayonnaise (FRM). The mayonnaise substitute was prepared by replacing eggs and partially substituting sunflower oil with chia mucilage at 15%, 30%, 45%, and 60% levels and gum Arabic at 3%, while reducing the oil levels to 15%, 30%, 45%, and 60%. The effects of the different concentrations of oil and chia mucilage on the physicochemical properties (pH, emulsion stability, moisture content, protein, carbohydrate, fat, calories, ash, and titratable acidity, using AOAC methods) and on the sensory properties, covering both consumer acceptability and quantitative descriptive analysis, were evaluated and compared to a control with eggs and 75% sunflower oil. The results indicated that all fat-reduced mayonnaises had significantly lower energy (493 kcal/100 g) and fat (20%) but higher moisture content (0.74) than the control (784 kcal/100 g, 77% fat, and 0.39 moisture). These differences increased with increasing substitution levels of chia mucilage, which also affected pH, carbohydrate, and protein. There was no significant difference in ash content between the fat-reduced mayonnaises and the control. Sensory evaluation demonstrated that mayonnaises substituted with chia seed mucilage and gum Arabic were accepted. All parameters were positively correlated with overall acceptability, with flavor having the strongest correlation (r = 0.78).
Loadings from principal component analysis (PCA) of 16 sensory attributes of mayonnaise showed that approximately 66% of the variation in sensory attributes was explained by the first six principal components. This study shows good potential for chia mucilage and gum Arabic to be used as fat and egg mimetics and stabilizers, respectively, in mayonnaise with functional properties.
Abstract: Dough improvers are substances with functional characteristics used in the baking industry to enhance dough properties. Currently, the baking industry is faced with increasing demand for natural ingredients owing to increasing consumer awareness, thus contributing to the rising demand for natural hydrocolloids. Gum Arabic from Acacia senegal var. kerensis is a natural gum exhibiting excellent water-binding and emulsification capacity; however, very little is reported on how it affects the rheological properties of wheat dough. The aim of this study was therefore to determine the rheological properties of wheat dough with partial additions of gum Arabic as an improver. Six treatments were analyzed, comprising flour-gum blends prepared by adding gum Arabic to wheat flour at different levels (1%, 2%, and 3%), plain wheat flour (negative control), and commercial bread flour and commercial chapati flour (positive controls). The rheological properties were determined using a Brabender Farinograph, Brabender Extensograph, and Brabender Viscograph. Results showed that addition of gum Arabic significantly (p chapati. These findings support the need to utilize gum Arabic from Acacia senegal var. kerensis as a dough improver.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. This study is also supported via funding from Prince Sattam bin Abdulaziz University, Project Number (PSAU/2024/R/1445).
Abstract: In recent years, the usage of social networking sites has considerably increased in the Arab world, empowering individuals to express their opinions, especially in politics. Furthermore, various organizations operating in Arab countries have embraced social media in their day-to-day business activities at different scales, which is attributed to business owners' understanding of social media's importance for business development. However, Arabic morphology is complicated to process, with nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns. Hate speech on online social networking sites is a worldwide issue that reduces the cohesion of civil societies. In this background, the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The presented CEHOML-HSD model concentrates on identifying and categorising Arabic text as hate speech or normal. To attain this, the CEHOML-HSD model follows several sub-processes. At the initial stage, the model performs data pre-processing with the help of a TF-IDF vectorizer. Secondly, the Support Vector Machine (SVM) model is utilized to detect and classify hate-speech texts written in Arabic. Lastly, the CEHO approach, developed by combining chaotic functions with the classical EHO algorithm, is employed to fine-tune the SVM parameters. The design of the CEHO algorithm for parameter tuning shows the novelty of the work. A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach, and the comparative study outcomes established its supremacy over other approaches.
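The abstract describes building CEHO by combining chaotic functions with EHO but does not name the map; a common choice is the logistic map with r = 4, whose iterates replace uniform random draws in the update rule. The EHO-style clan update below is a simplified, one-dimensional illustration, not the paper's exact equations:

```python
def logistic_map(x0=0.7, r=4.0):
    """Chaotic number generator in (0, 1); with r = 4 the logistic map
    is fully chaotic and can substitute for uniform random draws
    inside a metaheuristic."""
    x = x0
    while True:
        x = r * x * (1 - x)
        yield x

# EHO-style clan update with a chaotic factor:
#   new_pos = pos + alpha * chaos * (matriarch - pos)
gen = logistic_map()
pos, matriarch, alpha = 2.0, 5.0, 0.8
for _ in range(5):
    chaos = next(gen)
    pos = pos + alpha * chaos * (matriarch - pos)
print(round(pos, 3))
```

Because `alpha * chaos` stays in (0, 0.8), each step moves the member toward the matriarch without overshooting, while the chaotic sequence gives a deterministic but non-repeating exploration pattern.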
Abstract: The COVID-19 pandemic caused significant disruptions in education worldwide, including in the United Arab Emirates. Teachers and students had to adapt to remote learning and virtual classrooms, leading to various challenges in maintaining educational standards. The sudden transition to remote teaching could have a negative impact on students' reading abilities, especially in the Arabic language. To gain insight into the unique challenges encountered by Arabic language teachers in the UAE, a survey was conducted to explore their assessment of teaching quality, student-teacher interaction, and learning outcomes amidst the COVID-19 pandemic. The results of the survey revealed a significant decline in students' reading abilities and identified several major issues in online Arabic language teaching. These issues included limited interaction between students and teachers, challenges in monitoring students' class participation and performance, and challenges in effectively assessing students' reading skills. The results also revealed other challenges faced by Arabic language teachers, including a lack of preparedness, a lack of subscriptions to relevant platforms, and a lack of resources for online learning. Several solutions to these challenges are proposed, including reevaluating the balance between depth and breadth in the curriculum, integrating language skills into the curriculum more effectively, providing more comprehensive teacher professional development, implementing student grouping strategies, utilizing retired and expert teachers in specific content areas, allocating time for interventions, and improving support from both teachers and parents to ensure the quality of online learning.
Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained process. Its fundamental subtasks are aspect term extraction (ATE) and aspect polarity classification (APC), and these subtasks are dependent and closely related. However, most existing works on Arabic ABSA address them separately, assume that aspect terms are pre-identified, or use a pipeline model. Pipeline solutions design different models for each task, and the output from the ATE model is used as the input to the APC model, which may result in error propagation among the steps because APC is affected by ATE errors. These methods are impractical for real-world scenarios, where the ATE task is the base task for APC and its result impacts the accuracy of APC. Thus, in this study, we focused on a multi-task learning model for Arabic ATE and APC in which the model is jointly trained on the two subtasks simultaneously. This paper integrates the multi-task model, namely Local Context Focus-Aspect Term Extraction and Polarity Classification (LCF-ATEPC), with the Arabic Bidirectional Encoder Representations from Transformers (AraBERT) model as a shared layer for Arabic contextual text representation. The LCF-ATEPC model is based on multi-head self-attention and a local context focus (LCF) mechanism to capture the interactive information between an aspect and its context. Moreover, data augmentation techniques are proposed based on state-of-the-art augmentation techniques (word embedding substitution with constraints and contextual embedding with AraBERT) to increase the diversity of the training dataset. This paper examines the effect of data augmentation on the multi-task model for Arabic ABSA. Extensive experiments were conducted on the original and combined datasets (merging the original and augmented datasets). The experimental results demonstrate that the proposed multi-task model outperformed existing APC techniques. Superior results were obtained by AraBERT and LCF-ATEPC with a fusion layer (AR-LCF-ATEPC-Fusion) and the proposed data augmentation word embedding-based method (FastText) on the combined dataset.
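The constrained word-embedding-substitution augmentation can be illustrated with a toy vocabulary: each token is replaced by its nearest embedding neighbor only when cosine similarity clears a threshold. The vectors, words, and threshold below are invented stand-ins for real FastText or AraBERT embeddings, not values from the paper:

```python
import math

# Toy word vectors standing in for real embeddings (e.g., FastText).
vecs = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.85, 0.15, 0.05],
    "bad":   [-0.8, 0.1, 0.1],
    "food":  [0.1, 0.9, 0.2],
    "meal":  [0.12, 0.88, 0.25],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def augment(tokens, threshold=0.95):
    """Replace each known token with its nearest embedding neighbor,
    but only when similarity exceeds the constraint threshold."""
    out = []
    for tok in tokens:
        if tok in vecs:
            best, best_sim = tok, -1.0
            for cand, v in vecs.items():
                if cand != tok:
                    s = cosine(vecs[tok], v)
                    if s > best_sim:
                        best, best_sim = cand, s
            out.append(best if best_sim >= threshold else tok)
        else:
            out.append(tok)
    return out

print(augment(["the", "food", "is", "good"]))
```

The threshold is the "constraint": "good" and "food" have close neighbors ("great", "meal") and get swapped, while "bad" has no neighbor above 0.95 and is left alone, so augmented sentences keep the original aspect and polarity labels valid.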
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R281), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR52.
Abstract: Aspect-Based Sentiment Analysis (ABSA) of Arabic corpora has become an active research topic in recent days. ABSA refers to a fine-grained Sentiment Analysis (SA) task that focuses on extracting the conferred aspects and identifying the respective sentiment polarity from the provided text. Most prevailing Arabic ABSA techniques depend heavily on dreary feature-engineering and pre-processing tasks and utilize external sources such as lexicons. In the literature on Arabic text analysis, authors have made use of conventional Machine Learning (ML) techniques that rely on a group of scarce sources and tools for processing and analyzing Arabic-language content, such as lexicons. However, an important challenge in this domain is the unavailability of sufficient and reliable resources. In this background, the current study introduces a new Battle Royale Optimization with Fuzzy Deep Learning for Arabic Aspect-Based Sentiment Classification (BROFDL-AASC) technique. The aim of the presented BROFDL-AASC model is to detect and classify sentiments in the Arabic language. In the presented BROFDL-AASC model, data pre-processing is performed first to convert the input data into a useful format. Besides, the BROFDL-AASC model includes a Discriminative Fuzzy-based Restricted Boltzmann Machine (DFRBM) model for the identification and categorization of sentiments. Furthermore, the BRO algorithm is exploited for optimal fine-tuning of the hyperparameters related to the DFRBM model. This scenario establishes the novelty of the current study. The performance of the proposed BROFDL-AASC model was validated, and the outcomes demonstrate the supremacy of the BROFDL-AASC model over other existing models.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR38.
Abstract: Sentiment analysis (SA) of the Arabic language is important despite its scarce annotated corpora and confined sources. Arabic affect analysis has become an active research zone nowadays, but the Arabic language still lags behind in adequate language resources for enabling SA tasks. Thus, Arabic still faces challenges in natural language processing (NLP) tasks because of its structural complexities, history, and distinct cultures, and it has received less effort than other languages. This paper develops a Multi-verse Optimization with Deep Reinforcement Learning Enabled Affect Analysis (MVODRL-AA) model for an Arabic corpus. The presented MVODRL-AA model concentrates on identifying and classifying the affect or emotions expressed in the Arabic corpus. Firstly, the MVODRL-AA model performs data pre-processing and word embedding; an n-gram model is utilized to generate the word embeddings. A deep Q-learning network (DQLN) model is then exploited to identify and classify the affect in the Arabic corpus. At last, the MVO algorithm is used as a hyperparameter tuning approach to adjust the hyperparameters related to the DQLN model, showing the novelty of the work. A series of simulations was carried out to exhibit the promising performance of the MVODRL-AA model. The simulation outcomes illustrate the betterment of the MVODRL-AA method over the other approaches, with an accuracy of 99.27%.
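The deep Q-learning network (DQLN) builds on the tabular Q-learning update; as background, the sketch below shows that update (bootstrapping on the greedy value of the next state) on a tiny chain environment. The environment and hyperparameters are illustrative only and are unrelated to the paper's text-classification setup, where a neural network replaces the table:

```python
import random

random.seed(0)

# Tiny chain environment: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap on the greedy value of the next state.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(N_STATES)]
print(policy)
```

After training, the greedy policy moves right in every non-terminal state; a DQLN keeps this same update but approximates Q with a deep network so it scales to text-sized state spaces.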
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Small Groups Project under Grant Number (120/43), and to Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R281), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4331004DSR32).
Abstract: Nowadays, the usage of social media platforms is rapidly increasing, and rumours or false information are also rising, especially among Arab nations. This false information is harmful to society and individuals, so blocking and detecting the spread of fake news in Arabic has become critical. Several artificial intelligence (AI) methods, including contemporary transformer techniques such as BERT, have been used to detect fake news; thus, fake news in Arabic is identified by utilizing AI approaches. This article develops a new hunter-prey optimization with hybrid deep learning-based fake news detection (HPOHDL-FND) model for the Arabic corpus. The HPOHDL-FND technique undergoes extensive data pre-processing steps to transform the input data into a useful format. Besides, the HPOHDL-FND technique utilizes a long short-term memory recurrent neural network (LSTM-RNN) model for fake news detection and classification. Finally, the hunter-prey optimization (HPO) algorithm is exploited for optimal modification of the hyperparameters related to the LSTM-RNN model. The performance validation of the HPOHDL-FND technique was tested using two Arabic datasets. The outcomes exemplified better performance over the other existing techniques, with maximum accuracy of 96.57% and 93.53% on the Covid19Fakes and satirical datasets, respectively.
Abstract: Text classification is an essential task for many applications related to the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic text classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them to English using translation tools. Consequently, the semantic features are automatically extracted from the translated documents using an existing database of English topics. Besides, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. We therefore obtain three types of features: semantic, stylistic, and hybrid. Using each type of feature in turn, we performed an in-depth comparative study of nine well-known machine learning models on a standard Arabic corpus to evaluate our approach. The obtained results show that the Neural Network outperforms the other models and provides good performance using hybrid features (F1-score = 0.88).
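The stylistic features named above (word and character frequencies, punctuation) are straightforward to compute; one plausible minimal extractor, with feature names of our own invention rather than the paper's, is:

```python
import string
from collections import Counter

def stylistic_features(text):
    """A few surface features of the kind a hybrid approach combines
    with semantic ones: word/character counts, average word length,
    and punctuation frequency."""
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    punct = sum(1 for c in text if c in string.punctuation)
    return {
        "n_words": len(words),
        "n_chars": len(chars),
        "avg_word_len": sum(len(w.strip(string.punctuation)) for w in words) / max(len(words), 1),
        "punct_ratio": punct / max(len(chars), 1),
    }

feats = stylistic_features("Breaking news: markets rally, and analysts cheer!")
print(feats)
```

Such features are language-independent, which is what lets the approach compute them on the original text while the semantic features come from the translated documents.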
Abstract: Despite the extensive effort to improve intelligent educational tools for smart learning environments, automatic Arabic essay scoring remains a big research challenge, and the nature of the Arabic writing style makes the problem even more complicated. This study designs, implements, and evaluates an automatic Arabic essay scoring system. The proposed system starts with pre-processing the student-answer and model-answer dataset using data cleaning and natural language processing tasks. Then, it comprises two main components: the grading engine and the adaptive fusion engine. The grading engine employs string-based and corpus-based similarity algorithms separately. After that, the adaptive fusion engine prepares students' scores to be delivered to different feature selection algorithms, such as Recursive Feature Elimination and Boruta. Then, machine learning algorithms such as Decision Tree, Random Forest, AdaBoost, Lasso, Bagging, and K-Nearest Neighbor are employed to improve the suggested system's efficiency. The experimental results in the grading engine showed that Extracting DIStributionally similar words using CO-occurrences (DISCO) achieved the best correlation values. Furthermore, in the adaptive fusion engine, the Random Forest algorithm outperforms all other machine learning algorithms using the (80%-20%) splitting method on the original dataset, achieving 91.30%, 94.20%, 0.023, 0.106, and 0.153 in terms of Pearson's Correlation Coefficient, Willmott's Index of Agreement, Mean Square Error, Mean Absolute Error, and Root Mean Square Error, respectively.
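A string-based grading engine of the kind described can be sketched as a bag-of-words cosine similarity between student and model answers, scored against human grades with Pearson's correlation. The example answers and human grades below are invented for illustration, and the real system uses richer similarity measures:

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """String-based similarity between a student answer and the model
    answer: cosine over bag-of-words token counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def pearson(xs, ys):
    """Pearson's correlation, the headline metric used to compare
    automatic scores against human grades."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

model = "photosynthesis converts light energy into chemical energy"
answers = ["photosynthesis converts light into chemical energy",
           "plants make food from light",
           "cells divide by mitosis"]
auto_scores = [cosine_sim(a, model) for a in answers]
human_scores = [0.9, 0.5, 0.0]
print([round(s, 2) for s in auto_scores], round(pearson(auto_scores, human_scores), 2))
```

Even this crude similarity ranks the three answers in the same order as the human grades, which is what the Pearson metric rewards; the adaptive fusion engine then learns to combine several such scores.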
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number (168/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4340237DSR32). The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.
Abstract: The recognition of Arabic characters is a crucial task in the computer vision and Natural Language Processing fields. Major complications in recognizing handwritten texts include distortion and pattern variabilities, so the feature extraction process is a significant task in NLP models. If the features are selected automatically, adequate data for accurately forecasting the character classes may be unavailable; yet many features usually create difficulties due to high-dimensionality issues. Against this background, the current study develops a Sailfish Optimizer with Deep Transfer Learning-Enabled Arabic Handwriting Character Recognition (SFODTL-AHCR) model. The projected SFODTL-AHCR model primarily focuses on identifying the handwritten Arabic characters in the input image. To attain this objective, the proposed SFODTL-AHCR model pre-processes the input image using the Histogram Equalization approach. The Inception-ResNet-v2 model examines the pre-processed image to produce the feature vectors. The Deep Wavelet Neural Network (DWNN) model is utilized to recognize the handwritten Arabic characters. Finally, the SFO algorithm is utilized to fine-tune the parameters of the DWNN model for better performance. The performance of the proposed SFODTL-AHCR model was validated using a series of images, and extensive comparative analyses were conducted. The proposed method achieved a maximum accuracy of 99.73%. The outcomes inferred the supremacy of the proposed SFODTL-AHCR model over other approaches.
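The Histogram Equalization pre-processing step is standard and can be shown as a minimal, dependency-free sketch for an 8-bit grayscale image represented as a list of rows; real pipelines would use an image library, and this is an illustrative implementation rather than the paper's code.

```python
def histogram_equalize(img):
    """Histogram equalization for an 8-bit grayscale image (list of row lists).
    Spreads the intensity distribution to improve contrast before feature extraction."""
    flat = [p for row in img for p in row]
    n = len(flat)
    # Build the intensity histogram.
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    # Map each gray level through the normalized CDF.
    lut = [round((c - cdf_min) / max(n - cdf_min, 1) * 255) for c in cdf]
    return [[lut[p] for p in row] for row in img]
```

On a low-contrast image the two occupied gray levels are stretched to the full 0–255 range, which is exactly the contrast boost that helps the downstream feature extractor.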
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR35.
Abstract: Text classification or categorization is the procedure of automatically tagging a textual document with its most related labels or classes. When the number of labels is limited to one, the task becomes single-label text categorization. Like English texts, Arabic texts contain unstructured information; for it to be understandable to machine learning (ML) techniques, the text is transformed into and represented by numerical values. In recent times, the dominant methods for natural language processing (NLP) tasks are the recurrent neural network (RNN), in particular long short-term memory (LSTM), and the convolutional neural network (CNN). Deep learning (DL) models are currently presented for deriving a massive amount of deep text features for optimum performance in distinct domains such as text detection, medical image analysis, and so on. This paper introduces a Modified Dragonfly Optimization with Extreme Learning Machine for Text Representation and Recognition (MDFO-EMTRR) model on an Arabic corpus. The presented MDFO-EMTRR technique mainly concentrates on the recognition and classification of Arabic text. To achieve this, the MDFO-EMTRR technique encompasses data pre-processing to transform the input data into a compatible format. Next, the ELM model is utilized for the representation and recognition of the Arabic text. Finally, the MDFO algorithm is exploited for optimal tuning of the parameters related to the ELM method, thereby accomplishing enhanced classifier results. The experimental result analysis of the MDFO-EMTRR system was performed on benchmark datasets and attained a maximum accuracy of 99.74%.
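The Extreme Learning Machine at the core of this model trains in one shot: the hidden layer gets fixed random weights, and only the output weights are solved analytically by least squares. A minimal NumPy sketch, assuming tanh activations and a pseudo-inverse solve (the hidden-unit count and seed are arbitrary choices for illustration, not the paper's configuration):

```python
import numpy as np

def train_elm(X, y, hidden=50, seed=0):
    """ELM training: random fixed hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))  # random input weights (never trained)
    b = rng.standard_normal(hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                         # hidden-layer activation matrix
    beta = np.linalg.pinv(H) @ y                   # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Forward pass: project through the fixed hidden layer, apply learned weights."""
    return np.tanh(X @ W + b) @ beta
```

The MDFO step in the paper would then search over ELM parameters (e.g., the hidden-layer size) instead of leaving them fixed as here.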
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Small Groups Project under Grant Number (120/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4310373DSR36).
Abstract: Sentiment Analysis (SA), a Machine Learning (ML) technique, is often applied in the literature, specifically to data collected from social media sites. The research studies conducted earlier on the SA of tweets were mostly aimed at automating the feature extraction process. In this background, the current study introduces a novel method called Quantum Particle Swarm Optimization with Deep Learning-Based Sentiment Analysis on Arabic Tweets (QPSODL-SAAT). The presented QPSODL-SAAT model determines and classifies the sentiments of tweets written in Arabic. Initially, data pre-processing is performed to convert the raw tweets into a useful format. Then, the word2vec model is applied to generate the feature vectors. The Bidirectional Gated Recurrent Unit (BiGRU) classifier is utilized to identify and classify the sentiments. Finally, the QPSO algorithm is exploited for the optimal fine-tuning of the hyperparameters involved in the BiGRU model. The proposed QPSODL-SAAT model was experimentally validated using standard datasets. An extensive comparative analysis was conducted, and the proposed model achieved a maximum accuracy of 98.35%. The outcomes confirmed the supremacy of the proposed QPSODL-SAAT model over approaches such as Surface Features (SF), Generic Embeddings (GE), Arabic Sentiment Embeddings constructed using the Hybrid (ASEH) model, and the Bidirectional Encoder Representations from Transformers (BERT) model.
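The QPSO search itself is a concrete, velocity-free variant of PSO: each particle jumps to a point sampled around a local attractor between its personal best and the global best. A minimal sketch minimizing a toy sphere function stands in for the hyperparameter search; the contraction schedule, swarm size, and bounds are illustrative assumptions, not the paper's settings.

```python
import math
import random

def qpso(fitness, dim=2, particles=20, iters=200, bounds=(-5.0, 5.0), seed=1):
    """Quantum-behaved PSO: positions are resampled around local attractors
    via the delta-potential-well update (no velocity term)."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    pbest = [x[:] for x in X]
    pbest_f = [fitness(x) for x in X]
    g = min(range(particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(iters):
        beta = 1.0 - 0.5 * t / iters  # contraction coefficient, 1.0 -> 0.5
        mbest = [sum(p[d] for p in pbest) / particles for d in range(dim)]
        for i in range(particles):
            for d in range(dim):
                phi = rng.random()
                p = phi * pbest[i][d] + (1 - phi) * gbest[d]  # local attractor
                u = 1.0 - rng.random()                        # u in (0, 1]
                step = beta * abs(mbest[d] - X[i][d]) * math.log(1.0 / u)
                X[i][d] = p + step if rng.random() < 0.5 else p - step
            f = fitness(X[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = X[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = X[i][:], f
    return gbest, gbest_f
```

In the full system, the fitness would be the BiGRU's validation loss and each dimension a hyperparameter (learning rate, hidden size, etc.) rather than a coordinate of the sphere function.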