Abstract: Text classification is an essential task for many applications in the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic text classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them into English using translation tools. The semantic features are then automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. We therefore obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison study of nine well-known machine learning models on a standard Arabic corpus to evaluate our approach. The results show that the Neural Network outperforms the other models and performs well with hybrid features (F1-score = 0.88).
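The stylistic features mentioned in this abstract (word and character frequencies, punctuation) can be illustrated with a minimal Python sketch. This is not the authors' feature set: the function name `stylistic_features`, the chosen statistics, and the punctuation list are assumptions made only for illustration.

```python
import string

# Assumption: ASCII punctuation plus three common Arabic marks (comma, semicolon, question mark).
PUNCTUATION = set(string.punctuation) | {"\u060C", "\u061B", "\u061F"}


def stylistic_features(text: str) -> dict:
    """Return simple word-, character-, and punctuation-level statistics."""
    words = text.split()                             # whitespace tokenization
    chars = [c for c in text if not c.isspace()]     # non-space characters
    punct = [c for c in chars if c in PUNCTUATION]   # punctuation characters
    return {
        "n_words": len(words),
        "n_chars": len(chars),
        "n_punct": len(punct),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "punct_ratio": len(punct) / len(chars) if chars else 0.0,
    }
```

A feature dictionary of this kind could be concatenated with semantic features to form the hybrid representation the abstract describes; how the authors actually combine them is not stated here.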
Abstract: In recent years, Deep Learning models have become indispensable in several fields such as computer vision, automatic object recognition, and automatic natural language processing. Implementing a robust and efficient handwritten text recognition system remains a challenge for the research community, especially for the Arabic language, which has a dearth of published work compared to other languages. In this work, we present a new and efficient system for offline Arabic handwritten text recognition. Our approach combines a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BLSTM) network, followed by a Connectionist Temporal Classification (CTC) layer. Moreover, during the training phase of the model, we introduce a data augmentation algorithm to increase the quality of the data. The proposed approach can recognize Arabic handwritten text without segmenting the characters, thus avoiding several problems related to segmentation. To train and evaluate our approach, we used two Arabic handwritten text recognition databases, IFN/ENIT and KHATT. The experimental results show that our approach gives better results than other methods in the literature.
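A CTC layer lets the network emit one label per time frame without character segmentation; the frame labels are then collapsed into a text string. The sketch below shows greedy (best-path) CTC decoding in plain Python. The abstract does not say which decoding strategy the authors use, so greedy decoding, the blank index, and the toy alphabet are assumptions.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank


def ctc_greedy_decode(frame_scores, alphabet):
    """Collapse per-frame argmax labels into a character string.

    frame_scores: list of per-frame score lists (one score per label).
    alphabet: dict mapping each non-blank label index to a character.
    """
    # 1. Pick the highest-scoring label in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Merge consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal labels keeps them distinct.
    return "".join(alphabet[label] for label in collapsed if label != BLANK)
```

For example, the frame-level path `a a _ a b b _` (with `_` as blank) decodes to `aab`: the blank separates the two runs of `a`, so they are not merged.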
Funding: The author extends his appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (R.G.P.2/55/40/2019), received by Fahd N. Al-Wesabi. www.kku.edu.sa.
Abstract: Digital text is the most common medium transferred via the internet for various purposes, and it is highly vulnerable to illegal tampering during online transfer. Therefore, improving the security and authenticity of text transferred via the internet has become one of the most difficult challenges that researchers face today. Arabic text is more sensitive than text in other languages because of its diacritics (Harakat), such as Kasra and Damma: even basic changes, such as modifying the arrangement of diacritics, can change the meaning of the text. In this paper, an intelligent hybrid solution is proposed that offers highly sensitive detection of any tampering with Arabic text exchanged via the internet. Natural language processing, entropy, and watermarking techniques are integrated into this method to improve the security and reliability of Arabic text without limitations on the nature or size of the text, or on the type or volume of tampering attacks. The proposed scheme is implemented, simulated, and validated using four standard Arabic datasets of varying lengths under insertion, reorder, and deletion attacks at multiple random locations. The experimental and simulation results demonstrate the tampering detection accuracy of the proposed scheme against all of the tested tampering attacks. Comparison results show that the proposed approach outperforms all of the other baseline approaches in terms of tampering detection accuracy.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR40.
Abstract: With a population of 440 million, Arabic language users form a rapidly growing language group on the web in terms of the number of Internet users. Eleven million monthly Twitter users were active and posted nearly 27.4 million tweets every day. Developing a classification system for the Arabic language requires understanding the syntactic framework of the words, so that words can be manipulated and represented in a way that makes their classification effective. In this view, this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification (DSOCDBN-STC) model on an Arabic corpus. The DSOCDBN-STC model mainly aims to classify Arabic short text in social media. It applies preprocessing and word2vec word embedding at the preliminary stage. It then uses a CDBN-based classification model for Arabic short text. Finally, the DSO technique is used to tune the hyperparameters of the CDBN method. To establish the enhanced performance of the DSOCDBN-STC model, a wide range of simulations was performed. The simulation results confirmed the superiority of the DSOCDBN-STC model over existing models, with an improved accuracy of 99.26%.
Abstract: The news ticker is a common feature of many news networks that displays headlines and other information. News ticker recognition applications are highly valuable in e-business and in news surveillance for media regulatory authorities. In this paper, we focus on an automatic Arabic ticker recognition system for the Al-Ekhbariya news channel. The primary emphasis of this research is on ticker recognition methods and storage schemes. To that end, the research targets character-wise explicit segmentation using a semantic segmentation technique and a word identification method. The proposed learning architecture considers the grouping of homogeneous-shaped classes. This incorporates linguistic taxonomy in a unified manner to address the imbalance in data distribution, which leads to individual biases. Furthermore, experiments are conducted with a novel Arabic News Ticker (Al-ENT) dataset, which provides accurate character-level and character-component-level labeling, to evaluate the effectiveness of the suggested approach. The proposed method attains 96.5%, outperforming the current state-of-the-art technique by 8.5%. The study reveals that our strategy improves the performance of low-representation correlated character classes.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4310373DSR58. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Groups Funding program grant code (NU/RG/SERC/11/7).
Abstract: Online reviews of purchased services or products are the main source of users' opinions. To gain fame or profit, spam reviews are often written to promote or demote certain targeted products or services. This practice is called review spamming. During the last few years, various techniques have been recommended to solve the problem of spam reviews. Previous spam detection studies focus on English reviews, with less interest in other languages. Spam review detection in Arabic online sources remains an emerging topic despite the vast amount of data produced. Thus, this study develops an Automated Spam Review Detection using optimal Stacked Gated Recurrent Unit (SRD-OSGRU) model on Arabic opinion text. The SRD-OSGRU model mainly intends to classify Arabic reviews into two classes: spam and truthful. Initially, the model applies different levels of data preprocessing to convert the raw review data into a compatible format. Next, unigram and bigram feature extractors are utilized. The SGRU model is employed to identify and classify Arabic spam reviews. Since trial-and-error adjustment of hyperparameters is a tedious process, a white shark optimizer (WSO) is utilized, boosting the detection efficiency of the SGRU model. The SRD-OSGRU model is experimentally validated on two datasets, including the DOSC dataset. An extensive comparison study pointed out the enhanced performance of the SRD-OSGRU model over other recent approaches.
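The unigram and bigram feature extraction step can be sketched in a few lines of Python. Whitespace tokenization and the function names `ngram_counts` and `unigram_bigram_features` are assumptions; the abstract does not describe the authors' tokenizer or feature weighting.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unigram_bigram_features(review: str) -> Counter:
    """Combine unigram and bigram counts for one whitespace-tokenized review."""
    tokens = review.split()
    return ngram_counts(tokens, 1) + ngram_counts(tokens, 2)
```

Keeping n-grams as tuples avoids ambiguity between a bigram and a single token that happens to contain a space-like separator; the resulting counts could then be vectorized before being passed to a sequence model.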
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4310373DSR56.
Abstract: Opinion Mining (OM) studies in Arabic are limited, even though Arabic is one of the most widely spoken languages worldwide. Although interest in Arabic OM is growing among researchers, the field needs many more investigations because of the unique morphological principles of the language. Arabic OM studies face multiple challenges owing to the scarcity of language resources and to Arabic-specific linguistic features. Comparative OM studies in the English language are wide and novel, but comparative OM studies in the Arabic language are yet to be established and are still at a nascent stage. The unique features of the Arabic language, such as diacritics, elongation, inflection, and word length, make it essential to expand the studies on Arabic text. The current study proposes a Political Optimizer with Probabilistic Neural Network-based Comparative Opinion Mining (POPNN-COM) model for Arabic text. The proposed POPNN-COM model aims to recognize comparative and non-comparative texts in Arabic in the context of social media. Initially, the POPNN-COM model applies different levels of data pre-processing to transform the input data into a useful format. Then, the pre-processed data is fed into the PNN model, which classifies the data under different class labels. Finally, the PO algorithm is employed to fine-tune the parameters of the model to achieve enhanced results. The proposed POPNN-COM model was experimentally validated using two standard datasets, and the outcomes established the promising performance of the POPNN-COM method over other recent approaches.
Funding: This work was conducted at the Razak Faculty of Technology and Informatics, under the Cyber Physical Systems research group, and funded by MOHE (FRGS: R.K130000.7856.5F026), received by Nilam Nur Amir Sjarif.
Abstract: The most sensitive Arabic text available online is the digital Holy Quran. This sacred Islamic religious book is recited by all Muslims worldwide, including non-Arabs, as part of their worship. Thus, it should be protected from any kind of tampering to keep its invaluable meaning intact. Different characteristics of Arabic letters, such as the vowels, Kashida (extended letters), and other symbols in the Holy Quran, must be secured from alterations. The cover text of the Quran and its watermarked text are different due to the low values of the Peak Signal to Noise Ratio (PSNR) and Embedding Ratio (ER). A watermarking technique with enhanced attributes must, therefore, be designed for the Quran's text using Arabic vowels with kashida. The gap addressed by this paper is to improve the security of Arabic text in the Holy Quran by using vowels with kashida. The purpose of this paper is to enhance the Quran text watermarking scheme based on a reversing technique. The methodology consists of four phases. The first phase is pre-processing. The second phase is the embedding process, which hides the data after the vowels: if the secret bit is "1", a kashida is inserted; if the bit is "0", no kashida is inserted. The third phase is the extraction process, and the last phase evaluates the performance of the proposed scheme using PSNR (for imperceptibility) and ER (for capacity). The experimental results show that the imperceptibility of the proposed insertion method is also optimized with the help of a reversing algorithm. The proposed strategy obtains a 90.5% capacity. Furthermore, the proposed algorithm attained 66.1%, which is referred to as imperceptibility.
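The embedding and extraction rule stated in this abstract (insert a kashida after a vowel for bit "1", insert nothing for bit "0") can be sketched as follows. This is a simplified illustration, not the authors' scheme: it omits the pre-processing and the reversing technique, it assumes that "vowels" means the short-vowel diacritics fatha, damma, and kasra, and it assumes the cover text contains no kashida directly after a vowel. The names `embed` and `extract` are hypothetical.

```python
KASHIDA = "\u0640"  # Arabic tatweel (kashida)
# Assumption: "vowels" are the diacritics fatha, damma, and kasra.
VOWELS = {"\u064E", "\u064F", "\u0650"}


def embed(cover: str, bits: str) -> str:
    """Hide one bit per vowel: add a kashida after it for '1', nothing for '0'."""
    out, i = [], 0
    for ch in cover:
        out.append(ch)
        if ch in VOWELS and i < len(bits):
            if bits[i] == "1":
                out.append(KASHIDA)
            i += 1
    if i < len(bits):
        raise ValueError("cover text has too few vowels for the payload")
    return "".join(out)


def extract(marked: str, n_bits: int) -> str:
    """Read one bit per vowel: '1' if a kashida follows it, otherwise '0'."""
    bits = []
    for pos, ch in enumerate(marked):
        if len(bits) == n_bits:
            break
        if ch in VOWELS:
            follows = pos + 1 < len(marked) and marked[pos + 1] == KASHIDA
            bits.append("1" if follows else "0")
    return "".join(bits)
```

Under the stated assumption about the cover text, stripping every kashida from the marked text restores the cover, and the number of usable vowels bounds the payload size, which is what a capacity measure such as ER would reflect.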
Funding: The Chair of Prince Faisal for Artificial Intelligent Research (CPFIA), Qassim University, through Project Number QU-CPFAI-2-10-5.
Abstract: The attention-based encoder-decoder technique, known as the transformer, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, as the research community pays little attention to it. The Muslim holy book, the Qur'an, is written in diacritized Arabic text. In this paper, an end-to-end transformer model for building a robust Qur'an verse recognition system is proposed. The acoustic model was built as a transformer-based deep learning model using the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic model. A Mel filter bank is used for feature extraction. To build a language model (LM), the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based LM. As part of this research, a new dataset of Qur'an verses and their associated transcripts was collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results showed that the proposed end-to-end transformer-based model achieved a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur'an reciters.
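CER and WER are conventionally computed as the edit distance between the reference and the recognized output, divided by the reference length (in characters for CER, in words for WER). The sketch below follows that standard definition in plain Python; whether the authors normalize diacritics or whitespace before scoring is not stated, so no normalization is applied, and a non-empty reference is assumed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.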