Coronavirus disease, caused by the SARS-CoV-2 virus, has spread worldwide since early 2020 and has been declared a pandemic by the World Health Organization (WHO). Coronavirus disease is also termed COVID-19. It affects the human respiratory system and can therefore be traced and tracked in chest X-ray images, so chest X-rays alone may play a vital role in identifying COVID-19 cases. In this paper, we propose a Machine Learning (ML) approach that uses X-ray images to classify healthy and affected patients based on the patterns found in these images. The article also explores traditional and Deep Learning (DL) approaches for COVID-19 patterns in chest X-ray images to predict, analyze, and further understand this virus. In the experimental evaluation, the proposed approach achieves 97.5% detection performance using the DL model for COVID-19 versus normal cases, and 94.5% accurate detections for the COVID-19 versus Pneumonia Virus scenario. Our extensive experimental evaluation guides the selection of an appropriate model for similar tasks. The approach can thus be used for medical purposes and is particularly pertinent to detecting COVID-19-positive patients using X-ray images alone.
There are over 200 different varieties of date fruit in the world, and every type has specific features that distinguish it from the others. In recent years, sorting, separating, and arranging in automated industries, in fruit businesses, and more specifically in date businesses have inspired many research directions. In this regard, this paper focuses on the detection and recognition of dates using computer vision and machine learning. Our experimental setup is based on the classical machine learning approach and the deep learning approach for nine classes of date fruit. Classical machine learning includes the Bayesian network, Support Vector Machine, Random Forest, and Multi-Layer Perceptron (MLP), while the Convolutional Neural Network is used for the deep learning set. The feature set includes Color Layout features, the Fuzzy Color and Texture Histogram, Gabor filtering, and the Pyramid Histogram of Oriented Gradients. The fusion of various features is also extensively explored. The MLP achieves the highest detection performance with an F-measure of 0.938. Moreover, deep learning shows better accuracy than the classical machine learning algorithms: its results are 2% more accurate than those of the MLP and the Random Forest. We also show that, with optimized tuning and a good feature set, classical machine learning can reach classification performance close to that of deep learning.
Today’s world is a data-driven one, with data being produced in vast amounts as a result of the rapid growth of technology that permeates every aspect of our lives. New data processing techniques must be developed and refined over time to gain meaningful insights from this vast, continuous volume of data produced in various forms. Machine learning technologies provide promising solutions and potential methods for processing large quantities of data and gaining value from it. This study conducts a literature review on the application of machine learning techniques in big data processing. It provides a general overview of machine learning algorithms and techniques, a brief introduction to big data, and a discussion of related works that have used machine learning techniques in a variety of sectors to process large amounts of data. The study also discusses the challenges and issues associated with using machine learning for big data.
Abstract: Sentiment analysis, the task of discerning emotional tone in text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars have conducted sentiment analysis in widely spoken languages such as English, Chinese, Arabic, and Roman Arabic, resource-poor languages such as Urdu remain a challenge. Urdu is a uniquely crafted language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, and Saraiki. Urdu literature, with its distinct character sets and linguistic features, presents an additional hurdle because accessible datasets are lacking, which makes sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting deeper exploration of Urdu sentiment analysis. This research is dedicated to Urdu sentiment analysis, employing deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions in Urdu text despite the absence of well-curated datasets. To tackle this challenge, the initial step is the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. After data collection, thorough cleaning and preprocessing are applied to ensure data quality. The study uses two well-known deep learning models, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for training and evaluating sentiment analysis performance, and explores hyperparameter tuning to optimize the models’ efficacy. Precision, recall, and the F1-score are used to assess the effectiveness of the models. The findings show that the RNN surpasses the CNN in Urdu sentiment analysis, reaching a significantly higher accuracy of 91%. This result makes the RNN a compelling option for sentiment analysis tasks in the Urdu language.
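As a brief illustration of the evaluation metrics named in this abstract, the following minimal sketch computes per-class precision, recall, and F1 for the five sentiment labels, plus a macro-averaged F1. It is not the authors' evaluation code: the function names and the choice of macro averaging are assumptions made here for illustration.

```python
from collections import Counter

# The five labels named in the abstract.
LABELS = ["Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for lab in labels:
        predicted = tp[lab] + fp[lab]
        actual = tp[lab] + fn[lab]
        prec = tp[lab] / predicted if predicted else 0.0
        rec = tp[lab] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values (an assumed averaging choice)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

Macro averaging weights every label equally, which can matter when rarer labels such as Mixed or Ambiguous have few examples; the abstract does not say which averaging the study used.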
Abstract: This study explores Author Profiling (AP) and its importance in several fields, including forensics, security, marketing, and education. A key component of AP is the extraction of useful information from text, with an emphasis on the writers’ ages and genders. To improve the accuracy of AP tasks, the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1 (ABM1) and Random Forest (RF). The work follows an extensive procedure that involves preprocessing of a text message dataset, model training, and assessment. Several machine learning (ML) algorithms are compared on age and gender classification: Composite Hypercube on Random Projection (CHIRP), Decision Trees (J48), Naïve Bayes (NB), K Nearest Neighbor, AdaBoostM1, NB-Updatable, RF, and ABMRF. The findings show that ABMRF consistently outperforms the other algorithms, with a gender classification accuracy of 71.14% and an age classification accuracy of 54.29%. Additional metrics such as precision, recall, F-measure, Matthews Correlation Coefficient (MCC), and accuracy support ABMRF’s performance in age and gender profiling tasks. This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing, law enforcement, and education. The results emphasize the effectiveness of ensemble approaches in improving author profiling accuracy, particularly for age and gender identification.
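For readers unfamiliar with the Matthews Correlation Coefficient mentioned among the metrics, a minimal sketch of the standard binary form follows. It assumes a two-class setting (for example, two gender classes); the abstract does not state how MCC was computed for the multi-class age task, and the function name is hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC ranges from -1 to 1 and stays near 0 for a classifier that guesses, even when the classes are imbalanced.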
Funding: This work was partially supported by a National Research Foundation of Korea (NRF) grant (No. 2019R1F1A1062237) under the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the IITP (Institute for Information and Communications Technology Planning and Evaluation), funded by the Ministry of Science and ICT (MSIT), Korea.
Abstract: Race classification is a long-standing challenge in the field of face image analysis. Investigating salient facial features is important because it avoids processing all parts of the face. Face segmentation strongly benefits several face analysis tasks, including ethnicity and race classification. We propose a race classification algorithm that uses a prior face segmentation framework. A deep convolutional neural network (DCNN) was used to construct the face segmentation model. To train the DCNN, we label face images according to seven classes: nose, skin, hair, eyes, brows, back, and mouth. The DCNN model developed in the first phase was used to create segmentation results. A probabilistic classification method is used, and probability maps (PMs) are created for each semantic class. We investigated five of the seven salient facial features that help in race classification. Features are extracted from the PMs of these five classes, and a new model is trained based on the DCNN. We assessed the performance of the proposed race classification method on four standard face datasets, reporting superior results compared with previous studies.
Abstract: Depression is a psychological disorder that may cause a physical disorder or lead to death. It strongly affects a person’s socioeconomic life; therefore, its effective and timely detection is needed. Besides speech and gait, facial expressions carry valuable clues to depression. This study proposes a depression detection system based on facial expression analysis. Facial features have been used for depression detection with a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN). We extracted micro-expressions using the Facial Action Coding System (FACS) as Action Units (AUs) correlated with the sad, disgust, and contempt features for depression detection. A CNN-based model is also proposed to automatically classify depressed subjects from images or videos in real time. Experiments were performed on a dataset obtained from Bahawal Victoria Hospital, Bahawalpur, Pakistan, with the mental condition of each patient inferred according to the Patient Health Questionnaire depression scale (PHQ-8). The experiments showed 99.9% validation accuracy for the proposed CNN model, while the extracted features obtained 100% accuracy with the SVM. Moreover, the results showed the superiority of the reported approach over state-of-the-art methods.
Funding: Researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
Abstract: Time series forecasting plays a significant role in numerous applications, including but not limited to industrial planning, water consumption, medical domains, exchange rates, and the consumer price index. The main problem is insufficient forecasting accuracy. The present study proposes a hybrid forecasting method to address this need. The proposed method includes three models. The first model is based on the autoregressive integrated moving average (ARIMA) statistical model; the second is a back propagation neural network (BPNN) with adaptive slope and momentum parameters; and the third is a hybridization of ARIMA and BPNN (ARIMA/BPNN) and of artificial neural networks and ARIMA (ARIMA/ANN), to gain the benefits of both linear and nonlinear modeling. The forecasting models proposed in this study are used to predict the consumer price index (CPI) and the expected number of cancer patients in the Ibb Province in Yemen. The standard statistical measures used to evaluate the proposed method are (i) mean square error, (ii) mean absolute error, (iii) root mean square error, and (iv) mean absolute percentage error. Based on the computational results, the improvement rate in forecasting the CPI dataset was 5%, 71%, and 4% for the ARIMA/BPNN, ARIMA/ANN, and BPNN models, respectively, while for the cancer patients’ dataset it was 7%, 200%, and 19% for the ARIMA/BPNN, ARIMA/ANN, and BPNN models, respectively. These results indicate that the proposed method reduced the degree of randomness and the alterations that affect time series with nonlinear data. The ARIMA/ANN model outperformed each of its components applied separately, increasing forecasting accuracy and decreasing the overall forecasting errors.
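The four error measures listed in this abstract have standard definitions, which can be sketched as follows. This is a minimal illustration rather than the study's code; the function name and the dictionary keys are chosen here for convenience, and MAPE is computed in percent under the assumption that no actual value is zero.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, MAE, RMSE, and MAPE (in percent) for two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual values, so it is undefined if any of them is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape}
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE is scale-free, which makes it convenient for comparing series as different as a price index and patient counts.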
Funding: Researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
Abstract: A tremendous number of vendor invoices are generated in the corporate sector. Automating the manual data entry of payable documents requires highly accurate Optical Character Recognition (OCR). This paper proposes an end-to-end OCR system that performs both localization and recognition and serves as a single unit to automate the processing of payable documents such as cheques and cash disbursements. For text localization, the maximally stable extremal region is used, which extracts a word or digit chunk from an invoice. This chunk is then passed to the deep learning model, which performs text recognition. The deep learning model uses both convolutional neural networks and long short-term memory (LSTM). The convolution layer extracts features, which are fed to the LSTM. The model integrates feature extraction, sequence modeling, and transcription into a unified network. It handles sequences of unconstrained length, independent of character segmentation or horizontal scale normalization. Furthermore, it applies to both lexicon-free and lexicon-based text recognition, and it produces a comparatively small model that can be deployed in practical applications. The overall superior performance in the experimental evaluation demonstrates the usefulness of the proposed model. The model is thus generic and can be used for other similar recognition scenarios.
Funding: The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number QURDO001, project title: Intelligent Real-Time Crowd Monitoring System Using Unmanned Aerial Vehicle (UAV) Video and Global Positioning Systems (GPS) Data.
Abstract: The COVID-19 pandemic has adversely affected the entire world and has created high demand for techniques that remotely manage crowd-related tasks. Video surveillance and crowd management using video analysis techniques have significantly influenced today’s research, and numerous applications have been developed in this domain. This research proposes an anomaly detection technique, based on sparse crowd analysis, applied to Umrah videos recorded in Kaaba during the COVID-19 pandemic. Managing the Kaaba rituals is crucial because the crowd gathers from around the world and requires proper analysis during the pandemic. The Umrah videos are analyzed, and a system is devised that can track and monitor the crowd flow in Kaaba. The crowd in these videos is sparse due to the pandemic, and we have developed a technique to track the maximum crowd flow and detect any object (person) moving in a direction unlike that of the major flow. We detect abnormal movement by creating histograms for the vertical and horizontal flows and applying thresholds to identify the non-majority flow. Our algorithm aims to analyze the crowd through video surveillance and detect any abnormal activity in time to maintain a smooth crowd flow in Kaaba during the pandemic.
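The idea of building histograms of horizontal and vertical flow and thresholding them to find non-majority movement can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' algorithm: it assumes that per-person displacement vectors `(dx, dy)` are already available from some tracker, and the names `flag_non_majority`, `min_share`, and `eps` are hypothetical.

```python
from collections import Counter


def _sign(value, eps):
    """Map a displacement component to -1, 0, or +1, ignoring motion smaller than eps."""
    if abs(value) < eps:
        return 0
    return 1 if value > 0 else -1


def flag_non_majority(flows, min_share=0.6, eps=0.5):
    """Return sorted track ids that move against the dominant flow.

    flows: {track_id: (dx, dy)} displacement per tracked person (assumed input).
    min_share: fraction of moving tracks that must agree before a direction
               counts as the major flow on that axis (assumed threshold).
    """
    flagged = set()
    for axis in (0, 1):  # 0 = horizontal, 1 = vertical
        # Histogram of movement directions along this axis.
        hist = Counter(_sign(vec[axis], eps) for vec in flows.values())
        moving = hist[1] + hist[-1]
        if moving == 0:
            continue  # nobody moves along this axis
        major = 1 if hist[1] >= hist[-1] else -1
        if hist[major] / moving < min_share:
            continue  # no clear dominant direction, so nothing is flagged
        for track_id, vec in flows.items():
            if _sign(vec[axis], eps) == -major:
                flagged.add(track_id)
    return sorted(flagged)
```

In a sparse crowd the histograms have few entries, so the share threshold guards against flagging people when the scene has no clear dominant direction.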
Funding: The researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
Abstract: In the machine learning (ML) paradigm, data augmentation serves as a regularization approach for creating ML models. Greater diversity in the training samples increases generalization capability, which improves the prediction performance of classifiers when tested on unseen examples. Deep learning (DL) models have many parameters and frequently overfit. Data therefore plays a major role in avoiding overfitting and in supporting the latest improvements in DL. Nevertheless, reliable data collection is a major limiting factor. This problem is frequently addressed by combining data augmentation, transfer learning, dropout, and batch normalization. In this paper, we introduce the application of data augmentation to image classification using Random Multi-model Deep Learning (RMDL), which combines multiple DL approaches to yield random models for classification. We present a methodology for using Generative Adversarial Networks (GANs) to generate images for data augmentation. Through experiments, we find that samples generated by GANs, when fed into RMDL, improve both accuracy and model efficiency. Experiments on both the MNIST and CIFAR-10 datasets show that the error rate of the proposed approach decreases across different random models.
Funding: The authors would like to thank the chair of Prince Faisal bin Mishaal Al Saud for Artificial Intelligent research for funding this research work through the project number QU-CPFAI-2-7-4. They would also like to extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education, and the Deanship of Scientific Research, Qassim University, for their support of this research.
Abstract: A Bloom filter (BF) is a space- and time-efficient probabilistic technique that helps answer membership queries. However, the traditional BF generally has two problems. Firstly, a large number of false positives can return wrong content when the data is queried. Secondly, the large size of the BF slows querying and consumes a large amount of memory. To solve these two issues, in this article we propose the check bits concept. From the implementation perspective, in the check bits approach, before saving the content value in the BF, we obtain the binary representation of the content value. Then we take some bits of the content value; we call these the check bits. These bits are stored in a separate array such that they point to the same location as the BF. Finally, the content value (data) is stored in the BF based on the hash function values. Before data is retrieved from the BF, the reverse of these steps ensures that even if the same hash function output has been generated for the content, the retrieval does not depend on the hash output alone, because the check bits must also match. This helps reduce false positives. In the experimental evaluation, we are able to reduce false positives by more than 50%. In our proposed approach, false positives can still occur, but only if the hash functions and the check bits generate the same value for a particular content. The chances of such scenarios are low; therefore, we obtain a reduction of more than 50% in false positives in all cases. We believe that the proposed approach adds to the state of the art and opens new directions.
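The check bits idea described in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the class name, the choice of SHA-256 for hashing, taking the low-order bits of the UTF-8 encoding as check bits, and all parameter defaults are assumptions. In addition, the abstract does not say how two items that hash to the same slot share the check bits array, so this sketch swaps in a small per-slot bitmask that records every check value seen at that slot, which avoids false negatives.

```python
import hashlib


class CheckBitsBloomFilter:
    """Bloom filter with a parallel array of check-bit masks (illustrative sketch)."""

    def __init__(self, size=1024, num_hashes=3, check_width=3):
        self.size = size
        self.num_hashes = num_hashes      # must stay below 256 in this sketch
        self.check_width = check_width    # number of content bits used as check bits
        self.bits = [False] * size        # the ordinary Bloom filter bit array
        self.check_masks = [0] * size     # parallel array, same indexing as bits

    def _positions(self, data):
        # Derive num_hashes positions by prefixing the data with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def _check_value(self, data):
        # Take the low-order bits of the content's binary representation.
        return int.from_bytes(data, "big") & ((1 << self.check_width) - 1)

    def add(self, item):
        data = item.encode("utf-8")
        flag = 1 << self._check_value(data)
        for pos in self._positions(data):
            self.bits[pos] = True
            self.check_masks[pos] |= flag  # record this check value at the slot

    def __contains__(self, item):
        data = item.encode("utf-8")
        flag = 1 << self._check_value(data)
        # A hit needs both the Bloom bit and a matching check value at every slot.
        return all(
            self.bits[pos] and self.check_masks[pos] & flag
            for pos in self._positions(data)
        )


bf = CheckBitsBloomFilter()
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print("alpha" in bf)
```

With this layout, a query that collides with stored items on every hashed position is still rejected unless its check value was also recorded at each of those positions, which is the mechanism the abstract credits for the reduction in false positives.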
Funding: the financial support of this research under the number (coc-2020-1-1-L-9988) during the academic year 1441 AH/2020 AD.
Abstract: Coronavirus disease, which is caused by the SARS-CoV-2 virus, has spread worldwide since early 2020 and has been declared a pandemic by the World Health Organization (WHO). Coronavirus disease is also termed COVID-19. It affects the human respiratory system and thus can be traced and tracked from Chest X-Ray images. Therefore, Chest X-Ray alone may play a vital role in identifying COVID-19 cases. In this paper, we propose a Machine Learning (ML) approach that utilizes X-Ray images to classify healthy and affected patients based on the patterns found in these images. The article also explores traditional and Deep Learning (DL) approaches for COVID-19 patterns from Chest X-Ray images to predict, analyze, and further understand this virus. In the experimental evaluation, the proposed approach achieves 97.5% detection performance using the DL model for COVID-19 versus normal cases. In contrast, for the COVID-19 versus Pneumonia Virus scenario, we achieve 94.5% accurate detections. Our extensive evaluation in the experimental section guides and helps in the selection of an appropriate model for similar tasks. Thus, the approach can be used for medical purposes and is particularly pertinent in detecting COVID-19 positive patients using X-Ray images alone.
Abstract: There are over 200 different varieties of date fruit in the world. Interestingly, every single type has some very specific features that differ from the others. In recent years, sorting, separating, and arranging in automated industries, in fruit businesses, and more specifically in date businesses have inspired many research directions. In this regard, this paper focuses on the detection and recognition of dates using computer vision and machine learning. Our experimental setup is based on the classical machine learning approach and the deep learning approach for nine classes of date fruit. Classical machine learning includes the Bayesian network, Support Vector Machine, Random Forest, and Multi-Layer Perceptron (MLP), while the Convolutional Neural Network is used for the deep learning set. The feature set includes Color Layout features, Fuzzy Color and Texture Histogram, Gabor filtering, and the Pyramid Histogram of the Oriented Gradients. The fusion of various features is also extensively explored in this paper. The MLP achieves the highest detection performance with an F-measure of 0.938. Moreover, deep learning shows better accuracy than the classical machine learning algorithms. In fact, deep learning produced results 2% more accurate than the MLP and the Random Forest. We also show that, through optimized tuning and a good feature set, classical machine learning could achieve classification performance close to that provided by deep learning.
Funding: This work was supported by the Deanship of Scientific Research at Qassim University.
Abstract: Today’s world is a data-driven one, with data being produced in vast amounts as a result of the rapid growth of technology that permeates every aspect of our lives. New data processing techniques must be developed and refined over time to gain meaningful insights from this vast, continuous volume of data produced in various forms. Machine learning technologies provide promising solutions and potential methods for processing large quantities of data and gaining value from it. This study conducts a literature review on the application of machine learning techniques in big data processing. It provides a general overview of machine learning algorithms and techniques, a brief introduction to big data, and a discussion of related works that have used machine learning techniques in a variety of sectors to process large amounts of data. The study also discusses the challenges and issues associated with the use of machine learning for big data.