In the realm of large-scale machine learning, it is crucial to explore methods for reducing computational complexity and memory demands while maintaining generalization performance. Additionally, since the collected data may contain sensitive information, it is also of great significance to study privacy-preserving machine learning algorithms. This paper focuses on the performance of the differentially private stochastic gradient descent (SGD) algorithm based on random features. First, the algorithm maps the original data into a low-dimensional space, thereby avoiding the large-scale storage requirement of traditional kernel methods. Next, the algorithm iteratively optimizes parameters using stochastic gradient descent. Finally, the output perturbation mechanism is employed to introduce random noise, ensuring algorithmic privacy. We prove that the proposed algorithm satisfies differential privacy while achieving fast convergence rates under mild conditions.
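The three stages described above (random feature mapping, SGD, output perturbation) can be sketched as follows. This is an illustrative toy implementation, not the paper's algorithm: the feature dimension, learning rate, and noise scale are arbitrary placeholders, and the noise here is Gaussian rather than calibrated to a formal privacy budget.

```python
import numpy as np

def rff_map(X, W, b):
    # Random Fourier features: approximate a Gaussian kernel in a
    # low-dimensional space, avoiding the O(n^2) kernel matrix.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def private_sgd_rff(X, y, D=100, epochs=5, lr=0.1, noise_scale=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, size=(X.shape[1], D))   # random frequencies
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    Z = rff_map(X, W, b)
    theta = np.zeros(D)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):            # one SGD pass per epoch
            # ridge-regularized squared-loss gradient for one sample
            g = (Z[i] @ theta - y[i]) * Z[i] + 1e-3 * theta
            theta -= lr * g
    # Output perturbation: add random noise to the final iterate.
    theta += rng.normal(0.0, noise_scale, size=D)
    return theta, W, b

# toy regression target: y = sin(3x)
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
theta, W, b = private_sgd_rff(X, y)
pred = rff_map(X, W, b) @ theta
mse = float(np.mean((pred - y) ** 2))
print(round(mse, 3))
```

The low-dimensional map keeps memory at O(nD) instead of O(n^2) for the kernel matrix, which is the storage saving the abstract refers to.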
Parkinson’s disease (PD) is a neurodegenerative disease caused by a deficiency of dopamine. Investigators have identified the voice as an underlying symptom of PD, and advanced vocal disorder studies provide adequate treatment and support for accurate PD detection. Machine learning (ML) models have recently helped to solve problems in the classification of chronic diseases. This work analyzes the effect of feature selection on ML efficiency in a voice-based PD detection system. It includes PD classification models based on random forest, decision tree, neural network, logistic regression, and support vector machine. Feature selection is performed with the RF mean-decrease-in-accuracy and mean-decrease-in-Gini techniques. Random forest kerb feature selection (RFKFS) selects only 17 features from 754 attributes. The proposed technique uses validation metrics to assess the performance of the ML models. The RF model with feature selection performed best among all models, with a high accuracy of 96.56%, a precision of 88.02%, a sensitivity of 98.26%, and a specificity of 96.06%. The corresponding validation scores are a negative predictive value (NPV) of 99.47%, a geometric mean (GM) of 97.15%, a Youden’s index (YI) of 94.32%, and a Matthews correlation coefficient (MCC) of 90.84%. The proposed model is also more robust than the other models. Using the RFKFS approach for PD thus yields an effective and high-performing medical classifier.
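The two Random Forest importance criteria mentioned above map directly onto standard tooling. A minimal sketch with scikit-learn on synthetic data; the dataset, forest size, and 17-feature intersection cutoff are illustrative assumptions, not the paper's RFKFS procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the voice dataset: 40 features, 5 informative.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean decrease in Gini (impurity-based importance).
gini_rank = np.argsort(rf.feature_importances_)[::-1]

# Mean decrease in accuracy (permutation importance).
mda = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
mda_rank = np.argsort(mda.importances_mean)[::-1]

# Keep only features that both criteria rank among the top 17.
keep = sorted(set(gini_rank[:17]) & set(mda_rank[:17]))
print(len(keep))
```

Gini importance is free after training, while permutation importance requires repeated rescoring but is less biased toward high-cardinality features.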
Alzheimer’s disease is a non-reversible, non-curable, and progressive neurological disorder that induces the shrinkage and death of a specific neuronal population associated with memory formation and retention. It is a frequently occurring mental illness, accounting for about 60%–80% of dementia cases, and is usually observed in people aged 60 years and above. Depending upon the severity of symptoms, patients can be categorized as Cognitive Normal (CN), Mild Cognitive Impairment (MCI), or Alzheimer’s Disease (AD). AD is the last phase of the disease, in which the brain is severely damaged and patients are not able to live on their own. Radiomics is an approach for extracting a huge number of features from medical images with the help of data characterization algorithms. Here, 105 radiomic features are extracted and used to predict Alzheimer’s disease. This paper uses Support Vector Machine, K-Nearest Neighbour, Gaussian Naïve Bayes, eXtreme Gradient Boosting (XGBoost), and Random Forest to predict Alzheimer’s disease. The proposed random forest-based approach with the radiomic features achieved an accuracy of 85%. It also achieved 88% accuracy, 88% recall, 88% precision, and 87% F1-score for AD vs. CN; 72% accuracy, 73% recall, 72% precision, and 71% F1-score for AD vs. MCI; and 69% accuracy, 69% recall, 68% precision, and 69% F1-score for MCI vs. CN. The comparative analysis shows that the proposed approach performs better than the other approaches.
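First-order radiomic features of the kind referred to above (intensity statistics computed from an image region) can be illustrated in a few lines. This is a generic sketch, not the paper's 105-feature pipeline; the feature set and bin count are arbitrary choices:

```python
import numpy as np

def first_order_radiomics(img, bins=32):
    # A few first-order radiomic features of a gray-level image region:
    # intensity statistics plus a histogram-based entropy.
    x = img.ravel().astype(float)
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before the log
    return {
        "mean": float(x.mean()),
        "variance": float(x.var()),
        "skewness": float(((x - x.mean()) ** 3).mean() / (x.std() ** 3 + 1e-12)),
        "entropy": float(-(p * np.log2(p)).sum()),
    }

rng = np.random.default_rng(0)
feats = first_order_radiomics(rng.integers(0, 256, size=(64, 64)))
print(sorted(feats))
```

In a full pipeline these per-region statistics (together with shape and texture features) form the feature vector fed to the classifiers listed in the abstract.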
As the incidence of breast cancer has increased significantly in recent years, expert systems and machine learning techniques for this problem have attracted great attention from many scholars. This study aims at diagnosing and prognosticating breast cancer with a machine learning method based on a random forest classifier and a feature selection technique. By weighting features, keeping useful ones, and removing redundant ones, the method solves the diagnosis problem by classifying the Wisconsin Breast Cancer Diagnosis Dataset and the prognosis problem by classifying the Wisconsin Breast Cancer Prognostic Dataset. On these datasets we obtained a classification accuracy of 100% in the best case and around 99.8% on average, which is very promising compared to previously reported results. Although this result was obtained on the Wisconsin datasets, it suggests that the method can be used confidently for other breast cancer diagnosis problems, too.
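A hedged sketch of the general recipe above, keeping useful features and dropping the rest before classification, using scikit-learn's bundled Wisconsin diagnostic dataset. The median importance threshold and forest size are assumptions, not the paper's settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # the Wisconsin diagnostic dataset

# Rank features by random forest importance and drop the less useful half.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
selector = SelectFromModel(rf, threshold="median").fit(X, y)
X_sel = selector.transform(X)

# Cross-validated accuracy of a fresh forest on the selected features.
acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                      X_sel, y, cv=5).mean()
print(round(acc, 3))
```

This sketch will not reproduce the 99.8% figure, which depends on the paper's exact weighting scheme and evaluation protocol; it only shows the select-then-classify structure.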
Hashing technology has the advantages of reducing data storage and improving the efficiency of the learning system, making it more and more widely used in image retrieval. Multi-view data describe image information more comprehensively than traditional single-view methods, but how to use hashing to combine multi-view data for image retrieval is still a challenge. In this paper, a multi-view fusion hashing method based on RKCCA (Random Kernel Canonical Correlation Analysis) is proposed. In order to describe image content more accurately, we construct multi-view data by combining features from the densely connected convolutional network DenseNet with GIST features or BoW_SIFT (Bag-of-Words model + SIFT) features. The algorithm uses the RKCCA method to fuse the multi-view features into association features and applies them to image retrieval. It generates binary hash codes with minimal distortion error by designing quantization regularization terms. Extensive experiments on benchmark datasets show that this method is superior to other multi-view hashing methods.
Image matching refers to the process of matching two or more images obtained at different times, by different sensors, or under different conditions through a large number of feature points in the images. At present, image matching is widely used in target recognition and tracking, as well as indoor positioning and navigation. However, local features are often missing in color images taken in dark light, which greatly reduces the number of extracted feature points, degrades image matching, and can even cause target recognition to fail. An unsharp masking (USM) based denoising model is established, and a local adaptive enhancement algorithm is proposed to achieve feature point compensation by strengthening the local features of the dark image, thereby effectively increasing the amount of image information. The fast library for approximate nearest neighbors (FLANN) and random sample consensus (RANSAC) are used as the image matching algorithms. Experimental results show that the proposed algorithm increases the number of effective feature points obtained from images in dark-light environments and obviously improves the accuracy of image matching.
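The USM operation underlying the model above has a compact form: subtract a Gaussian blur from the image and add the scaled residual back. A minimal sketch; the sigma and amount parameters are placeholders, and the paper's local adaptive variant is not reproduced:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, sigma=2.0, amount=1.5):
    # USM: boost high frequencies by adding back a scaled residual
    # (the image minus its Gaussian-blurred copy).
    blurred = gaussian_filter(img.astype(float), sigma=sigma)
    sharpened = img + amount * (img - blurred)
    return np.clip(sharpened, 0, 255)

rng = np.random.default_rng(0)
dark = rng.integers(0, 60, size=(32, 32)).astype(float)  # low-light stand-in
out = unsharp_mask(dark)
print(out.shape)
```

Strengthening local contrast in this way gives detectors more gradient structure to latch onto, which is how the enhancement recovers feature points before FLANN matching and RANSAC filtering.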
An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, both from outside and from inside the organization. It deals with large amounts of data, which contain various irrelevant and redundant features, resulting in increased processing time and a low detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve overall system performance significantly when mining huge datasets. In this context, this paper focuses on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD’99 intrusion detection dataset, which is based on the DARPA 98 dataset and provides labeled data for researchers working in the field of intrusion detection. An important deficiency of the KDD’99 data set is the huge number of redundant records, as observed earlier. Therefore, we have derived a data set, RRE-KDD, by eliminating redundant records from the KDD’99 train and test datasets, so that the classifiers and the feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets for training and testing purposes, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features useful for classification, which, in turn, not only reduces the number of input features and the processing time but also increases the classification accuracy.
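The two-step procedure can be sketched as an importance ranking followed by a greedy forward search, shown here on synthetic data rather than RRE-KDD; the cutoff of 15 candidates and the cross-validation settings are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=1)

# Step 1: rank features by RF variable-importance score.
rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]

# Step 2: greedy forward search over the ranked list; keep a feature
# only if it improves cross-validated accuracy.
chosen, best = [], 0.0
for f in order[:15]:
    trial = chosen + [f]
    score = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=1),
        X[:, trial], y, cv=3).mean()
    if score > best:
        chosen, best = trial, score
print(len(chosen), round(best, 3))
```

The ranking in step 1 prunes the search space so the step-2 subset search stays tractable on large datasets, which is the point of the two-step design.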
Big data are regarded as a tremendous technology for processing a huge variety of data in a short time and with a large storage capacity, and users’ access over the internet creates massive data to be processed. Big data require an intelligent feature selection model that can address huge varieties of data, whereas traditional feature selection techniques are only applicable to simple data mining; intelligent techniques are needed in big data processing and machine learning for efficient classification. Most feature selection algorithms read the input features as they are, preprocess them, and classify them without considering the relatedness among features, so a less optimal solution is achieved. In our proposed research, we focus on feature selection using a supervised learning technique called grey wolf optimization (GWO) with decomposed random differential grouping (DrnDG-GWO). First, the features are decomposed into subsets based on the relatedness of the variables; random differential grouping is performed using a fitness value over two variables. Every subset is then regarded as a population in the GWO technique. The combination of supervised machine learning with swarm intelligence techniques produces the best feature optimization results in this research. Once the features are optimized, we classify the data using an advanced kNN process for accurate classification. The results of DrnDG-GWO are compared with those of standard GWO and GWO with PSO for feature selection to evaluate the efficiency of the proposed algorithm. The accuracy and time complexity of the proposed algorithm are 98% and 5 s, which are better than those of the existing techniques.
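The core GWO update, in which the pack moves toward the three best wolves (alpha, beta, delta) with a coefficient decaying from 2 to 0, can be sketched on a toy objective. This is plain GWO, not the DrnDG-GWO variant, and all hyperparameters are placeholders:

```python
import numpy as np

def gwo(obj, dim=5, wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    # Minimal grey wolf optimizer minimizing obj over [lb, ub]^dim.
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(wolves, dim))
    for t in range(iters):
        fit = np.array([obj(x) for x in X])
        alpha, beta, delta = X[np.argsort(fit)[:3]]   # three best wolves
        a = 2.0 * (1 - t / iters)                     # decays 2 -> 0
        for i in range(wolves):
            pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - X[i])         # distance to leader
                pos += leader - A * D
            X[i] = np.clip(pos / 3.0, lb, ub)         # average of the three pulls
    fit = np.array([obj(x) for x in X])
    return X[np.argmin(fit)], float(fit.min())

best_x, best_f = gwo(lambda x: float(np.sum(x ** 2)))  # sphere function
print(round(best_f, 4))
```

In a feature selection setting the objective would score a feature subset (e.g. kNN accuracy) rather than the sphere function used here.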
Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment in drilling makes lithology identification face many challenges. This paper studies the problems of difficult feature information extraction, low precision of thin-layer identification, and limited applicability of the model in intelligent lithology identification. The authors improve the comprehensive performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that most significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in applications to 3 wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model has a higher lithology identification accuracy rate and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model plays a truly efficient role in the real-time intelligent identification of lithologic information in closed-loop drilling and has greater applicability, making it worthy of wide use in logging interpretation.
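The abstract does not detail the dynamic felling strategy, but the class-balance problem it targets can be illustrated with a weighted random forest: `class_weight="balanced"` reweights samples inversely to class frequency, which is one standard way to keep a rare ("thin-layer") class from being ignored. The synthetic data and settings are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced stand-in for lithology labels: the 5% minority class plays
# the role of a thin layer that an unweighted forest tends to miss.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

plain = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
weighted = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                  random_state=0).fit(Xtr, ytr)

f1_plain = f1_score(yte, plain.predict(Xte))
f1_weighted = f1_score(yte, weighted.predict(Xte))
print(round(f1_plain, 3), round(f1_weighted, 3))
```

DFW-RF presumably combines such weighting with its own tree-pruning ("felling") strategy; this sketch shows only the weighting half.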
Digital images have been studied for image classification, enhancement, compression, and segmentation purposes. The present work studies the effects of a feature selection algorithm on the predictive classification accuracy of algorithms used for discriminating different plant leaf images. The process involves extracting important texture features from the digital images and then subjecting them to feature selection and classification. The leaf image features have been extracted as Gabor texture features, which are then subjected to the Random Forest feature selection algorithm to retain the important ones. Four classification algorithms, namely K-Nearest Neighbour, J48, Classification and Regression Trees, and Random Forest, have been used for classification. This study shows a net improvement in predictive classification accuracy when the classification algorithms are applied to the selected features rather than the complete set of features.
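A minimal sketch of the Gabor texture features mentioned above: build a real-valued Gabor kernel (a Gaussian envelope times a cosine carrier), filter the image, and keep the mean and standard deviation of each response. The frequencies, orientations, and kernel size are illustrative choices:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    # Real Gabor kernel: Gaussian envelope modulated by a cosine carrier
    # oriented at angle theta with spatial frequency freq.
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    env = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, freqs=(0.1, 0.2), thetas=(0.0, np.pi / 2)):
    feats = []
    for f in freqs:
        for th in thetas:
            resp = convolve(img.astype(float), gabor_kernel(f, th))
            feats += [resp.mean(), resp.std()]  # per-filter response statistics
    return np.array(feats)

rng = np.random.default_rng(0)
leaf = rng.random((64, 64))   # stand-in for a leaf texture image
feats = gabor_features(leaf)
print(feats.shape)
```

A filter bank over more frequencies and orientations would yield the richer feature vector that Random Forest selection then prunes.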
Clinical study and automatic diagnosis of electrocardiogram (ECG) data always remain a challenge in diagnosing cardiovascular activities. The analysis of ECG data relies on various factors, such as morphological features, classification techniques, the methods or models used to diagnose, and their performance improvement. Another crucial factor in the methodology is how to train the model for each patient. Existing approaches use a standard training model, which faces challenges when the training data vary due to individual patient characteristics, resulting in lower detection accuracy. This paper proposes an adaptive approach to identify performance improvements in building a training model, analyzing a global training methodology against an individual training methodology and identifying the gap between them. We provide an investigation and comparative study of these methods and models with standard classification techniques, basic morphological features, and Heart Rate Variability (HRV) that may aid real-time application. This approach helps in analyzing and evaluating the performance of different techniques and suggests the adoption of the best-identified model, with an efficient technique and attribute set, for real-time systems.
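Two of the standard time-domain HRV features such an approach might use, SDNN and RMSSD, are simple to compute from RR intervals; the RR series below is synthetic:

```python
import numpy as np

def hrv_features(rr_ms):
    # Two standard time-domain HRV features from RR intervals (ms):
    # SDNN (overall variability) and RMSSD (beat-to-beat variability).
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))
    return sdnn, rmssd

rr = [812, 790, 805, 830, 795, 810, 788, 802]   # synthetic RR series (ms)
sdnn, rmssd = hrv_features(rr)
print(round(sdnn, 1), round(rmssd, 1))   # → 13.7 22.2
```

In a per-patient (individual) training methodology, features like these would be computed over each patient's own recordings rather than pooled globally.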
In the era of the Internet, widely used web applications have become the target of hacker attacks because they contain a large amount of personal information. Among these vulnerabilities, stealing private data through cross-site scripting (XSS) attacks is one of the attacks most commonly used by hackers. Currently, deep learning-based XSS attack detection methods have good application prospects; however, they suffer from problems such as being prone to overfitting, a high false alarm rate, and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model utilizes Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features, which are subsequently merged with the original data features to form a feature set with richer information content. Further feature extraction is conducted through three parallel channels. Channel I utilizes parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and performs multi-scale feature fusion; Channel II employs maximum one-dimensional pooling layers (max 1D pooling layers) of various sizes to extract key features from the data; and Channel III extracts global information bi-directionally using a Bi-Directional Long Short-Term Memory network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features. Finally, effective classification and prediction of XSS are performed by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets. We achieve an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, which is higher than that of the existing models.
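The leaf-index feature enhancement described above can be sketched with scikit-learn: `RandomForestClassifier.apply` returns the leaf each sample reaches in every tree, and one-hot encoding those indices yields features that can be concatenated with the originals. The dataset and forest settings are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Train a small forest and read off the leaf index each sample lands in
# for every tree; these indices encode the forest's learned partitioning.
rf = RandomForestClassifier(n_estimators=10, max_depth=4,
                            random_state=0).fit(X, y)
leaves = rf.apply(X)                                   # (n_samples, n_trees)
leaf_ohe = OneHotEncoder().fit_transform(leaves).toarray()

# Merge the leaf encoding with the original features to form the richer
# feature set fed to the downstream channels.
enriched = np.hstack([X, leaf_ohe])
print(X.shape[1], enriched.shape[1])
```

In the paper's model this enriched set then enters the three parallel channels; here the sketch stops at the feature construction.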
Machine learning has been widely used for solving partial differential equations (PDEs) in recent years, among which the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. Potentially, the optimization problem in the RFM is more difficult to solve than those that arise in traditional methods. Unlike the broader machine-learning research, which frequently targets tasks within the low-precision regime, our study focuses on the high-precision regime crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes an overfitting issue; (iii) we test direct, iterative, and randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to the other methods if memory is not an issue, while iterative methods typically have low accuracy and can be improved by preconditioning to some extent.
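The RFM's least-squares structure can be sketched on a one-dimensional toy problem: represent the solution as a combination of random tanh features, enforce the equation at collocation points and the boundary conditions as linear constraints, and solve with NumPy's `lstsq` (a direct method, consistent with the finding above). The feature count, weight ranges, and test equation are illustrative assumptions:

```python
import numpy as np

# Solve u''(x) = -pi^2 sin(pi x) on [0,1] with u(0) = u(1) = 0
# (exact solution: sin(pi x)) by least squares over random tanh features.
rng = np.random.default_rng(0)
M = 200                                     # number of random features
w = rng.uniform(-10, 10, M)
b = rng.uniform(-10, 10, M)

def phi(x):                                 # feature matrix phi_j(x_i)
    return np.tanh(np.outer(x, w) + b)

def phi_xx(x):                              # second derivative of each feature
    t = np.tanh(np.outer(x, w) + b)
    return -2.0 * t * (1 - t ** 2) * w ** 2

x_in = np.linspace(0, 1, 101)
A = np.vstack([phi_xx(x_in),                # PDE residual rows (collocation)
               phi(np.array([0.0, 1.0]))])  # boundary-condition rows
rhs = np.concatenate([-np.pi ** 2 * np.sin(np.pi * x_in), [0.0, 0.0]])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)

err = float(np.max(np.abs(phi(x_in) @ c - np.sin(np.pi * x_in))))
print(f"max abs error: {err:.2e}")
```

The coefficient matrix A here is the object whose singular-value distribution the abstract analyzes; its ill-conditioning is what makes the high-precision regime delicate.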
Funding for the differentially private SGD with random features work: supported by the Zhejiang Provincial Natural Science Foundation of China (LR20A010001) and the National Natural Science Foundation of China (12271473 and U21A20426).
Funding for the multi-view fusion hashing work: supported by the National Natural Science Foundation of China (61772561), the Key Research & Development Plan of Hunan Province (2018NK2012), the Science Research Projects of Hunan Provincial Education Department (18A174 and 18C0262), and the Science & Technology Innovation Platform and Talent Plan of Hunan Province (2017TP1022).
Funding for the image matching work: supported by the National Natural Science Foundation of China (61771186), the Heilongjiang Provincial Natural Science Foundation of China (YQ2020F012), and the University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2017125).
Funding for the lithology identification work: financially supported by the National Natural Science Foundation of China (52174001 and 52004064), the Hainan Province Science and Technology Special Fund "Research on Real-time Intelligent Sensing Technology for Closed-loop Drilling of Oil and Gas Reservoirs in Deepwater Drilling" (ZDYF2023GXJS012), and the Heilongjiang Provincial Government and Daqing Oilfield first-batch scientific and technological key project "Research on the Construction Technology of Gulong Shale Oil Big Data Analysis System" (DQYT-2022-JS-750).
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling, but the complex and changeable geological environment encountered while drilling makes lithology identification challenging. This paper studies the problems of difficult feature-information extraction, low precision of thin-layer identification, and limited model applicability in intelligent lithology identification, and improves the comprehensive performance of the identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that most significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in applications to three wells in different areas. Compared with the predictions of five typical lithology identification algorithms, the DFW-RF model achieves a higher identification accuracy and F1 score, improves the identification accuracy of thin-layer lithology, and is effective and feasible in different geological environments. The DFW-RF model plays a truly efficient role in the real-time intelligent identification of lithologic information in closed-loop drilling and has broad applicability, making it worth using widely in logging interpretation.
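The abstract names class balance as one of the three aspects DFW-RF addresses but does not give the weighting formula. A common device for balancing thin (rare) lithology classes against thick ones is an inverse-class-frequency sample weight; the sketch below shows that generic idea only, not the paper's actual dynamic felling strategy, and the class names and counts are invented.

```python
import numpy as np
from collections import Counter

def inverse_frequency_weights(labels):
    # per-sample weights inversely proportional to class frequency, so that
    # each class contributes the same total weight (n / k) to training
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return np.array([n / (k * counts[c]) for c in labels])

# invented imbalanced lithology labels: 80 sandstone, 15 shale, 5 coal
labels = ["sandstone"] * 80 + ["shale"] * 15 + ["coal"] * 5
w = inverse_frequency_weights(labels)
```

Each of the three classes ends up with the same total weight (100/3), so a weighted classifier no longer favors the dominant sandstone class.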
Abstract: Digital images have been studied for image classification, enhancement, compression, and segmentation purposes. The present work studies the effect of a feature selection algorithm on the predictive classification accuracy of algorithms used for discriminating different plant leaf images. The process involves extracting important texture features from the digital images and then subjecting them to feature selection and further classification. The leaf image features are extracted as Gabor texture features, which are then subjected to the Random Forest feature selection algorithm to retain the important ones. Four classification algorithms, K-Nearest Neighbour, J48, Classification and Regression Trees, and Random Forest, are used for classification. The study shows a net improvement in predictive classification accuracy when the classification algorithms are applied to the selected features rather than the complete feature set.
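As a hedged illustration of the Gabor texture features mentioned above, the following NumPy sketch builds an oriented Gabor kernel and measures a per-orientation response energy via FFT convolution. The kernel parameters and the synthetic test image are arbitrary choices, not those of the study.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    # real Gabor kernel: Gaussian envelope times an oriented cosine carrier
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def gabor_energies(img, thetas, freq=0.25, size=15, sigma=3.0):
    # mean squared filter response per orientation (FFT-based circular convolution)
    feats = []
    F = np.fft.fft2(img)
    for th in thetas:
        k = np.zeros_like(img)
        k[:size, :size] = gabor_kernel(size, th, freq, sigma)
        resp = np.fft.ifft2(F * np.fft.fft2(k)).real
        feats.append(float((resp**2).mean()))
    return np.array(feats)

# vertical stripes varying along x: strongest response at theta = 0
img = np.tile(np.cos(2 * np.pi * 0.25 * np.arange(64)), (64, 1))
feats = gabor_energies(img, [0.0, np.pi / 2])
```

A texture descriptor is then the vector of such energies over a bank of orientations and frequencies; on the striped image the orientation matching the stripes dominates.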
Funding: The authors extend their appreciation to King Saud University for funding this work through Researchers Supporting Project Number (RSP-2021/387), King Saud University, Riyadh, Saudi Arabia.
Abstract: The clinical study and automatic diagnosis of electrocardiogram (ECG) data remain a challenge in assessing cardiovascular activity. The analysis of ECG data relies on various factors, such as the morphological features, the classification techniques, methods, or models used for diagnosis, and their performance improvements. Another crucial methodological factor is how the model is trained for each patient. Existing approaches use a standard training model, which faces challenges when the training data vary with individual patient characteristics, resulting in lower detection accuracy. This paper proposes an adaptive approach to identify performance improvements in building a training model by analyzing a global training methodology against an individual (per-patient) training methodology and identifying the gap between them. We provide an investigation and comparative study of these methods and models, using standard classification techniques with basic morphological features and Heart Rate Variability (HRV), that may aid real-time applications. This approach helps analyze and evaluate the performance of different techniques and can suggest the best model, with an efficient technique and attribute set, for real-time systems.
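The basic HRV features referred to above can be illustrated with two standard time-domain measures, SDNN and RMSSD, computed from successive RR intervals. The sample intervals below are made up for illustration; the abstract does not list which HRV measures the study used.

```python
import numpy as np

def hrv_features(rr_ms):
    # SDNN: sample standard deviation of RR intervals (overall variability)
    # RMSSD: root mean square of successive differences (beat-to-beat variability)
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = float(rr.std(ddof=1))
    rmssd = float(np.sqrt(np.mean(np.diff(rr) ** 2)))
    return sdnn, rmssd

sdnn, rmssd = hrv_features([800, 810, 790, 805, 795])  # RR intervals in ms
```

In a per-patient training setup, such features would be computed over each patient's own beat history, whereas a global model pools them across patients.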
Abstract: In the era of the Internet, widely used web applications have become targets of hacker attacks because they contain large amounts of personal information. Among such attacks, stealing private data through cross-site scripting (XSS) is one of the most common. Deep learning-based XSS detection methods currently have good application prospects; however, they suffer from problems such as a tendency to overfit, a high false alarm rate, and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model uses Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features, which are subsequently merged with the original data features to form a feature set with richer information content. Further feature extraction is conducted through three parallel channels. Channel I uses parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and performs multi-scale feature fusion; Channel II employs maximum one-dimensional pooling layers (max 1D pooling layers) of various sizes to extract key features from the data; and Channel III extracts global information bi-directionally using a Bi-Directional Long Short-Term Memory network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features. Finally, effective classification and prediction of XSS are performed by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets, achieving an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, higher than existing models.
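The Random Forest feature-enhancement step, appending leaf-node indices to the original features, can be sketched at toy scale. In the sketch below each "tree" is just a depth-1 stump on a random feature, which is a deliberate simplification of the paper's forest; in practice one would take a real forest's leaf indices (e.g., scikit-learn's `RandomForestClassifier.apply`).

```python
import numpy as np

def leaf_index_features(X, n_trees=4, seed=0):
    # toy stand-in for RF feature enhancement: each "tree" is a depth-1 stump
    # on a random feature, and its leaf index (0 or 1) is appended to the
    # original features to form an enriched feature set
    rng = np.random.default_rng(seed)
    leaves = []
    for _ in range(n_trees):
        j = int(rng.integers(X.shape[1]))
        thr = np.median(X[:, j])
        leaves.append((X[:, j] > thr).astype(float))
    return np.hstack([X, np.column_stack(leaves)])

X = np.random.default_rng(2).normal(size=(6, 3))
Xe = leaf_index_features(X)  # original 3 columns plus one leaf column per tree
```

The enriched matrix would then feed the three parallel channels described in the abstract.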
Funding: supported by the NSFC Major Research Plan on Interpretable and General-purpose Next-generation Artificial Intelligence (No. 92370205).
Abstract: Machine learning has been widely used for solving partial differential equations (PDEs) in recent years, among which the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in both accuracy and efficiency. Potentially, the optimization problem in the RFM is more difficult to solve than those that arise in traditional methods. Unlike broader machine-learning research, which frequently targets tasks in the low-precision regime, our study focuses on the high-precision regime crucial for solving PDEs. In this work, we study the problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes overfitting; and (iii) we test direct, iterative, and randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to the others when memory is not an issue, while iterative methods typically have low accuracy and can be improved to some extent by preconditioning.