Stock trend prediction is a challenging problem because it involves many variables.Aiming at the problem that some existing machine learning techniques, such as random forest(RF), probabilistic random forest(PRF), k-n...Stock trend prediction is a challenging problem because it involves many variables.Aiming at the problem that some existing machine learning techniques, such as random forest(RF), probabilistic random forest(PRF), k-nearest neighbor(KNN), and fuzzy KNN(FKNN), have difficulty in accurately predicting the stock trend(uptrend or downtrend) for a given date, a generalized Heronian mean(GHM) based FKNN predictor named GHM-FKNN was proposed.GHM-FKNN combines GHM aggregation function with the ideas of the classical FKNN approach.After evaluation, the comparison results elucidated that GHM-FKNN outperformed the other best existing methods RF, PRF, KNN and FKNN on independent test datasets corresponding to three stocks, namely AAPL, AMZN and NFLX.Compared with RF, PRF, KNN and FKNN, GHM-FKNN achieved the best performance with accuracy of 62.37% for AAPL, 58.25% for AMZN, and 64.10% for NFLX.展开更多
Arrhythmia beat classification is an active area of research in ECG based clinical decision support systems. In this paper, Pruned Fuzzy K-nearest neighbor (PFKNN) classifier is proposed to classify six types of beats...Arrhythmia beat classification is an active area of research in ECG based clinical decision support systems. In this paper, Pruned Fuzzy K-nearest neighbor (PFKNN) classifier is proposed to classify six types of beats present in the MIT-BIH Arrhythmia database. We have tested our classifier on ~ 103100 beats for six beat types present in the database. Fuzzy KNN (FKNN) can be implemented very easily but large number of training examples used for classification can be very time consuming and requires large storage space. Hence, we have proposed a time efficient Arif-Fayyaz pruning algorithm especially suitable for FKNN which can maintain good classification accuracy with appropriate retained ratio of training data. By using Arif-Fayyaz pruning algorithm with Fuzzy KNN, we have achieved a beat classification accuracy of 97% and geometric mean of sensitivity of 94.5% with only 19% of the total training examples. The accuracy and sensitivity is comparable to FKNN when all the training data is used. Principal Component Analysis is used to further reduce the dimension of feature space from eleven to six without compromising the accuracy and sensitivity. PFKNN was found to robust against noise present in the ECG data.展开更多
The purpose of this paper is to investigate the k-nearest neighbor classification rule for spatially dependent data.Some spatial mixing conditions are considered,and under such spatial structures,the well known k-neare...The purpose of this paper is to investigate the k-nearest neighbor classification rule for spatially dependent data.Some spatial mixing conditions are considered,and under such spatial structures,the well known k-nearest neighbor rule is suggested to classify spatial data.We established consistency and strong consistency of the classifier under mild assumptions.Our main results extend the consistency result in the i.i.d.case to the spatial case.展开更多
Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malwar...Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malware detection.However,there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection.Addressing this gap can provide valuable insights for enhancing cybersecurity strategies.While numerous studies have explored malware detection using machine learning techniques,there is a lack of systematic comparison of supervised classifiers for Windows malware detection.Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures.This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems.The objectives include Investigating the performance of various classifiers,such as Gaussian Naïve Bayes,K Nearest Neighbors(KNN),Stochastic Gradient Descent Classifier(SGDC),and Decision Tree,in detecting Windows malware.Evaluating the accuracy,efficiency,and suitability of each classifier for real-world malware detection scenarios.Identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers.Offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence.The study employs a structured methodology consisting of several phases:exploratory data analysis,data preprocessing,model training,and evaluation.Exploratory data analysis involves understanding the dataset’s characteristics and identifying preprocessing requirements.Data preprocessing includes cleaning,feature encoding,dimensionality reduction,and optimization to prepare the data for training.Model training utilizes various supervised classifiers,and their performance is evaluated using metrics such as accuracy,precision,recall,and F1 score.The study’s outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection.Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware.Additionally,insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses.Overall,this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.展开更多
Support vector machine (SVM), as a novel approach in pattern recognition, has demonstrated a success in face detection and face recognition. In this paper, a face recognition approach based on the SVM classifier with ...Support vector machine (SVM), as a novel approach in pattern recognition, has demonstrated a success in face detection and face recognition. In this paper, a face recognition approach based on the SVM classifier with the nearest neighbor classifier (NNC) is proposed. The principal component analysis (PCA) is used to reduce the dimension and extract features. Then one-against-all stratedy is used to train the SVM classifiers. At the testing stage, we propose an al-展开更多
This paper proposes an active learning accelerated Monte-Carlo simulation method based on the modified K-nearest neighbors algorithm.The core idea of the proposed method is to judge whether or not the output of a rand...This paper proposes an active learning accelerated Monte-Carlo simulation method based on the modified K-nearest neighbors algorithm.The core idea of the proposed method is to judge whether or not the output of a random input point can be postulated through a classifier implemented through the modified K-nearest neighbors algorithm.Compared to other active learning methods resorting to experimental designs,the proposed method is characterized by employing Monte-Carlo simulation for sampling inputs and saving a large portion of the actual evaluations of outputs through an accurate classification,which is applicable for most structural reliability estimation problems.Moreover,the validity,efficiency,and accuracy of the proposed method are demonstrated numerically.In addition,the optimal value of K that maximizes the computational efficiency is studied.Finally,the proposed method is applied to the reliability estimation of the carbon fiber reinforced silicon carbide composite specimens subjected to random displacements,which further validates its practicability.展开更多
In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selec...In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.展开更多
Winding is one of themost important components in power transformers.Ensuring the health state of the winding is of great importance to the stable operation of the power system.To efficiently and accurately diagnose t...Winding is one of themost important components in power transformers.Ensuring the health state of the winding is of great importance to the stable operation of the power system.To efficiently and accurately diagnose the disc space variation(DSV)fault degree of transformer winding,this paper presents a diagnostic method of winding fault based on the K-Nearest Neighbor(KNN)algorithmand the frequency response analysis(FRA)method.First,a laboratory winding model is used,and DSV faults with four different degrees are achieved by changing disc space of the discs in the winding.Then,a series of FRA tests are conducted to obtain the FRA results and set up the FRA dataset.Second,ten different numerical indices are utilized to obtain features of FRA curves of faulted winding.Third,the 10-fold cross-validation method is employed to determine the optimal k-value of KNN.In addition,to improve the accuracy of the KNN model,a comparative analysis is made between the accuracy of the KNN algorithm and k-value under four distance functions.After getting the most appropriate distance metric and kvalue,the fault classificationmodel based on theKNN and FRA is constructed and it is used to classify the degrees of DSV faults.The identification accuracy rate of the proposed model is up to 98.30%.Finally,the performance of the model is presented by comparing with the support vector machine(SVM),SVM optimized by the particle swarmoptimization(PSO-SVM)method,and randomforest(RF).The results show that the diagnosis accuracy of the proposed model is the highest and the model can be used to accurately diagnose the DSV fault degrees of the winding.展开更多
Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal e...Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening health and life of patients suffering from the disease. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction. However they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracies, low precision and high error rates. Ensemble learning, which combines classifiers, may be helpful to boost prediction on new data. However, current ensemble ML techniques rarely consider comprehensive evaluation metrics to evaluate the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is done based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P) and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. This is followed by SVM in which the probability of having the best classification is 0.9652% at an error rate of 0.0206. On the other hand, NB had the worst performance of 0.8475% classification at 0.0738 error rate.展开更多
Today, mammography is the best method for early detection of breast cancer. Radiologists failed to detect evident cancerous signs in approximately 20% of false negative mammograms. False negatives have been identified...Today, mammography is the best method for early detection of breast cancer. Radiologists failed to detect evident cancerous signs in approximately 20% of false negative mammograms. False negatives have been identified as the inability of the radiologist to detect the abnormalities due to several reasons such as poor image quality, image noise, or eye fatigue. This paper presents a framework for a computer aided detection system that integrates Principal Component Analysis (PCA), Fisher Linear Discriminant (FLD), and Nearest Neighbor Classifier (KNN) algorithms for the detection of abnormalities in mammograms. Using normal and abnormal mammograms from the MIAS database, the integrated algorithm achieved 93.06% classification accuracy. Also in this paper, we present an analysis of the integrated algorithm’s parameters and suggest selection criteria.展开更多
As climate change negotiations progress,monitoring biomass and carbon stocks is becoming an important part of the current forest research.Therefore,national governments are interested in developing forest-monitoring s...As climate change negotiations progress,monitoring biomass and carbon stocks is becoming an important part of the current forest research.Therefore,national governments are interested in developing forest-monitoring strategies using geospatial technology.Among statistical methods for mapping biomass,there is a nonparametric approach called k-nearest neighbor(kNN).We compared four variations of distance metrics of the kNN for the spatially-explicit estimation of aboveground biomass in a portion of the Mexican north border of the intertropical zone.Satellite derived,climatic,and topographic predictor variables were combined with the Mexican National Forest Inventory(NFI)data to accomplish the purpose.Performance of distance metrics applied into the kNN algorithm was evaluated using a cross validation leave-one-out technique.The results indicate that the Most Similar Neighbor(MSN)approach maximizes the correlation between predictor and response variables(r=0.9).Our results are in agreement with those reported in the literature.These findings confirm the predictive potential of the MSN approach for mapping forest variables at pixel level under the policy of Reducing Emission from Deforestation and Forest Degradation(REDD+).展开更多
During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and...During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and the overallquality of the entire dam. Currently, the method used to monitor and controlspreading thickness during the dam construction process is artificialsampling check after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and controltheory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. Comparatively, the traditionalmethod can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness of the dam storehouse surface in realtime. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring modelbased on the K-nearest neighbor algorithm. Taking the LHK core rockfilldam in Southwest China as an example, we performed real-time monitoring for the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfilldam storehouse surface.展开更多
On the basis of machine leaning,suitable algorithms can make advanced time series analysis.This paper proposes a complex k-nearest neighbor(KNN)model for predicting financial time series.This model uses a complex feat...On the basis of machine leaning,suitable algorithms can make advanced time series analysis.This paper proposes a complex k-nearest neighbor(KNN)model for predicting financial time series.This model uses a complex feature extraction process integrating a forward rolling empirical mode decomposition(EMD)for financial time series signal analysis and principal component analysis(PCA)for the dimension reduction.The information-rich features are extracted then input to a weighted KNN classifier where the features are weighted with PCA loading.Finally,prediction is generated via regression on the selected nearest neighbors.The structure of the model as a whole is original.The test results on real historical data sets confirm the effectiveness of the models for predicting the Chinese stock index,an individual stock,and the EUR/USD exchange rate.展开更多
Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting...Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced in K-nearest neighbor non-parametric regression method, and the forecasting results were output using the weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm has improved the predictive ability of the traditional K-nearest neighbor non-parametric regression method, and greatly enhanced the accuracy and real-time performance of short-term traffic flow forecasting.展开更多
Building an automatic fish recognition and detection system for largescale fish classes is helpful for marine researchers and marine scientists because there are large numbers of fish species.However,it is quite diffi...Building an automatic fish recognition and detection system for largescale fish classes is helpful for marine researchers and marine scientists because there are large numbers of fish species.However,it is quite difficult to build such systems owing to the lack of data imbalance problems and large number of classes.To solve these issues,we propose a transfer learning-based technique in which we use Efficient-Net,which is pre-trained on ImageNet dataset and fine-tuned on QuT Fish Database,which is a large scale dataset.Furthermore,prior to the activation layer,we use Global Average Pooling(GAP)instead of dense layer with the aim of averaging the results of predictions along with having more information compared to the dense layer.To check the validity of our model,we validate our model on the validation set which achieves satisfactory results.Also,for the localization task,we propose an architecture that consists of localization aware block,which captures localization information for better prediction and residual connections to handle the over-fitting problem.Actually,the residual connections help the layer to combine missing information with the relevant one.In addition,we use class weights and Focal Loss(FL)to handle class imbalance problems along with reducing false predictions.Actually,class weights assign less weights to classes having fewer instances and large weights to classes having more number of instances.During the localization,the qualitative assessment shows that we achieve 57%Mean Intersection Over Union(IoU)on testing data,and the classification results show 75%precision,70%recall,78%accuracy and 74%F1-Score for 468 fish species.展开更多
Finding Nearest Neighbors efficiently is crucial to the design of any nearest neighbor classifier. This paper shows how Layered Range Trees (LRT) could be utilized for efficient nearest neighbor classification. The pr...Finding Nearest Neighbors efficiently is crucial to the design of any nearest neighbor classifier. This paper shows how Layered Range Trees (LRT) could be utilized for efficient nearest neighbor classification. The presented algorithm is robust and finds the nearest neighbor in a logarithmic order. The proposed algorithm reports the nearest neighbor in , where k is a very small constant when compared with the dataset size n and d is the number of dimensions. Experimental results demonstrate the efficiency of the proposed algorithm.展开更多
One of the most critical steps in medical health is the proper diagnosis of the disease.Dermatology is one of the most volatile and challenging fields in terms of diagnosis.Dermatologists often require further testing...One of the most critical steps in medical health is the proper diagnosis of the disease.Dermatology is one of the most volatile and challenging fields in terms of diagnosis.Dermatologists often require further testing,review of the patient’s history,and other data to ensure a proper diagnosis.Therefore,finding a method that can guarantee a proper trusted diagnosis quickly is essential.Several approaches have been developed over the years to facilitate the diagnosis based on machine learning.However,the developed systems lack certain properties,such as high accuracy.This study proposes a system developed in MATLAB that can identify skin lesions and classify them as normal or benign.The classification process is effectuated by implementing the K-nearest neighbor(KNN)approach to differentiate between normal skin and malignant skin lesions that imply pathology.KNN is used because it is time efficient and promises highly accurate results.The accuracy of the system reached 98%in classifying skin lesions.展开更多
An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Then different classifiers are compared based on the MFCC...An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Then different classifiers are compared based on the MFCC feature. For each signal, the mean vector of MFCC matrix is used as an input vector for pattern recognition. A sample of 330 signals, containing 165 US voice and 165 non-US voice, is analyzed. By comparison, k-nearest neighbors yield the highest average test accuracy, after using a cross-validation of size 500, and least time being used in the computation.展开更多
In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based o...In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based on the experimental data.The algorithm is performed by choosing k nearest points and training dataset to find the optimal k value.The proposed method is applied to impove and adjust pathloss model at 28 GHz in Keangnam area,Hanoi,Vietnam.The experiments in both line-of-sight and non-line-of-sight scenarios used many combinations of transmit and receive antennas at different transmit antenna heights and random locations of receive antenna have been carried out using Wireless Insite Software.The results have been compared with 3GPP and NYU Wireless Path Loss Models in order to verify the performance of the proposed approach.展开更多
基金Supported by the National Key Research and Development Program (No.2019YFA0707201)the Key Work Program of Institute of Scientific and Technical Information of China (No.ZD2022-01,ZD2023-07)。
文摘Stock trend prediction is a challenging problem because it involves many variables.Aiming at the problem that some existing machine learning techniques, such as random forest(RF), probabilistic random forest(PRF), k-nearest neighbor(KNN), and fuzzy KNN(FKNN), have difficulty in accurately predicting the stock trend(uptrend or downtrend) for a given date, a generalized Heronian mean(GHM) based FKNN predictor named GHM-FKNN was proposed.GHM-FKNN combines GHM aggregation function with the ideas of the classical FKNN approach.After evaluation, the comparison results elucidated that GHM-FKNN outperformed the other best existing methods RF, PRF, KNN and FKNN on independent test datasets corresponding to three stocks, namely AAPL, AMZN and NFLX.Compared with RF, PRF, KNN and FKNN, GHM-FKNN achieved the best performance with accuracy of 62.37% for AAPL, 58.25% for AMZN, and 64.10% for NFLX.
文摘Arrhythmia beat classification is an active area of research in ECG based clinical decision support systems. In this paper, Pruned Fuzzy K-nearest neighbor (PFKNN) classifier is proposed to classify six types of beats present in the MIT-BIH Arrhythmia database. We have tested our classifier on ~ 103100 beats for six beat types present in the database. Fuzzy KNN (FKNN) can be implemented very easily but large number of training examples used for classification can be very time consuming and requires large storage space. Hence, we have proposed a time efficient Arif-Fayyaz pruning algorithm especially suitable for FKNN which can maintain good classification accuracy with appropriate retained ratio of training data. By using Arif-Fayyaz pruning algorithm with Fuzzy KNN, we have achieved a beat classification accuracy of 97% and geometric mean of sensitivity of 94.5% with only 19% of the total training examples. The accuracy and sensitivity is comparable to FKNN when all the training data is used. Principal Component Analysis is used to further reduce the dimension of feature space from eleven to six without compromising the accuracy and sensitivity. PFKNN was found to robust against noise present in the ECG data.
文摘The purpose of this paper is to investigate the k-nearest neighbor classification rule for spatially dependent data.Some spatial mixing conditions are considered,and under such spatial structures,the well known k-nearest neighbor rule is suggested to classify spatial data.We established consistency and strong consistency of the classifier under mild assumptions.Our main results extend the consistency result in the i.i.d.case to the spatial case.
基金This researchwork is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R411),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malware detection.However,there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection.Addressing this gap can provide valuable insights for enhancing cybersecurity strategies.While numerous studies have explored malware detection using machine learning techniques,there is a lack of systematic comparison of supervised classifiers for Windows malware detection.Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures.This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems.The objectives include Investigating the performance of various classifiers,such as Gaussian Naïve Bayes,K Nearest Neighbors(KNN),Stochastic Gradient Descent Classifier(SGDC),and Decision Tree,in detecting Windows malware.Evaluating the accuracy,efficiency,and suitability of each classifier for real-world malware detection scenarios.Identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers.Offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence.The study employs a structured methodology consisting of several phases:exploratory data analysis,data preprocessing,model training,and evaluation.Exploratory data analysis involves understanding the dataset’s characteristics and identifying preprocessing requirements.Data preprocessing includes cleaning,feature encoding,dimensionality reduction,and optimization to prepare the data for training.Model training utilizes various supervised classifiers,and their performance is evaluated using metrics such as accuracy,precision,recall,and F1 score.The study’s outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection.Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware.Additionally,insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses.Overall,this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
基金This project was supported by Shanghai Shu Guang Project.
文摘Support vector machine (SVM), as a novel approach in pattern recognition, has demonstrated a success in face detection and face recognition. In this paper, a face recognition approach based on the SVM classifier with the nearest neighbor classifier (NNC) is proposed. The principal component analysis (PCA) is used to reduce the dimension and extract features. Then one-against-all stratedy is used to train the SVM classifiers. At the testing stage, we propose an al-
基金supported by the National Natural Science Foundation of China(Grant No.12002246 and No.52178301)Knowledge Innovation Program of Wuhan(Grant No.2022010801020357)+2 种基金the Science Research Foundation of Wuhan Institute of Technology(Grant No.K2021030)2020 annual Open Fund of Failure Mechanics&Engineering Disaster Prevention and Mitigation,Key Laboratory of Sichuan Province(Sichuan University)(Grant No.2020JDS0022)Open Research Fund Program of Hubei Provincial Key Laboratory of Chemical Equipment Intensification and Intrinsic Safety(Grant No.2019KA03)。
文摘This paper proposes an active learning accelerated Monte-Carlo simulation method based on the modified K-nearest neighbors algorithm.The core idea of the proposed method is to judge whether or not the output of a random input point can be postulated through a classifier implemented through the modified K-nearest neighbors algorithm.Compared to other active learning methods resorting to experimental designs,the proposed method is characterized by employing Monte-Carlo simulation for sampling inputs and saving a large portion of the actual evaluations of outputs through an accurate classification,which is applicable for most structural reliability estimation problems.Moreover,the validity,efficiency,and accuracy of the proposed method are demonstrated numerically.In addition,the optimal value of K that maximizes the computational efficiency is studied.Finally,the proposed method is applied to the reliability estimation of the carbon fiber reinforced silicon carbide composite specimens subjected to random displacements,which further validates its practicability.
基金the Deputyship for Research and Innovation,“Ministry of Education”in Saudi Arabia for funding this research(IFKSUOR3-014-3).
文摘In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
基金supported in part by Shaanxi Natural Science Foundation Project (2023-JC-QN-0438)in part by Fundamental Research Funds for the Central Universities (2452021050).
文摘Winding is one of themost important components in power transformers.Ensuring the health state of the winding is of great importance to the stable operation of the power system.To efficiently and accurately diagnose the disc space variation(DSV)fault degree of transformer winding,this paper presents a diagnostic method of winding fault based on the K-Nearest Neighbor(KNN)algorithmand the frequency response analysis(FRA)method.First,a laboratory winding model is used,and DSV faults with four different degrees are achieved by changing disc space of the discs in the winding.Then,a series of FRA tests are conducted to obtain the FRA results and set up the FRA dataset.Second,ten different numerical indices are utilized to obtain features of FRA curves of faulted winding.Third,the 10-fold cross-validation method is employed to determine the optimal k-value of KNN.In addition,to improve the accuracy of the KNN model,a comparative analysis is made between the accuracy of the KNN algorithm and k-value under four distance functions.After getting the most appropriate distance metric and kvalue,the fault classificationmodel based on theKNN and FRA is constructed and it is used to classify the degrees of DSV faults.The identification accuracy rate of the proposed model is up to 98.30%.Finally,the performance of the model is presented by comparing with the support vector machine(SVM),SVM optimized by the particle swarmoptimization(PSO-SVM)method,and randomforest(RF).The results show that the diagnosis accuracy of the proposed model is the highest and the model can be used to accurately diagnose the DSV fault degrees of the winding.
文摘Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening health and life of patients suffering from the disease. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction. However they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracies, low precision and high error rates. Ensemble learning, which combines classifiers, may be helpful to boost prediction on new data. However, current ensemble ML techniques rarely consider comprehensive evaluation metrics to evaluate the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is done based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P) and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. This is followed by SVM in which the probability of having the best classification is 0.9652% at an error rate of 0.0206. On the other hand, NB had the worst performance of 0.8475% classification at 0.0738 error rate.
文摘Today, mammography is the best method for early detection of breast cancer. Radiologists failed to detect evident cancerous signs in approximately 20% of false negative mammograms. False negatives have been identified as the inability of the radiologist to detect the abnormalities due to several reasons such as poor image quality, image noise, or eye fatigue. This paper presents a framework for a computer aided detection system that integrates Principal Component Analysis (PCA), Fisher Linear Discriminant (FLD), and Nearest Neighbor Classifier (KNN) algorithms for the detection of abnormalities in mammograms. Using normal and abnormal mammograms from the MIAS database, the integrated algorithm achieved 93.06% classification accuracy. Also in this paper, we present an analysis of the integrated algorithm’s parameters and suggest selection criteria.
文摘As climate change negotiations progress,monitoring biomass and carbon stocks is becoming an important part of the current forest research.Therefore,national governments are interested in developing forest-monitoring strategies using geospatial technology.Among statistical methods for mapping biomass,there is a nonparametric approach called k-nearest neighbor(kNN).We compared four variations of distance metrics of the kNN for the spatially-explicit estimation of aboveground biomass in a portion of the Mexican north border of the intertropical zone.Satellite derived,climatic,and topographic predictor variables were combined with the Mexican National Forest Inventory(NFI)data to accomplish the purpose.Performance of distance metrics applied into the kNN algorithm was evaluated using a cross validation leave-one-out technique.The results indicate that the Most Similar Neighbor(MSN)approach maximizes the correlation between predictor and response variables(r=0.9).Our results are in agreement with those reported in the literature.These findings confirm the predictive potential of the MSN approach for mapping forest variables at pixel level under the policy of Reducing Emission from Deforestation and Forest Degradation(REDD+).
基金supported by the Innovative Research Groups of National Natural Science Foundation of China(No. 51621092)National Basic Research Program of China ("973" Program, No. 2013CB035904)National Natural Science Foundation of China (No. 51439005)
文摘During the storehouse surface rolling construction of a core rockfilldam, the spreading thickness of dam face is an important factor that affects the construction quality of the dam storehouse' rolling surface and the overallquality of the entire dam. Currently, the method used to monitor and controlspreading thickness during the dam construction process is artificialsampling check after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and controltheory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. Comparatively, the traditionalmethod can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness of the dam storehouse surface in realtime. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring modelbased on the K-nearest neighbor algorithm. Taking the LHK core rockfilldam in Southwest China as an example, we performed real-time monitoring for the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfilldam storehouse surface.
基金supported by the Social Science Foundation of China under Grant No.17BGL231。
文摘On the basis of machine leaning,suitable algorithms can make advanced time series analysis.This paper proposes a complex k-nearest neighbor(KNN)model for predicting financial time series.This model uses a complex feature extraction process integrating a forward rolling empirical mode decomposition(EMD)for financial time series signal analysis and principal component analysis(PCA)for the dimension reduction.The information-rich features are extracted then input to a weighted KNN classifier where the features are weighted with PCA loading.Finally,prediction is generated via regression on the selected nearest neighbors.The structure of the model as a whole is original.The test results on real historical data sets confirm the effectiveness of the models for predicting the Chinese stock index,an individual stock,and the EUR/USD exchange rate.
文摘Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced in K-nearest neighbor non-parametric regression method, and the forecasting results were output using the weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm has improved the predictive ability of the traditional K-nearest neighbor non-parametric regression method, and greatly enhanced the accuracy and real-time performance of short-term traffic flow forecasting.
基金Zamil S.Alzamil would like to thank Deanship of Scientific Research at Majmaah University for supporting this work under Project No.R-2022-172.
文摘Building an automatic fish recognition and detection system for largescale fish classes is helpful for marine researchers and marine scientists because there are large numbers of fish species.However,it is quite difficult to build such systems owing to the lack of data imbalance problems and large number of classes.To solve these issues,we propose a transfer learning-based technique in which we use Efficient-Net,which is pre-trained on ImageNet dataset and fine-tuned on QuT Fish Database,which is a large scale dataset.Furthermore,prior to the activation layer,we use Global Average Pooling(GAP)instead of dense layer with the aim of averaging the results of predictions along with having more information compared to the dense layer.To check the validity of our model,we validate our model on the validation set which achieves satisfactory results.Also,for the localization task,we propose an architecture that consists of localization aware block,which captures localization information for better prediction and residual connections to handle the over-fitting problem.Actually,the residual connections help the layer to combine missing information with the relevant one.In addition,we use class weights and Focal Loss(FL)to handle class imbalance problems along with reducing false predictions.Actually,class weights assign less weights to classes having fewer instances and large weights to classes having more number of instances.During the localization,the qualitative assessment shows that we achieve 57%Mean Intersection Over Union(IoU)on testing data,and the classification results show 75%precision,70%recall,78%accuracy and 74%F1-Score for 468 fish species.
文摘Finding Nearest Neighbors efficiently is crucial to the design of any nearest neighbor classifier. This paper shows how Layered Range Trees (LRT) could be utilized for efficient nearest neighbor classification. The presented algorithm is robust and finds the nearest neighbor in a logarithmic order. The proposed algorithm reports the nearest neighbor in , where k is a very small constant when compared with the dataset size n and d is the number of dimensions. Experimental results demonstrate the efficiency of the proposed algorithm.
文摘One of the most critical steps in medical health is the proper diagnosis of the disease.Dermatology is one of the most volatile and challenging fields in terms of diagnosis.Dermatologists often require further testing,review of the patient’s history,and other data to ensure a proper diagnosis.Therefore,finding a method that can guarantee a proper trusted diagnosis quickly is essential.Several approaches have been developed over the years to facilitate the diagnosis based on machine learning.However,the developed systems lack certain properties,such as high accuracy.This study proposes a system developed in MATLAB that can identify skin lesions and classify them as normal or benign.The classification process is effectuated by implementing the K-nearest neighbor(KNN)approach to differentiate between normal skin and malignant skin lesions that imply pathology.KNN is used because it is time efficient and promises highly accurate results.The accuracy of the system reached 98%in classifying skin lesions.
文摘An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Then different classifiers are compared based on the MFCC feature. For each signal, the mean vector of MFCC matrix is used as an input vector for pattern recognition. A sample of 330 signals, containing 165 US voice and 165 non-US voice, is analyzed. By comparison, k-nearest neighbors yield the highest average test accuracy, after using a cross-validation of size 500, and least time being used in the computation.
基金This work is carried out in the framework of the project supported by the Department of Science and Technology of Kien Giang,Vietnam.The authors would like to thank them for supporting this research。
文摘In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based on the experimental data.The algorithm is performed by choosing k nearest points and training dataset to find the optimal k value.The proposed method is applied to impove and adjust pathloss model at 28 GHz in Keangnam area,Hanoi,Vietnam.The experiments in both line-of-sight and non-line-of-sight scenarios used many combinations of transmit and receive antennas at different transmit antenna heights and random locations of receive antenna have been carried out using Wireless Insite Software.The results have been compared with 3GPP and NYU Wireless Path Loss Models in order to verify the performance of the proposed approach.