Abstract: CC (Cloud Computing) networks are distributed and dynamic, as signals appear, disappear, or lose significance. MLTs (Machine Learning Techniques) are trained on datasets that are sometimes inadequate in sample size for inferring information. A dynamic strategy, DevMLOps (Development Machine Learning Operations), used for the automatic selection and tuning of MLTs, yields significant performance differences. However, the scheme has several disadvantages, including the need for continuous training, larger sample and training-time requirements during feature selection, and increased classification execution times. RFE (Recursive Feature Elimination) is computationally expensive because it traverses every feature without considering the correlations between them. This problem can be overcome by wrapper methods, which select better features by accounting for both the training and test datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed AKFA (Adaptive Kernel Firefly Algorithm) selects features for CNM (Cloud Network Monitoring) operations. The AKFA methodology is demonstrated on the CNSD (Cloud Network Security Dataset), with satisfactory results on performance metrics such as precision, recall, F-measure, and accuracy.
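The RFE-versus-wrapper trade-off mentioned in this abstract can be illustrated with standard tooling. The sketch below uses scikit-learn's RFE and SequentialFeatureSelector on synthetic data; it is only an assumed illustration of that trade-off, not the paper's AKFA or DevQLMLOps pipeline.

```python
# Minimal sketch contrasting RFE with a wrapper-style selector (illustrative
# only; NOT the paper's AKFA / DevQLMLOps implementation).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=8, random_state=0)
base = RandomForestClassifier(n_estimators=50, random_state=0)

# RFE: repeatedly drops the weakest features by model importance,
# one at a time, without modelling correlations between features.
rfe = RFE(base, n_features_to_select=8).fit(X, y)

# Wrapper: greedily adds features, scoring each candidate subset by
# cross-validated accuracy, so redundant correlated features tend to be
# skipped once one of them is already in the subset.
wrapper = SequentialFeatureSelector(base, n_features_to_select=8,
                                    scoring="accuracy", cv=3).fit(X, y)

for name, selector in [("RFE", rfe), ("Wrapper", wrapper)]:
    score = cross_val_score(base, selector.transform(X), y, cv=3).mean()
    print(name, selector.get_support(indices=True), round(score, 3))
```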
Funding: This research was supported by a grant from the Research Center of the Center for Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.
Abstract: In recent years, the volume of information in digital form has increased tremendously owing to the growing popularity of the World Wide Web. As a result, techniques for extracting useful information from large collections of data, and particularly from documents, have become more necessary and more challenging. Text clustering is one such technique; it divides a set of text documents into clusters (groups) so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., the words) of a document in terms of relevance. Nevertheless, because documents usually contain a large number of words, some of them may be irrelevant to the topic under consideration or redundant, which can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach combines two metaheuristic algorithms: a genetic algorithm (GA), which performs feature selection, and a shuffled frog-leaping algorithm (SFLA), which performs clustering. To evaluate its effectiveness, the approach was tested on a well-known text document dataset, the "20Newsgroup" dataset from the University of California Irvine Machine Learning Repository. Across multiple experiments, the proposed algorithm substantially improved text document clustering on the 20Newsgroup dataset compared with classical K-means clustering, although this improvement comes at the cost of longer computational time.
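For context, the classical K-means baseline against which the GA+SFLA approach is compared can be reproduced with standard tooling. The sketch below uses scikit-learn's 20 Newsgroups loader and TF-IDF features; it is only the assumed comparison point, not the authors' GA/SFLA implementation, and the category subset shown is arbitrary.

```python
# Sketch of a classical K-means baseline for text clustering on 20 Newsgroups
# (downloads the corpus on first use). NOT the paper's GA+SFLA method.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

cats = ["sci.space", "rec.sport.hockey", "talk.politics.mideast"]
data = fetch_20newsgroups(subset="all", categories=cats,
                          remove=("headers", "footers", "quotes"))

# TF-IDF turns each document into a sparse word-relevance vector;
# max_features acts as a crude (non-optimized) feature selection step.
X = TfidfVectorizer(stop_words="english", max_features=2000).fit_transform(data.data)

km = KMeans(n_clusters=len(cats), n_init=10, random_state=0).fit(X)

# Compare cluster assignments with the true newsgroup labels.
print("ARI:", round(adjusted_rand_score(data.target, km.labels_), 3))
```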
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62076159, 12031010, 61673251, and 61771297), the Fundamental Research Funds for the Central Universities (GK202105003), the Natural Science Basic Research Program of Shaanxi Province of China (2022JM334), and the Innovation Funds of Graduate Programs at Shaanxi Normal University (2015CXS028 and 2016CSY009).
Abstract: Detecting the informative features needed for explainable analysis of high-dimensional data is a significant and challenging task, especially when the number of samples is very small. Feature selection, particularly unsupervised feature selection, is the right way to address this challenge. Accordingly, two unsupervised spectral feature selection algorithms are proposed in this paper. They group features using an advanced self-tuning spectral clustering algorithm based on local standard deviation, so as to detect globally optimal feature clusters as far as possible. Two feature ranking techniques, cosine-similarity-based feature ranking and entropy-based feature ranking, are then proposed so that a representative feature of each cluster can be selected to form the feature subset on which the explainable classification system is built. The effectiveness of the proposed algorithms is tested on high-dimensional benchmark omics datasets and compared with peer methods, and statistical tests are conducted to determine whether the proposed spectral feature selection algorithms differ significantly from the peer methods. Extensive experiments demonstrate that the proposed unsupervised spectral feature selection algorithms outperform the peer methods, especially the one based on the cosine-similarity feature ranking technique, while the statistical test results show that the entropy-ranking-based spectral feature selection algorithm performs best. The detected features show strong discriminative capability in downstream classifiers for omics data, so that an AI system built on them would be reliable and explainable. This is especially significant for building transparent and trustworthy medical diagnostic systems from an interpretable-AI perspective.
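The general idea of clustering features spectrally and keeping one representative per cluster can be sketched as follows. This is a hedged illustration with plain scikit-learn spectral clustering and cosine-similarity-to-centroid ranking on random data; the paper's self-tuning, local-standard-deviation variant and its entropy-based ranking are not reproduced here.

```python
# Hedged sketch: cluster the FEATURES (columns) with spectral clustering, then
# keep the feature closest (by cosine similarity) to each cluster's mean.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

def spectral_feature_subset(X, n_clusters=10):
    """Return indices of one representative feature per feature cluster."""
    Z = StandardScaler().fit_transform(X)          # samples x features
    F = Z.T                                        # treat features as points
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="nearest_neighbors",
                                random_state=0).fit_predict(F)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        centroid = F[idx].mean(axis=0, keepdims=True)
        # pick the feature most similar (cosine) to its cluster centroid
        sims = cosine_similarity(F[idx], centroid).ravel()
        selected.append(int(idx[np.argmax(sims)]))
    return sorted(selected)

# Example: 60 samples, 500 features (the small-sample, high-dimensional setting)
X = np.random.default_rng(0).normal(size=(60, 500))
print(spectral_feature_subset(X, n_clusters=10))
```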
Abstract: Background Colorectal cancer (CRC) is the second leading cause of cancer fatalities and the third most common cancer. Identifying molecular subgroups of CRC and treating patients accordingly could result in better therapeutic success than treating all CRC patients alike. Studies have highlighted the significance of CRC as a major cause of mortality worldwide and the potential benefits of identifying molecular subtypes to tailor treatment strategies and improve patient outcomes. Methods This study proposes an unsupervised learning approach based on hierarchical clustering and feature selection to identify molecular subtypes and compares its performance with that of conventional methods. The model uses gene expression data from CRC patients obtained from Kaggle, applying dimension reduction techniques followed by Z-score-based outlier removal. Agglomerative hierarchical clustering is used to identify molecular subtypes, with a P-value-based approach for feature selection. The performance of the model is evaluated using various classifiers, including a multilayer perceptron (MLP). Results The proposed methodology outperformed conventional methods, with the MLP classifier achieving the highest accuracy, 89%, after feature selection. The model successfully identified molecular subtypes of CRC and differentiated between subtypes based on their gene expression profiles. Conclusion This method could aid in developing tailored therapeutic strategies for CRC patients, although further validation and evaluation of its clinical significance are needed.
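A minimal sketch of the pipeline shape described in the Methods is given below, under assumed details (synthetic data standing in for the Kaggle expression matrix, a 5% extreme-value rule for outliers, ANOVA p-values, and a fixed 50-gene cut-off); the paper's actual dataset, thresholds, and preprocessing may differ.

```python
# Hedged sketch: scale, remove Z-score outliers, cluster samples into subtypes,
# select genes by p-value, then evaluate an MLP on the reduced data.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))          # stand-in for a gene-expression matrix

Z = StandardScaler().fit_transform(X)
# Z-score based outlier removal: drop samples with many extreme values (assumed rule).
keep = (np.abs(Z) > 3).mean(axis=1) < 0.05
Z = Z[keep]

# Agglomerative (hierarchical) clustering of samples -> candidate molecular subtypes.
subtypes = AgglomerativeClustering(n_clusters=4).fit_predict(Z)

# P-value based feature selection: genes that differ most across subtypes (ANOVA).
_, pvals = f_classif(Z, subtypes)
selected = np.argsort(pvals)[:50]         # 50 most significant genes (assumed cut-off)

# Evaluate how well an MLP separates the subtypes using only the selected genes.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
acc = cross_val_score(mlp, Z[:, selected], subtypes, cv=3).mean()
print(len(selected), "genes selected; CV accuracy:", round(acc, 3))
```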
Abstract: Prediction plays a vital role in decision making. Correct predictions lead to the right decisions, saving lives, energy, effort, money, and time. Right decisions prevent physical and material losses, and prediction is practiced in all fields, including medicine, finance, environmental studies, engineering, and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of the classifier depends strongly on the training dataset used to train it. Irrelevant and redundant features in the training dataset reduce the accuracy of the classifier; hence, they must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm, namely unsupervised learning with ranking-based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures, thereby selecting the most significant features from the training dataset. The performance of the proposed algorithm is compared with that of seven other feature selection algorithms using well-known classifiers, namely naive Bayes (NB), instance-based IB1, and the tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for these classifiers.
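The clustering-plus-ranking idea behind this kind of filter can be sketched generically. The snippet below is a hedged illustration, not the paper's FSULR algorithm: it groups correlated features to remove redundancy and then keeps the most relevant feature per group using mutual information as an assumed statistical measure.

```python
# Hedged sketch of clustering-plus-ranking feature selection (NOT FSULR itself):
# 1) cluster features by correlation distance to expose redundancy,
# 2) keep the single most class-relevant feature from each cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=6,
                           n_redundant=10, random_state=0)

# 1) Redundancy removal: cluster features on a correlation-based distance.
corr = np.corrcoef(X, rowvar=False)
dist = np.clip(1.0 - np.abs(corr), 0.0, None)
labels = AgglomerativeClustering(n_clusters=15, metric="precomputed",
                                 linkage="average").fit_predict(dist)

# 2) Relevance ranking: score features and keep the best one per cluster.
mi = mutual_info_classif(X, y, random_state=0)
selected = [int(np.where(labels == c)[0][np.argmax(mi[labels == c])])
            for c in np.unique(labels)]
print("selected features:", sorted(selected))
```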