The aim of this paper is to broaden the application of the Stochastic Configuration Network (SCN) to the semi-supervised domain by exploiting the unlabeled data that are common in daily life. Doing so can enhance the classification accuracy of decentralized SCN algorithms while effectively protecting user privacy. To this end, we propose a decentralized semi-supervised learning algorithm for SCN, called DMT-SCN, which introduces teacher and student models and combines them with the idea of consistency regularization to improve the response speed of model iterations. To reduce the possible negative impact of unlabeled data on the model, we deliberately change the way noise is added to the unlabeled data. Simulation results show that the algorithm can effectively utilize unlabeled data to improve the classification accuracy of SCN training and is robust under different ground simulation environments.
The coronavirus disease 2019 (COVID-19) has severely disrupted both human life and the health care system. Timely diagnosis and treatment have become increasingly important; however, the distribution and size of lesions vary widely among individuals, making it challenging to diagnose the disease accurately. This study proposes a deep-learning disease diagnosis model based on weakly supervised learning and clustering visualization (W_CVNet) that fuses classification with segmentation. First, the data are preprocessed: an optimizable weakly supervised segmentation preprocessing method (O-WSSPM) removes redundant data and addresses the category imbalance problem. Second, a deep-learning fusion method performs feature extraction and classification: a dual asymmetric complementary bilinear feature extraction method (D-CBM) extracts complementary features, which addresses the insufficient feature extraction of a single deep learning network. Third, an unsupervised learning method based on Fuzzy C-Means (FCM) clustering segments and visualizes COVID-19 lesions, enabling physicians to assess lesion distribution and disease severity accurately. Under 5-fold cross-validation, the network achieved an average classification accuracy of 85.8%, outperforming six recent advanced classification models. W_CVNet can assist physicians with automated diagnosis by determining whether the disease is present and, for COVID-19 patients, by further predicting the lesion area.
Contrastive self-supervised representation learning on attributed graph networks with Graph Neural Networks has recently attracted considerable research interest. However, two challenges remain. First, most real-world systems are multi-relational: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals, but it is not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM²S). It contains two main components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, a contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS) is introduced first; it uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information is the output of the different graph convolutional layers. Second, following the consensus assumption among inter-relations, the CoLM²S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and obtain a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of the method, which outperforms existing competitive baselines.
Scarce labeled data make radar emitter recognition difficult for conventional methods. To solve this problem, an optimized cooperative semi-supervised learning method for radar emitter recognition based on a small amount of labeled data is developed. First, a small amount of labeled data is randomly sampled using the bootstrap method, and the loss functions of three common deep learning networks are improved by combining the uniform distribution with the cross-entropy function to reduce the overconfidence of softmax classification. Subsequently, the dataset obtained after sampling is used to train the three improved networks and build the initial model. In addition, the unlabeled data are preliminarily screened through dynamic time warping (DTW) and then fed into the previously trained initial model for judgment. If the judgments of two or more networks are consistent, the unlabeled data are labeled and added to the labeled dataset. Lastly, the three network models are trained on the expanded labeled dataset, and the final model is built. Simulation results show that the semi-supervised learning method adopted in this paper can exploit a small amount of labeled data and essentially reach the recognition accuracy obtained with labeled data.
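The agreement rule in the radar emitter abstract (an unlabeled sample is labeled when two or more of the three networks give the same judgment) can be illustrated with a minimal sketch. The function names `pseudo_label` and `expand_labeled_set` are hypothetical, and the "models" are assumed to be plain callables that return a class label; the paper's actual networks, DTW screening, and loss functions are not reproduced here.

```python
from collections import Counter


def pseudo_label(predictions, min_agree=2):
    """Return the label that at least `min_agree` models agree on, else None."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_agree else None


def expand_labeled_set(labeled, unlabeled, models, min_agree=2):
    """Move unlabeled samples with a consistent prediction into the labeled set.

    `labeled` is a list of (sample, label) pairs, `unlabeled` a list of samples,
    and `models` a list of callables mapping a sample to a label (an assumption
    made for this sketch).
    """
    grown = list(labeled)
    remaining = []
    for x in unlabeled:
        label = pseudo_label([m(x) for m in models], min_agree)
        if label is None:
            remaining.append(x)       # no agreement: stays unlabeled
        else:
            grown.append((x, label))  # agreement: becomes a pseudo-labeled sample
    return grown, remaining
```

With three models, `min_agree=2` corresponds to the "two or more networks" condition; samples on which all three disagree stay in the unlabeled pool.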
N-11-azaartemisinins potentially active against Plasmodium falciparum are designed by combining molecular electrostatic potential (MEP), ligand-receptor interaction, and models built with supervised machine learning methods (PCA, HCA, KNN, SIMCA, and SDA). The molecular structures were optimized using the B3LYP/6-31G* approach. MEP maps and ligand-receptor interactions were used to investigate, respectively, the key structural features required for biological activity and the likely interactions between N-11-azaartemisinins and heme. The supervised machine learning methods separated the investigated compounds into two classes, cha and cla, with the properties ε_LUMO+1 (energy of the orbital one level above the lowest unoccupied molecular orbital), d(C6-C5) (distance between the C6 and C5 atoms in the ligands), and TSA (total surface area) responsible for the classification. The insights from this investigation, together with chemical intuition, enabled the design of sixteen new N-11-azaartemisinins (the prediction set), to which the models built with supervised machine learning methods were then applied. This application identified twelve new promising N-11-azaartemisinins for synthesis and biological evaluation.
Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data have, in general, been studied very little using these methods. This article notes the particularity of such data and measures the precision of the predictions of naive Bayes, decision tree, and SVM (Support Vector Machine) algorithms on a corpus of computer job offers taken from the internet. This particularity stems from the data imbalance problem in machine learning. However, this problem is usually framed in terms of the distribution of the number of documents in each class or subclass. Here, we go further and examine the distribution of word counts within a set of documents. The results are compared with those obtained on a set of French IT job offers. The precision of the classification varies between 88% and 90% for French offers against 67%, at most, for Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers on the internet in Cameroon are less structured than those available in France, for example. Second, it supports the strong hypothesis that sets of texts with a symmetrical distribution of word counts obtain better results with supervised learning algorithms.
Stroke is a leading cause of disability and mortality worldwide, necessitating the development of advanced technologies to improve its diagnosis, treatment, and patient outcomes. In recent years, machine learning techniques have emerged as promising tools in stroke medicine, enabling efficient analysis of large-scale datasets and facilitating personalized and precision medicine approaches. This abstract provides a comprehensive overview of machine learning's applications, challenges, and future directions in stroke medicine. Recently introduced machine learning algorithms have been employed extensively across all fields of stroke medicine. Machine learning models have demonstrated remarkable accuracy in imaging analysis, diagnosis of stroke subtypes, risk stratification, guidance of medical treatment, and prediction of patient prognosis. Despite this potential, several challenges must be addressed. These include the need for standardized and interoperable data collection, robust model validation and generalization, and the ethical considerations surrounding privacy and bias. In addition, integrating machine learning models into clinical workflows and establishing regulatory frameworks are critical for ensuring their widespread adoption and impact in routine stroke care. Machine learning promises to revolutionize stroke medicine by enabling precise diagnosis, tailored treatment selection, and improved prognostication. Continued research and collaboration among clinicians, researchers, and technologists are essential for overcoming these challenges and realizing the full potential of machine learning in stroke care, ultimately leading to better patient outcomes and quality of life. This review aims to summarize the current implications of machine learning in stroke diagnosis, treatment, and prognostic evaluation. It also explores the future perspectives these techniques can offer in combating this disabling disease.
With the rapid growth of internet usage, a new setting has emerged that enables bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity people have on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which may be one reason for the increase in cyberbullying and is the main motivation for the current study. This study presents a thorough background on cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive literature review was conducted to identify research gaps and effective techniques and practices in cyberbullying detection in various languages, and it found significant room for improvement for the Arabic language. As a result, the current study investigates shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets collected from Twitter (also known as X). Support vector machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated because of their effectiveness on similar problems. Finally, the scheme was evaluated with well-known performance measures: accuracy, precision, recall, and F1-score. XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state of the art.
Text classification, which automatically categorizes texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved by integrating entity-relation information obtained from the Wikidata (Wikipedia database) database and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The methodology comprises text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that integrating entity-relation information can significantly enhance algorithm performance in text classification tasks and offers new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the use of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it contributes to the development of multilingual text classification systems by adding deeper meaning to text content, providing a valuable addition to current NLP studies and a reference point for future research.
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which requires enormous human effort to label the images. Other research in this field uses weakly supervised methods, which aim to reduce annotation costs by leveraging sparsely annotated data such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble supervision and edge masks (WSSE-net). The network has a three-branch architecture in which each branch is equipped with a distinct decoder module dedicated to road extraction. One branch generates edge masks using edge detection algorithms and optimizes road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address the flaw of earlier approaches, in which pseudo-labels, once created, are not updated as the network trains, we use mixup to blend prediction results dynamically and continually update the pseudo-labels that steer network training. Our solution operates efficiently by considering both edge-mask assistance and dynamic pseudo-label support. The experiments are conducted on three separate road datasets, which consist primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our method performs better than advanced scribble-supervised approaches and some traditional fully supervised methods.
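The idea of refreshing pseudo-labels by blending predictions with mixup can be sketched as a convex combination of two probability maps. This is only an illustration under stated assumptions: the abstract does not say which predictions are blended or how the mixing weight is chosen, so the two-branch inputs, the Beta-distributed weight, and the names `sample_lambda` and `blend_predictions` are all hypothetical.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha), as is common in mixup."""
    return rng.betavariate(alpha, alpha)


def blend_predictions(pred_a, pred_b, lam):
    """Blend two per-pixel road-probability maps into one soft pseudo-label map.

    `pred_a` and `pred_b` are equally sized 2D lists of probabilities
    (assumed here to come from two decoder branches).
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]
```

Calling `blend_predictions` with the current predictions at every training step yields pseudo-labels that follow the network as it improves, in contrast to pseudo-labels computed once and then frozen.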
In many fields, particularly health, diagnosing diseases is a very difficult task. Early detection of diseases using artificial intelligence tools can therefore be of paramount importance in the medical field. In this study, we propose an intelligent system capable of performing diagnoses for radiologists. The support system is designed to evaluate mammographic images and thereby classify patients as normal or abnormal. The proposed method (DiagBC, for Breast Cancer Diagnosis) combines two unsupervised learning algorithms (the C-Means clustering algorithm and the Gaussian Mixture Model) for the segmentation of medical images with a supervised learning algorithm (a modified DenseNet) for the diagnosis of breast images. Finally, a prototype of the proposed system was implemented for the Magori Polyclinic in Niamey (Niger), making it possible to diagnose (or classify) breast cancer into two classes: normal and abnormal.
The COVID-19 pandemic has had a widespread negative impact globally. COVID-19 shares symptoms with other respiratory illnesses such as pneumonia and influenza, making rapid and accurate diagnosis essential to treat individuals and halt further transmission. X-ray imaging of the lungs is one of the most reliable diagnostic tools. Using deep learning, we can train models to recognize the signs of infection and thus aid in the identification of COVID-19 cases. For our project, we developed a deep learning model based on the ResNet50 architecture, pre-trained with the ImageNet and CheXNet datasets. We tackled the challenge of an imbalanced dataset, the CoronaHack Chest X-Ray dataset provided by Kaggle, through both binary and multi-class classification approaches. Additionally, we evaluated the performance impact of using focal loss versus cross-entropy loss in our model.
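To make the focal loss versus cross-entropy comparison concrete, the following minimal sketch computes both losses for a single sample from the probability the model assigns to the true class. It uses the standard focal loss form, FL = -alpha * (1 - p)^gamma * log(p); the default values `gamma=2.0` and `alpha=1.0` are assumptions for illustration, not settings reported in the abstract.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy loss for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled by (1 - p_true) ** gamma.

    Well-classified samples (p_true near 1) are down-weighted, so training
    concentrates on hard samples, which often belong to minority classes.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

With `gamma=0` the focal loss reduces to cross-entropy. With `gamma=2`, a sample predicted at `p_true = 0.9` keeps only about 1% of its cross-entropy loss, while a sample at `p_true = 0.1` keeps about 81%.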
Human action recognition in complex environments is challenging. Recently, sparse representation has achieved excellent results on human action recognition under different conditions. The main idea of sparse representation classification is to construct a general classification scheme in which the training samples of each class serve as a dictionary to express the query sample, and the minimal reconstruction error indicates the corresponding class. However, learning a discriminative dictionary remains difficult. In this work, we make two contributions. First, we build a new and robust human action recognition framework by combining a modified sparse classification model with deep convolutional neural network (CNN) features. Second, we construct a novel classification model that consists of a representation-constrained term and a coefficient incoherence term. Experimental results on benchmark datasets show that our modified model obtains competitive results compared with other state-of-the-art models.
Log-linear models, and more recently neural network models, used for supervised relation extraction require substantial amounts of training data and time, limiting their portability to new relations and domains. To address this, we propose a training representation based on the dependency paths between entities in a dependency tree, which we call lexicalized dependency paths (LDPs). We show that this representation is fast, efficient, and transparent. We further propose representations that use entity types and their subtypes to refine our model and alleviate the data sparsity problem. We apply lexicalized dependency paths to supervised learning on the ACE corpus and show that they achieve performance similar to other state-of-the-art methods and even surpass them on several categories.
This study proposes a supervised learning method that does not rely on labels. We use variables associated with the label as indirect labels and construct an indirect physics-constrained loss based on the physical mechanism to train the model. During training, the model prediction is mapped by a projection matrix to the space of values that conform to the physical mechanism, and the model is then trained on the indirect labels. The final prediction of the model conforms to the physical mechanism between the indirect label and the label and also meets the constraints of the indirect label. The study also develops projection matrix normalization and prediction covariance analysis to ensure that the model can be fully trained. Finally, the effect of physics-constrained indirect supervised learning is verified on a well log generation problem.
In the soft sensor field, just-in-time learning (JITL) is an effective approach for modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed in the input space only and ignore important output information, which may lead to inaccurate construction of the relevant sample set. To solve this problem, we propose a novel supervised feature extraction method suitable for regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information can be easily and effectively incorporated through a newly defined similarity index. SLNSPP not only retains the virtue of locality preserving projections but also prevents faraway points from moving close together after projection, which endows SLNSPP with powerful discriminating ability. These two properties are desirable for JITL, as they are expected to enhance the accuracy of similar sample selection. Consequently, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the scale and update frequency of the database. Finally, two case studies are conducted on benchmark datasets to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
Internet traffic classification is vital to network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult to apply as newly emerged applications (e.g., Peer-to-Peer) use dynamic port numbers, masquerading techniques, and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions for a variety of network activities and provides a platform for evaluating the performance of classifiers. The impact of dataset size, feature selection, number of application types, and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) Genetic-algorithm-based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy; in particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) Larger datasets and fewer application types result in better classification accuracy. Finally, early detection using only the first few packets is proposed for real-time network activity and is shown to be feasible by the preliminary results.
For electroencephalogram (EEG) pattern recognition in brain-computer interfaces (BCI), this paper presents a classification method based on a probabilistic neural network (PNN) with supervised learning. It uses the recognition rate on the training samples to guide the learning of the network parameters. Learning vector quantization is employed to group the training samples, and a genetic algorithm (GA) is used to train the network's smoothing parameters and the hidden central vectors that determine the hidden neurons. On the standard dataset I (a) of BCI Competition 2003, comparison with other classification methods shows that this approach gives the best pattern recognition performance: the classification accuracy reaches 93.8%, an improvement of more than 5 percentage points over the best result of the competition (88.7%). This technique provides an effective approach to EEG classification in practical BCI systems.
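The role of the smoothing parameter and the hidden central vectors in a probabilistic neural network can be seen in a minimal sketch of the basic PNN decision rule: each class score is the average of Gaussian kernels centered on that class's pattern vectors, and the class with the highest score wins. The function name `pnn_classify`, the single shared `sigma`, and the toy data are assumptions for illustration; the LVQ grouping and GA-based parameter training described in the abstract are not implemented here.

```python
import math


def pnn_classify(x, centers, sigma=0.5):
    """Classify `x` with a basic probabilistic neural network.

    `centers` maps each class label to a list of pattern vectors (the hidden
    central vectors); `sigma` is the smoothing parameter of the Gaussian kernel.
    Returns the winning label and the per-class scores.
    """
    scores = {}
    for label, patterns in centers.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[label] = total / len(patterns)  # average kernel response per class
    return max(scores, key=scores.get), scores
```

A small `sigma` makes the classifier behave like nearest-neighbor matching, while a large `sigma` smooths the class densities, which is why tuning it (by a GA in the abstract) affects the recognition rate.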
As the fundamental infrastructure of the Internet, the optical network carries a great amount of Internet traffic, and faults can cause great financial losses. Fault location is therefore very important for the operation and maintenance of optical networks. Because of the complex relationships among network elements at the topology level, boards at the network element level, and components at the board level, locating the concrete fault is hard for traditional methods. In recent years, machine learning, especially deep learning, has been applied to many complex problems because it can find a latent non-linear mapping from inputs to outputs. In this paper, we use supervised machine learning to propose a complete process for fault location. First, we apply data preprocessing, data annotation, and data augmentation to the originally collected data to build a high-quality dataset. Then, two machine learning algorithms (convolutional neural networks and deep neural networks) are applied to the dataset. The evaluation on commercial optical networks shows that this process helps improve the quality of the dataset and that the two algorithms perform well on fault location.
A novel algorithm is presented for supervised inductive learning that integrates a genetic algorithm with a bottom-up induction process. The hybrid learning algorithm has been implemented in C on a personal computer (386DX/40). The performance of the algorithm has been evaluated by applying it to the 11-multiplexer problem, and the results show that the algorithm's accuracy is higher than that of the others [5, 12, 13].
Funding: This work was funded by the Open Foundation of Anhui Engineering Research Center of Intelligent Perception and Elderly Care, Chuzhou University (No. 2022OPA03), the Higher Education Natural Science Foundation of Anhui Province (No. KJ2021B01), and the Innovation Team Projects of Universities in Guangdong (No. 2022KCXTD057).
Funding: Supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274.
Abstract: Contrastive self-supervised representation learning on attributed graph networks with Graph Neural Networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems are multi-relational: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can be seen as self-supervised signals, which are not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM^(2)S). It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, the contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), which uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals, is introduced first. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information represents the output of different graph convolutional layers. Second, according to the consensus assumption among inter-relations, the CoLM^(2)S framework is proposed to jointly learn various graph relations in an attributed multiplex graph network to achieve a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of the method, which outperforms existing competitive baselines.
Abstract: When labeled data are scarce, radar emitters are difficult to recognize with conventional methods. To solve this problem, an optimized cooperative semi-supervised learning method for radar emitter recognition based on a small amount of labeled data is developed. First, a small amount of labeled data is randomly sampled using the bootstrap method, the loss functions of three common deep learning networks are improved, and the uniform distribution and the cross-entropy function are combined to reduce the overconfidence of softmax classification. Subsequently, the dataset obtained after sampling is used to train the three improved networks to build the initial model. In addition, the unlabeled data are preliminarily screened through dynamic time warping (DTW) and then input into the previously trained initial model for judgment. If the judgment results of two or more networks are consistent, the unlabeled data are labeled and added to the labeled dataset. Lastly, the labeled dataset is input into the three network models for training, and the final model is built. Simulation results show that the semi-supervised learning method adopted in this paper is capable of exploiting a small amount of labeled data and basically achieving the accuracy of labeled data recognition.
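The agreement rule described above (an unlabeled sample is labeled when two or more of the three networks give the same result) can be sketched as a simple vote. The function name `pseudo_label` and the use of plain label strings are illustrative assumptions; the DTW screening and the networks themselves are outside this sketch.

```python
from collections import Counter


def pseudo_label(predictions, min_votes=2):
    """Return the label chosen by at least `min_votes` models, else None.

    `predictions` holds one predicted label per model for a single sample.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_votes else None


def grow_labeled_set(labeled, unlabeled_predictions):
    """Append (sample, label) pairs for every sample the models agree on.

    `unlabeled_predictions` maps a sample identifier to its list of
    per-model predictions; samples without agreement are left out.
    """
    grown = list(labeled)
    for sample, predictions in unlabeled_predictions.items():
        label = pseudo_label(predictions)
        if label is not None:
            grown.append((sample, label))
    return grown
```

Samples on which all three models disagree stay unlabeled, which limits how much label noise enters the next training round.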
Abstract: N-11-azaartemisinins potentially active against Plasmodium falciparum are designed by combining molecular electrostatic potential (MEP), ligand-receptor interaction, and models built with supervised machine learning methods (PCA, HCA, KNN, SIMCA, and SDA). The molecular structures were optimized using the B3LYP/6-31G* approach. MEP maps and ligand-receptor interactions were used to investigate the key structural features required for biological activity and the likely interactions between N-11-azaartemisinins and heme, respectively. The supervised machine learning methods separated the investigated compounds into two classes, cha and cla, with the properties ε_(LUMO+1) (energy one level above the lowest unoccupied molecular orbital), d(C6-C5) (distance between the C6 and C5 atoms in the ligands), and TSA (total surface area) responsible for the classification. The insights gained from this investigation, together with chemical intuition, enabled the design of sixteen new N-11-azaartemisinins (the prediction set). The models built with supervised machine learning methods were then applied to this prediction set, which indicated twelve promising new N-11-azaartemisinins for synthesis and biological evaluation.
Abstract: Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data, in general, have been studied very little using these methods. This article notes the particularity of such data and measures the precision of the predictions of naive Bayes, decision tree, and SVM (Support Vector Machine) algorithms on a corpus of computing job offers collected from the internet. This particularity is related to the data imbalance problem in machine learning, which usually concerns the distribution of the number of documents in each class or subclass. Here, we go further and examine the distribution of word counts in a set of documents. The results are compared with those obtained on a set of French IT job offers. The precision of the classification varies between 88% and 90% for French offers against 67%, at most, for Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers on the internet in Cameroon are less structured than those available in France, for example. Second, it supports a strong hypothesis that sets of texts with a symmetrical distribution of word counts obtain better results with supervised learning algorithms.
Abstract: Stroke is a leading cause of disability and mortality worldwide, necessitating the development of advanced technologies to improve its diagnosis, treatment, and patient outcomes. In recent years, machine learning techniques have emerged as promising tools in stroke medicine, enabling efficient analysis of large-scale datasets and facilitating personalized and precision medicine approaches. This abstract provides a comprehensive overview of machine learning's applications, challenges, and future directions in stroke medicine. Recently introduced machine learning algorithms have been extensively employed in all fields of stroke medicine. Machine learning models have demonstrated remarkable accuracy in imaging analysis, diagnosing stroke subtypes, risk stratification, guiding medical treatment, and predicting patient prognosis. Despite the tremendous potential of machine learning in stroke medicine, several challenges must be addressed. These include the need for standardized and interoperable data collection, robust model validation and generalization, and the ethical considerations surrounding privacy and bias. In addition, integrating machine learning models into clinical workflows and establishing regulatory frameworks are critical for ensuring their widespread adoption and impact in routine stroke care. Machine learning promises to revolutionize stroke medicine by enabling precise diagnosis, tailored treatment selection, and improved prognostication. Continued research and collaboration among clinicians, researchers, and technologists are essential for overcoming challenges and realizing the full potential of machine learning in stroke care, ultimately leading to enhanced patient outcomes and quality of life. This review aims to summarize the current implications of machine learning in stroke diagnosis, treatment, and prognostic evaluation. It also explores the future perspectives these techniques can provide in combating this disabling disease.
Abstract: The rapid growth of internet usage has created new opportunities for bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity the internet offers, people tend to be more aggressive and to express their emotions freely without considering the effects, which may explain the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive literature review was conducted to identify research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was concluded that there is significant room for improvement for the Arabic language. As a result, the current study investigates shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated because of their effectiveness on similar problems. Finally, the scheme was evaluated with well-known performance measures: accuracy, precision, recall, and F1-score. XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state of the art.
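For reference, the four performance measures named above can be computed from the confusion counts of a binary classifier as in the sketch below. The function name `binary_metrics` and the 0/1 label encoding are assumptions for illustration; the study's own evaluation code is not given in the abstract.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

On imbalanced data such as bullying versus non-bullying posts, precision, recall, and F1 are usually more informative than accuracy alone.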
Abstract: Text classification, the automatic categorization of texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved by integrating entity-relation information obtained from Wikidata (Wikipedia database) with BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in the field of natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology is a comprehensive approach that includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the use of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. It also contributes to the development of multilingual text classification systems by adding deeper meaning to text content, providing a valuable addition to current NLP studies and a reference point for future research.
Funding: The National Natural Science Foundation of China (42001408, 61806097).
Abstract: Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which requires enormous human effort to label the images. Other research in this field uses weakly supervised methods, which aim to reduce annotation costs by leveraging sparsely annotated data such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble-supervised and edge-mask (WSSE-net). The network has a three-branch architecture in which each branch is equipped with a distinct decoder module dedicated to road extraction. One branch generates edge masks using edge detection algorithms and optimizes road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address the long-standing flaw that generated pseudo-labels are not updated during network training, we use mixup to blend prediction results dynamically and continually update the pseudo-labels that steer network training. Our solution operates efficiently by combining edge-mask assistance with dynamic pseudo-label support. The experiments are conducted on three separate road datasets, which consist primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our method performs better than advanced scribble-supervised approaches and some traditional fully supervised methods.
Abstract: In many fields, particularly health, diagnosing diseases is a very difficult task. Early detection of diseases using artificial intelligence tools can therefore be of paramount importance in the medical field. In this study, we propose an intelligent system capable of performing diagnoses for radiologists. The support system is designed to evaluate mammographic images and thereby classify patients as normal or abnormal. The proposed method (DiagBC, for Breast Cancer Diagnosis) combines two (2) intelligent unsupervised learning algorithms (the C-Means clustering algorithm and the Gaussian Mixture Model) for the segmentation of medical images with a supervised learning algorithm (a modified DenseNet) for the diagnosis of breast images. Ultimately, a prototype of the proposed system was implemented for the Magori Polyclinic in Niamey (Niger), making it possible to diagnose (or classify) breast cancer into two (2) classes: the normal class and the abnormal class.
Abstract: The COVID-19 pandemic has had a widespread negative impact globally. It shares symptoms with other respiratory illnesses such as pneumonia and influenza, making rapid and accurate diagnosis essential to treat individuals and halt further transmission. X-ray imaging of the lungs is one of the most reliable diagnostic tools. Using deep learning, we can train models to recognize the signs of infection and thus aid the identification of COVID-19 cases. For our project, we developed a deep learning model based on the ResNet50 architecture, pre-trained with ImageNet and CheXNet datasets. We tackled the challenge of an imbalanced dataset, the CoronaHack Chest X-Ray dataset provided by Kaggle, through both binary and multi-class classification approaches. Additionally, we evaluated the performance impact of using Focal loss versus Cross-entropy loss in our model.
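To make the comparison between the two losses concrete, the sketch below evaluates both on the predicted probability of the true class, using the usual focal loss form FL(p) = -alpha * (1 - p)^gamma * log(p). The function names and the defaults `gamma = 2.0` and `alpha = 1.0` are assumptions for illustration; the abstract does not state which settings the authors used.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: cross-entropy scaled by (1 - p_true)^gamma.

    Confident, correct predictions (p_true near 1) are down-weighted, so
    hard or minority-class samples contribute more to the total loss.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

With `gamma = 0` the focal loss reduces to cross-entropy; larger values shift the training signal toward poorly classified samples, which is why it is often tried on imbalanced datasets.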
Funding: This research was funded by the National Natural Science Foundation of China (21878124, 31771680 and 61773182).
Abstract: Human action recognition in complex environments is a challenging task. Recently, sparse representation has achieved excellent results on human action recognition under different conditions. The main idea of sparse representation classification is to construct a general classification scheme in which the training samples of each class serve as the dictionary used to express the query sample, and the minimal reconstruction error indicates its class. However, learning a discriminative dictionary is still a difficult problem. In this work, we make two contributions. First, we build a new and robust human action recognition framework by combining a modified sparse classification model with deep convolutional neural network (CNN) features. Second, we construct a novel classification model that consists of a representation-constrained term and a coefficient incoherence term. Experimental results on benchmark datasets show that our modified model obtains competitive results in comparison to other state-of-the-art models.
Abstract: Log-linear models and, more recently, neural network models used for supervised relation extraction require substantial amounts of training data and time, limiting their portability to new relations and domains. To this end, we propose a training representation based on the dependency paths between entities in a dependency tree, which we call lexicalized dependency paths (LDPs). We show that this representation is fast, efficient, and transparent. We further propose representations that use entity types and their subtypes to refine our model and alleviate the data sparsity problem. We apply lexicalized dependency paths to supervised learning using the ACE corpus and show that they can achieve a performance level similar to other state-of-the-art methods and even surpass them on several categories.
Funding: Partially funded by the National Natural Science Foundation of China (Grants 51520105005 and U1663208).
Abstract: This study proposes a supervised learning method that does not rely on labels. We use variables associated with the label as indirect labels and construct an indirect physics-constrained loss based on the physical mechanism to train the model. During training, the model prediction is mapped through a projection matrix to the space of values that conform to the physical mechanism, and the model is then trained on the indirect labels. The final prediction of the model conforms to the physical mechanism linking the indirect label and the label, and also meets the constraints of the indirect label. The study also develops projection matrix normalization and prediction covariance analysis to ensure that the model can be fully trained. Finally, the effect of physics-constrained indirect supervised learning is verified on a well log generation problem.
Funding: Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).
Abstract: In the soft sensor field, just-in-time learning (JITL) is an effective approach to modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed in the input space only and ignore important output information, which may lead to an inaccurate relevant sample set. To solve this problem, we propose a novel supervised feature extraction method suitable for regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information can be easily and effectively incorporated through a newly defined similarity index. SLNSPP not only retains the virtue of locality preserving projections but also prevents faraway points from becoming close after projection, which endows it with powerful discriminating ability. These two properties are desirable for JITL because they are expected to enhance the accuracy of similar sample selection. Consequently, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the scale and update frequency of the database. Finally, two case studies are conducted with benchmark datasets to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2005AA121620, 2006AA01Z232), the Zhejiang Provincial Natural Science Foundation of China (No. Y1080935), and the Research Innovation Program for Graduate Students in Jiangsu Province (No. CX07B_ 110zF).
Abstract: Internet traffic classification is vital to network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult to apply as newly emerged applications (e.g. Peer-to-Peer) use dynamic port numbers, masquerading techniques, and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions to a variety of network activities and provides a platform for evaluating the performance of the classifiers. The impact of dataset size, feature selection, number of application types, and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) Genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy. In particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) A larger dataset and fewer application types result in better classification accuracy. Finally, early detection using only the first several packets is proposed for real-time network activity, and preliminary results indicate that it is feasible.
Funding: Supported by the National Natural Science Foundation of China (No. 30570485) and the Shanghai "Chen Guang" Project (No. 09CG69).
Abstract: Addressing electroencephalogram (EEG) pattern recognition in brain computer interfaces (BCI), this paper presents a classification method based on a probabilistic neural network (PNN) with supervised learning. It applies the recognition rate of the training samples to the learning process of the network parameters. Learning vector quantization is employed to group the training samples, and a genetic algorithm (GA) is used to train the network's smoothing parameters and the hidden central vectors that determine the hidden neurons. On the standard dataset I (a) of BCI Competition 2003, the experimental results show that this method gives the best pattern recognition performance among the compared classification methods, with a classification accuracy of 93.8%, an improvement of over 5% on the best result of the competition (88.7%). This technology provides an effective approach to EEG classification in practical BCI systems.
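The role of the smoothing parameter and the hidden central vectors in a PNN can be seen in a minimal sketch of its decision rule: each class is scored by the average Gaussian kernel response of its centers to the input, and the highest score wins. This is the generic PNN form with a single shared `sigma`; the function name `pnn_classify`, the data layout, and the fixed `sigma` are assumptions, and the LVQ grouping and GA training described in the abstract are not reproduced.

```python
import math


def pnn_classify(x, centers_by_class, sigma=0.5):
    """Classify feature vector x with a basic probabilistic neural network.

    `centers_by_class` maps each class label to a list of center vectors
    (the hidden neurons); `sigma` is the kernel smoothing parameter.
    """
    scores = {}
    for label, centers in centers_by_class.items():
        total = 0.0
        for c in centers:
            sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Averaging keeps classes with more centers from dominating.
        scores[label] = total / len(centers)
    return max(scores, key=scores.get)
```

A small `sigma` makes the classifier behave like nearest-center matching, while a large one smooths the class densities, which is why tuning it (by a GA in the paper) matters for accuracy.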
Abstract: As the fundamental infrastructure of the Internet, the optical network carries a great amount of Internet traffic, and faults can cause great financial losses. Fault location is therefore very important for the operation and maintenance of optical networks. Because of the complex relationships among network elements at the topology level, boards at the network element level, and components at the board level, locating a concrete fault is hard for traditional methods. In recent years, machine learning, especially deep learning, has been applied to many complex problems because it can find potential non-linear mappings from inputs to outputs. In this paper, we use supervised machine learning to propose a complete process for fault location. First, we apply data preprocessing, data annotation, and data augmentation to the originally collected data to build a high-quality dataset. Then, two machine learning algorithms (convolutional neural networks and deep neural networks) are applied to the dataset. The evaluation on commercial optical networks shows that this process helps improve the quality of the dataset and that the two algorithms perform well on fault location.
Abstract: A novel algorithm is presented for supervised inductive learning that integrates a genetic algorithm with a bottom-up induction process. The hybrid learning algorithm has been implemented in C on a personal computer (386DX/40). The performance of the algorithm has been evaluated by applying it to the 11-multiplexer problem, and the results show that its accuracy is higher than that of the others [5, 12, 13].