Rare labeled data are difficult to recognize by conventional methods in the process of radar emitter recognition. To solve this problem, an optimized cooperative semi-supervised learning radar emitter recognition method based on a small amount of labeled data is developed. First, a small amount of labeled data is randomly sampled with the bootstrap method, the loss functions of three common deep learning networks are improved, and the uniform distribution is combined with the cross-entropy function to reduce the overconfidence of softmax classification. Subsequently, the dataset obtained after sampling is used to train the three improved networks and build the initial model. Next, the unlabeled data are preliminarily screened by dynamic time warping (DTW) and then input into the previously trained initial model for judgment. If the judgments of two or more networks are consistent, the unlabeled data are labeled and added to the labeled dataset. Lastly, the three networks are trained on the enlarged labeled dataset to build the final model. As revealed by the simulation results, the semi-supervised learning method adopted in this paper is capable of exploiting a small amount of labeled data and essentially matching the recognition accuracy achieved with fully labeled data.
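The loss-function change described above, mixing a uniform distribution into the cross-entropy target, amounts to label smoothing. A minimal NumPy sketch, assuming a single example with integer class label; the mixing weight `alpha` is an assumed hyperparameter, and the paper's exact formulation may differ:

```python
import numpy as np

def smoothed_cross_entropy(logits, label, alpha=0.1):
    """Cross-entropy against a target that mixes the one-hot label with the
    uniform distribution, tempering the overconfidence of softmax outputs.
    alpha is the (assumed) mixing weight; alpha=0 gives plain cross-entropy."""
    z = logits - logits.max()                # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()          # softmax probabilities
    k = logits.shape[0]
    target = np.full(k, alpha / k)           # uniform component
    target[label] += 1.0 - alpha             # one-hot component
    return float(-(target * np.log(p)).sum())
```

With `alpha > 0`, a confident correct prediction is penalized slightly more than under plain cross-entropy, which discourages saturated softmax outputs.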
Contrastive self-supervised representation learning on attributed graph networks with graph neural networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems comprise multiple relations, where entities are linked by different types of relations and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals, which are not fully exploited. A novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM^(2)S) is presented in this study. It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, a contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), with a graph convolutional network as the encoder, is introduced first to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information comprises the outputs of the different graph convolutional layers. Second, following the consensus assumption among inter-relations, the CoLM^(2)S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and achieve globally consensual node embeddings. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
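The intra-relation contrastive component can be illustrated with a generic InfoNCE-style loss over one anchor embedding. This is a standard sketch, not the paper's exact multi-scale objective, and the temperature `tau` is an assumed hyperparameter:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style contrastive loss for one anchor: pull the positive view
    close in cosine similarity, push the negative views away. A generic
    sketch; the paper's structure- and feature-level signals are not
    reproduced here."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] +
                    [cos(anchor, n) for n in negatives]) / tau
    sims -= sims.max()                       # shift for numerical stability
    return float(-np.log(np.exp(sims[0]) / np.exp(sims).sum()))
```

The loss shrinks as the anchor-positive similarity grows relative to the anchor-negative similarities.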
Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data in general have been studied very little with these methods. This article notes the particularity of such data and measures the precision of the predictions of naive Bayes, decision tree, and SVM (support vector machine) algorithms on a corpus of IT job offers collected from the internet, in light of the data imbalance problem in machine learning. That problem usually concerns the distribution of the number of documents in each class or subclass; here, we delve deeper, to the distribution of word counts across a set of documents. The results are compared with those obtained on a set of French IT offers. It appears that the precision of the classification varies between 88% and 90% for the French offers, against at most 67% for the Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers posted on the internet in Cameroon are more unstructured than those available in France, for example. Second, it supports a strong hypothesis that sets of texts with a symmetrical word-count distribution obtain better results with supervised learning algorithms.
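The symmetry of a word-count distribution can be checked with sample skewness, where values near zero indicate symmetry. A small sketch; the skewness statistic is our illustrative choice, not a measure prescribed by the article:

```python
import numpy as np

def skewness(counts):
    """Sample skewness of a word-count distribution. Values near 0 suggest
    the roughly symmetrical distributions the authors associate with better
    classifier precision; large positive values indicate a long right tail."""
    x = np.asarray(counts, dtype=float)
    m = x.mean()
    s = x.std()
    return float(((x - m) ** 3).mean() / s ** 3)
```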
To solve the problem of automatic defect detection and process control in welding and arc additive processes, this paper monitors the current, voltage, audio, and other data during the welding process. It extracts the minimum value, standard deviation, and deviation from the voltage and current data; extracts spectral features such as the root mean square, spectral centroid, and zero-crossing rate from the audio data; fuses the features extracted from the multiple sensor signals; and establishes several supervised and unsupervised machine learning models to detect abnormalities in the welding process. The experimental results show that the established models achieve high accuracy: among the supervised learning models, AdaBoost reaches a balanced accuracy of 0.957, and the unsupervised Isolation Forest model reaches a balanced accuracy of 0.909.
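The audio features named above can be sketched as follows. Frame sizes and windowing are unspecified in the abstract, so this operates on a whole signal at once:

```python
import numpy as np

def audio_features(x, sr):
    """Root mean square, zero-crossing rate, and spectral centroid of a
    1-D audio signal sampled at sr Hz. A whole-signal sketch; the authors'
    framing parameters are not given in the abstract."""
    x = np.asarray(x, dtype=float)
    rms = float(np.sqrt(np.mean(x ** 2)))
    # fraction of adjacent sample pairs whose signs differ
    zcr = float(np.mean(np.abs(np.diff(np.sign(x))) > 0))
    mag = np.abs(np.fft.rfft(x))             # magnitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sr)
    centroid = float((freqs * mag).sum() / mag.sum())
    return rms, zcr, centroid
```

For a pure 100 Hz sine, the centroid sits at 100 Hz and the RMS at 1/sqrt(2) of the amplitude.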
Log-linear models and, more recently, neural network models used for supervised relation extraction require substantial amounts of training data and time, limiting their portability to new relations and domains. To this end, we propose a training representation based on the dependency paths between entities in a dependency tree, which we call lexicalized dependency paths (LDPs). We show that this representation is fast, efficient, and transparent. We further propose representations utilizing entity types and their subtypes to refine our model and alleviate the data sparsity problem. We apply lexicalized dependency paths to supervised learning using the ACE corpus and show that they can achieve a performance level similar to other state-of-the-art methods and even surpass them on several categories.
This study proposes a supervised learning method that does not rely on labels. We use variables associated with the label as indirect labels and construct an indirect physics-constrained loss based on the physical mechanism to train the model. During training, the model prediction is mapped through a projection matrix into the space of values that conform to the physical mechanism, and the model is then trained against the indirect labels. The final prediction of the model conforms to the physical mechanism linking the indirect labels and the label, and also meets the constraints of the indirect labels. The study also develops projection matrix normalization and prediction covariance analysis to ensure that the model can be fully trained. Finally, the effectiveness of physics-constrained indirect supervised learning is verified on a well-log generation problem.
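For a linear mechanism, mapping a prediction "into the space of values that conform to the physical mechanism" can be done with an orthogonal projector onto the column space of the mechanism matrix. A sketch under that linearity assumption; the matrix `A` below is purely illustrative, not the paper's mechanism:

```python
import numpy as np

def physics_projector(A):
    """Orthogonal projector onto the column space of A, i.e. onto the set of
    outputs reachable by a linear mechanism y = A x. Idempotent: P @ P == P."""
    return A @ np.linalg.pinv(A)

# Toy mechanism: two free parameters, three observed quantities (assumed).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
P = physics_projector(A)
pred = np.array([1.0, 2.0, 0.0])      # raw model prediction
projected = P @ pred                  # nearest physically consistent value
```

Projecting once and projecting twice give the same result, which is what makes the projected prediction a fixed point of the constraint.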
Addressing electroencephalogram (EEG) pattern recognition in brain-computer interfaces (BCI), a classification method based on a probabilistic neural network (PNN) with supervised learning is presented in this paper. It applies the recognition rate on training samples to the learning progress of the network parameters. Learning vector quantization is employed to group the training samples, and a genetic algorithm (GA) is used to train the network's smoothing parameters and the hidden central vectors that determine the hidden neurons. Using standard dataset Ia of the BCI Competition 2003 and comparing with other classification methods, the experimental results show that this approach yields the best pattern recognition performance: the classification accuracy reaches 93.8%, an improvement of over 5% compared with the best competition result (88.7%). This technique provides an effective approach to EEG classification in practical BCI systems.
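The PNN decision rule itself is simple: each class score is the average Gaussian kernel between the query and that class's training samples, and the smoothing parameter is exactly the quantity the paper tunes by GA. A minimal sketch with an assumed `sigma`:

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Probabilistic neural network decision: score each class by the mean
    Gaussian kernel between query x and that class's training samples, then
    pick the argmax. sigma (the smoothing parameter) is assumed here; the
    paper optimizes it with a genetic algorithm."""
    scores = {}
    for c in np.unique(train_y):
        d2 = np.sum((train_X[train_y == c] - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))
    return max(scores, key=scores.get)
```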
The motivation for this article is to propose new damage classifiers based on a supervised learning problem for locating and quantifying damage. A new feature extraction approach using time series analysis is introduced to extract damage-sensitive features from auto-regressive (AR) models. This approach sets out to improve current feature extraction techniques in the context of time series modeling. The coefficients and residuals of the AR model obtained with the proposed approach are selected as the main features and fed to the proposed supervised learning classifiers, which are categorized as coefficient-based and residual-based classifiers. These classifiers compute the relative errors in the extracted features between the undamaged and damaged states. Finally, the abilities of the proposed methods to localize and quantify single and multiple damage scenarios are verified with experimental data from a laboratory frame and a four-story steel structure. Comparative analyses validate the superiority of the proposed methods over some existing techniques. The results show that the proposed classifiers, aided by the features from the proposed extraction approach, are able to locate and quantify damage; the residual-based classifiers, however, yield better results than the coefficient-based ones. Moreover, these methods are superior to some classical techniques.
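Extracting AR coefficients and residuals as features can be sketched with a least-squares fit. Ordinary least squares is an assumption here; the abstract does not name the estimator:

```python
import numpy as np

def ar_features(x, p):
    """Fit an AR(p) model x[t] ~ sum_k a_k * x[t-k] by ordinary least squares
    and return the coefficients and residuals used as damage-sensitive
    features."""
    x = np.asarray(x, dtype=float)
    y = x[p:]
    # lag matrix: column k holds x[t-k-1] for t = p .. len(x)-1
    X = np.column_stack([x[p - 1 - k:len(x) - 1 - k] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid
```

On a signal generated by a true AR process, the fitted coefficients recover the generating ones and the residuals vanish; damage shifts both.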
A method that applies clustering to reduce the number of samples in large data sets using input-output clustering is proposed. The method clusters the output data into groups and clusters the input data in accordance with those output groups. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. Because only the prototypes are used, the method also reduces the effect of outliers. The method is applied to reduce data sets in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation, comparing the root-mean-square errors of support vector regression models trained on the original data sets and on the corresponding instance-reduced data sets. In the experiments, the proposed method provides good results on both the reduction and the reconstruction of the standard synthetic and real-world data sets. The numbers of instances of the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world automobile miles-per-gallon and 1990 CA census data sets are 46% and 57%, respectively. A reduction rate of 96% is achieved for the electrocardiogram (ECG) data set thanks to the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained from the corresponding original data sets. The regression performance of the proposed method is therefore good while only a fraction of the data is needed in the training process.
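The input-output clustering idea can be sketched with a toy k-means: cluster the outputs, then cluster the inputs within each output group and keep the resulting centers as prototypes. The cluster counts `k_out` and `k_in` are assumed knobs, and the paper's prototype-selection rule may differ:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny Lloyd's-algorithm k-means, for illustration only."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def reduce_by_io_clustering(X, y, k_out, k_in):
    """Cluster the outputs into k_out groups, cluster the inputs inside each
    group into k_in clusters, and keep the cluster centers as prototypes."""
    X = np.asarray(X, dtype=float)
    out_labels, _ = kmeans(np.reshape(y, (-1, 1)), k_out)
    prototypes = []
    for g in range(k_out):
        Xg = X[out_labels == g]
        if len(Xg) == 0:
            continue
        _, centers = kmeans(Xg, min(k_in, len(Xg)))
        prototypes.append(centers)
    return np.vstack(prototypes)
```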
This study proposes an architecture for predicting extremist human behaviour from projected suicide bombings. By linking 'dots' of police data comprising scattered information on people, groups, logistics, locations, communication, and spatiotemporal characteristics across different social media groups, the proposed architecture will spawn beneficial information. This information will, in turn, help the police both in predicting potential terrorist events and in investigating previous ones. Furthermore, the architecture will aid in identifying criminals and their associates and handlers. Terrorism is psychological warfare, which, in the broadest sense, can be defined as the use of deliberate violence for economic, political, or religious purposes. In this study, a supervised learning-based approach was adopted to develop the proposed architecture. The dataset was prepared from the suicide bomb blast data of Pakistan obtained from the South Asia Terrorism Portal (SATP). When the proposed architecture was simulated, the supervised learning-based naïve Bayes and Hoeffding Tree classifiers reached 72.17% accuracy. An additional benefit this study offers is the ability to predict the target audience of potential suicide bomb blasts, which may be used to eliminate future threats or at least minimise the number of casualties and other property losses.
Due to their dynamic nature and node mobility, assuring the security of mobile ad-hoc networks (MANET) is one of today's difficult and challenging tasks. In a MANET, the intrusion detection system (IDS) is crucial because it aids in identifying and detecting malicious attacks that impair the network's regular operation. Various machine learning and deep learning methodologies have been used for this purpose in conventional works to increase MANET security, but they still have significant flaws, including increased algorithmic complexity, lower system performance, and higher misclassification rates. Therefore, the goal of this paper is to create an intelligent IDS framework that significantly enhances MANET security through deep learning models. Here, a min-max normalization model is applied to preprocess the given cyber-attack datasets, normalizing the attributes or fields and thereby increasing the overall intrusion detection performance of the classifier. Then, a novel Adaptive Marine Predator Optimization Algorithm (AOMA) is implemented to choose the optimal features, improving the speed and detection performance of the classifier. Moreover, the Deep Supervise Learning Classification (DSLC) mechanism is utilized to predict and categorize the type of intrusion based on proper learning and training operations. During evaluation, the performance and results of the proposed AOMA-DSLC based IDS are validated and compared using various performance measures and benchmarking datasets.
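The min-max normalization step is standard column-wise scaling to [0, 1]. A small sketch, with a guard for constant columns (the guard convention is our assumption):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (attribute/field) of X to the [0, 1] range, the
    preprocessing applied to the attack datasets before feature selection.
    Constant columns are left at 0 rather than dividing by zero."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (X - lo) / span
```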
Stroke is a leading cause of disability and mortality worldwide, necessitating the development of advanced technologies to improve its diagnosis, treatment, and patient outcomes. In recent years, machine learning techniques have emerged as promising tools in stroke medicine, enabling efficient analysis of large-scale datasets and facilitating personalized and precision medicine approaches. This review provides a comprehensive overview of machine learning's applications, challenges, and future directions in stroke medicine. Recently introduced machine learning algorithms have been extensively employed in all fields of stroke medicine, and machine learning models have demonstrated remarkable accuracy in imaging analysis, diagnosing stroke subtypes, risk stratification, guiding medical treatment, and predicting patient prognosis. Despite this tremendous potential, several challenges must be addressed, including the need for standardized and interoperable data collection, robust model validation and generalization, and the ethical considerations surrounding privacy and bias. In addition, integrating machine learning models into clinical workflows and establishing regulatory frameworks are critical for ensuring their widespread adoption and impact in routine stroke care. Machine learning promises to revolutionize stroke medicine by enabling precise diagnosis, tailored treatment selection, and improved prognostication. Continued research and collaboration among clinicians, researchers, and technologists are essential for overcoming challenges and realizing the full potential of machine learning in stroke care, ultimately leading to enhanced patient outcomes and quality of life. This review aims to summarize the current implications of machine learning in stroke diagnosis, treatment, and prognostic evaluation, and to explore the future perspectives these techniques can offer in combating this disabling disease.
In many fields, particularly health, the diagnosis of diseases is a very difficult task. Early detection of diseases using artificial intelligence tools can therefore be of paramount importance in the medical field. In this study, we propose an intelligent system capable of assisting radiologists with diagnoses. The support system is designed to evaluate mammographic images and thereby classify normal and abnormal patients. The proposed method (DiagBC, for Breast Cancer Diagnosis) combines two unsupervised learning algorithms (C-means clustering and the Gaussian mixture model) for the segmentation of medical images with a supervised learning algorithm (a modified DenseNet) for the diagnosis of breast images. Ultimately, a prototype of the proposed system was implemented for the Magori Polyclinic in Niamey (Niger), making it possible to classify breast cancer into two classes: normal and abnormal.
Transition prediction has always been a frontier issue in aerodynamics. A supervised learning model with a probabilistic interpretation for transition judgment, based on experimental data, was developed in this paper. It addresses the shortcomings of the point detection method used in experiments, namely that often only one transition point can be obtained and comparison of multi-point data is necessary. First, the Variable-Interval Time Average (VITA) method was used to transform the fluctuating pressure signal measured on the airfoil surface into a sequence of states described by a Markov chain model. Second, a feature vector consisting of the one-step transition matrix and its stationary distribution was extracted. Then, a Hidden Markov Model (HMM) was used to pre-classify the feature vectors labeled with the traditional Root Mean Square (RMS) criterion. Finally, a classification model with a probabilistic interpretation was established and validated by cross-validation. The results show that the developed model is effective and reliable and generalizes well across Reynolds numbers. The model was analyzed theoretically in depth, and the effect of its parameters was studied in detail. Compared with the traditional RMS criterion, a reasonable transition zone can be obtained with the developed classification model; in addition, the model does not require comparison of multi-point data. The developed supervised learning model provides new ideas for transition detection in flight experiments and other experiments.
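The feature vector of a one-step transition matrix plus its stationary distribution can be estimated from a state sequence as follows. This is a sketch; it assumes every state is visited so each row of the matrix normalizes:

```python
import numpy as np

def markov_features(states, n_states):
    """Estimate the one-step transition matrix from a state sequence and
    compute its stationary distribution (the left eigenvector for eigenvalue
    1), the two parts of the feature vector described in the abstract."""
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1.0                        # count observed transitions
    T /= np.maximum(T.sum(axis=1, keepdims=True), 1.0)  # row-normalize
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = np.abs(pi) / np.abs(pi).sum()        # normalize to a distribution
    return T, pi
```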
Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the images. Within this field, other research efforts use weakly supervised methods, which aim to reduce annotation expenses by leveraging sparsely annotated data such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble supervision and edge masks (WSSE-net). The network has a three-branch architecture, in which each branch is equipped with a distinct decoder module dedicated to road extraction tasks. One branch generates edge masks with edge detection algorithms and optimizes road edge details; the other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address the historical flaw that pseudo-labels, once created, are not updated as the network trains, we use mixup to blend prediction results dynamically and continually update the pseudo-labels that steer network training. Our solution operates efficiently by considering both edge-mask aid and dynamic pseudo-label support simultaneously. Studies are conducted on three separate road datasets, consisting primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and certain traditional fully supervised methods.
The COVID-19 pandemic has had a widespread negative impact globally. COVID-19 shares symptoms with other respiratory illnesses such as pneumonia and influenza, making rapid and accurate diagnosis essential to treating individuals and halting further transmission. X-ray imaging of the lungs is one of the most reliable diagnostic tools. Using deep learning, we can train models to recognize signs of infection and thus aid in identifying COVID-19 cases. For our project, we developed a deep learning model built on the ResNet50 architecture and pre-trained with the ImageNet and CheXNet datasets. We tackled the challenge of an imbalanced dataset, the CoronaHack Chest X-Ray dataset provided on Kaggle, through both binary and multi-class classification approaches. Additionally, we evaluated the performance impact of using focal loss versus cross-entropy loss in our model.
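The focal-loss-versus-cross-entropy comparison rests on the fact that binary focal loss reduces to cross-entropy when its focusing parameter gamma is zero. A NumPy sketch of the standard formulation (Lin et al.), not the project's exact training code:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: cross-entropy down-weighted by (1 - p_t)^gamma so
    well-classified examples contribute less, which helps on imbalanced
    datasets. With gamma = 0 it is exactly binary cross-entropy.
    p: predicted probability of class 1; y: true labels in {0, 1}."""
    p = np.asarray(p, dtype=float)
    pt = np.where(np.asarray(y) == 1, p, 1.0 - p)  # prob. of the true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt)))
```

As gamma grows, the loss for confident correct predictions shrinks fastest, shifting training effort toward hard or minority-class examples.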
N-11-azaartemisinins potentially active against Plasmodium falciparum are designed by combining the molecular electrostatic potential (MEP), ligand-receptor interactions, and models built with supervised machine learning methods (PCA, HCA, KNN, SIMCA, and SDA). The molecular structures were optimized with the B3LYP/6-31G* approach. MEP maps and ligand-receptor interactions were used to investigate the key structural features required for biological activity and the likely interactions between N-11-azaartemisinins and heme, respectively. The supervised machine learning methods separated the investigated compounds into two classes, cha and cla, with the properties ε<sub>LUMO+1</sub> (the energy one level above the lowest unoccupied molecular orbital), d(C<sub>6</sub>-C<sub>5</sub>) (the distance between the C<sub>6</sub> and C<sub>5</sub> atoms in the ligands), and TSA (total surface area) responsible for the classification. The insights extracted from this investigation, together with chemical intuition, enabled the design of sixteen new N-11-azaartemisinins (the prediction set), to which the models built with the supervised machine learning methods were then applied. The result revealed twelve promising new N-11-azaartemisinins for synthesis and biological evaluation.
With the rapid growth of internet usage, a new situation has been created that enables bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background on cyberbullying and the techniques used to collect, preprocess, and analyze the relevant datasets. Moreover, a comprehensive literature review was conducted to identify research gaps and effective techniques and practices in cyberbullying detection across various languages, from which it was deduced that there is significant room for improvement for the Arabic language. As a result, the current study investigates shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), naive Bayes (NB), random forest (RF), logistic regression (LR), bootstrap aggregating (Bagging), gradient boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness on similar problems. Finally, the scheme was evaluated with well-known performance measures: accuracy, precision, recall, and F1-score. XGBoost exhibited the best performance, with 89.95% accuracy, which is promising compared to the state of the art.
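The evaluation measures named at the end all derive from the four confusion-matrix counts. A small reference implementation; the zero-division conventions are our assumption:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts:
    true positives, false positives, false negatives, true negatives.
    Undefined ratios (zero denominators) are reported as 0.0 by convention."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```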
Rule-based autonomous driving systems may suffer from increased complexity with large-scale intercoupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system: RL training requires extensive data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address this problem of poor initial performance before the RL-based model is trained in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration datasets. After pre-training, the model is deployed on a real vehicle to be further trained by RL, adapting to the real environment and continuously pushing the performance limit. The pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it is sufficiently reliable in improving the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using demonstrations during a supervised pre-training stage allows significant improvements in initial performance and convergence speed in the RL training stage.
Satellite image classification is crucial in applications such as urban planning, environmental monitoring, and land use analysis. In this study, the authors present a comparative analysis of different supervised and unsupervised learning methods for satellite image classification, focusing on a case study of Casablanca using Landsat 8 imagery. The research aims to identify the most effective machine-learning approach for accurately classifying land cover in an urban environment. The methodology consists of pre-processing the Landsat imagery of Casablanca, extracting relevant features and partitioning them into training and test sets, and then applying random forest (RF), support vector machine (SVM), classification and regression tree (CART), gradient tree boost (GTB), decision tree (DT), and minimum distance (MD) algorithms. Through a series of experiments, the authors evaluate the performance of each machine learning method in terms of accuracy and Kappa coefficient. The work shows that random forest is the best-performing algorithm, with an accuracy of 95.42% and a Kappa coefficient of 0.94. The authors discuss the factors behind the algorithms' performance, including data characteristics, feature selection, and model configuration.
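The Kappa coefficient reported alongside accuracy is Cohen's kappa: observed agreement corrected for the agreement expected by chance, computed from the confusion matrix. A minimal sketch:

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix: (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement (the diagonal) and p_e is the chance
    agreement implied by the row and column marginals."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                          # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return float((po - pe) / (1.0 - pe))
```

A kappa of 0.94 with 95.42% accuracy, as reported for random forest above, indicates agreement far beyond what the class distribution alone would produce.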
Funding: Supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274.
Abstract: Contrastive self-supervised representation learning on attributed graph networks with graph neural networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems comprise multiple relations, where entities are linked by different types of relations and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals, which are not fully exploited. A novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM^(2)S) is presented in this study. It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, a contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS) is introduced first; it uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information comprises the outputs of different graph convolutional layers. Second, according to the consensus assumption among inter-relations, the CoLM^(2)S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and achieve a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
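Contrastive objectives of this kind typically score an anchor embedding against a positive view and negative samples. A minimal InfoNCE-style sketch (not the paper's exact loss; the temperature `tau` is a hypothetical choice):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style contrastive loss for a single anchor embedding:
    low when the anchor is closer to its positive view than to the
    negatives. tau is a hypothetical temperature."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / tau) for s in sims]
    return -math.log(exps[0] / sum(exps))
```

In an intra-relation setting, the positive would be another view of the same node (e.g., a different augmentation or layer output) and the negatives other nodes in the same relation.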
Abstract: Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data in general have been studied very little with these methods. This article notes the particularity of the data and measures the precision of naive Bayes, decision tree, and SVM (support vector machine) algorithms on a corpus of computer job offers taken from the internet. This relates to the data imbalance problem in machine learning; however, that problem usually concerns the distribution of the number of documents in each class or subclass. Here, we delve deeper, down to the word count distribution within a set of documents. The results are compared with those obtained on a set of French IT offers. It appears that the precision of the classification varies between 88% and 90% for French offers against at most 67% for Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers on the internet in Cameroon are more unstructured than those available in France, for example. Second, it supports a strong hypothesis that sets of texts with a symmetrical distribution of word counts obtain better results with supervised learning algorithms.
Abstract: To solve the problem of automatic defect detection and process control in the welding and arc additive process, this paper monitors the current, voltage, audio, and other data during welding. It extracts the minimum value, standard deviation, and deviation from the voltage and current data, and extracts spectral features such as root mean square, spectral centroid, and zero-crossing rate from the audio data. The features extracted from the multiple sensor signals are fused, and several supervised and unsupervised machine learning models are established to detect abnormalities in the welding process. The experimental results show that the established models achieve high accuracy: among the supervised learning models, AdaBoost reaches a balanced accuracy of 0.957, and the unsupervised Isolation Forest model reaches a balanced accuracy of 0.909.
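The three audio features named above can be computed directly from a windowed signal. A self-contained sketch using a naive DFT (a real pipeline would use an FFT; the sample rate `sr` is a hypothetical parameter):

```python
import math

def audio_features(x, sr=1000):
    """Root mean square, spectral centroid (Hz), and zero-crossing rate
    of a signal window x sampled at sr Hz. Illustrative only: the DFT
    below is O(n^2); use an FFT for real data."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    zcr = sum(1 for a, b in zip(x, x[1:]) if a * b < 0) / (n - 1)
    # naive DFT magnitudes for the positive-frequency bins
    mags, freqs = [], []
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mags.append(math.hypot(re, im))
        freqs.append(k * sr / n)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / sum(mags)
    return rms, centroid, zcr
```

For a pure 5 Hz sine sampled at 100 Hz, the RMS is about 0.707 and the spectral centroid sits at 5 Hz, as expected.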
Abstract: Log-linear models and, more recently, neural network models used for supervised relation extraction require substantial amounts of training data and time, limiting their portability to new relations and domains. To this end, we propose a training representation based on the dependency paths between entities in a dependency tree, which we call lexicalized dependency paths (LDPs). We show that this representation is fast, efficient, and transparent. We further propose representations utilizing entity types and their subtypes to refine our model and alleviate the data sparsity problem. We apply lexicalized dependency paths to supervised learning using the ACE corpus and show that they can achieve a performance level similar to other state-of-the-art methods and even surpass them on several categories.
Funding: Partially funded by the National Natural Science Foundation of China (Grants 51520105005 and U1663208).
Abstract: This study proposes a supervised learning method that does not rely on labels. We use variables associated with the label as indirect labels and construct an indirect physics-constrained loss based on the physical mechanism to train the model. In the training process, the model prediction is mapped through a projection matrix to the space of values that conform to the physical mechanism, and the model is then trained against the indirect labels. The final prediction of the model conforms to the physical mechanism linking indirect label and label, and also meets the constraints of the indirect label. The study also develops projection matrix normalization and prediction covariance analysis to ensure that the model can be fully trained. Finally, the effectiveness of physics-constrained indirect supervised learning is verified on a well log generation problem.
Funding: Supported by the National Natural Science Foundation of China (No. 30570485) and the Shanghai "Chen Guang" Project (No. 09CG69).
Abstract: Aiming at electroencephalogram (EEG) pattern recognition in brain-computer interfaces (BCI), a classification method based on a probabilistic neural network (PNN) with supervised learning is presented in this paper. It applies the recognition rate of training samples to the learning of the network parameters. Learning vector quantization is employed to group the training samples, and a genetic algorithm (GA) is used to train the network's smoothing parameters and the hidden central vectors that determine the hidden neurons. Using standard dataset Ia of BCI Competition 2003 and comparing with other classification methods, the experimental results show that this approach achieves the best pattern recognition performance: the classification accuracy reaches 93.8%, an improvement of over 5% compared with the best competition result (88.7%). This technique provides an effective approach to EEG classification in practical BCI systems.
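A PNN places a Gaussian kernel on every training sample and sums the kernel responses per class, predicting the class with the largest sum. A minimal sketch (the smoothing parameter `sigma`, which the paper tunes with a GA, is fixed here for illustration):

```python
import math

def pnn_classify(x, train, sigma=0.3):
    """Probabilistic neural network classifier: one Gaussian kernel per
    training sample, summed per class; returns the class with the
    largest summed response. train is a list of (vector, label) pairs."""
    scores = {}
    for xt, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xt))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(scores, key=scores.get)
```

The summed kernel responses approximate class-conditional densities (Parzen windows), so the decision is a Bayes-style maximum over estimated densities.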
Abstract: The motivation for this article is to propose new damage classifiers based on a supervised learning problem for locating and quantifying damage. A new feature extraction approach using time series analysis is introduced to extract damage-sensitive features from auto-regressive (AR) models. This approach sets out to improve current feature extraction techniques in the context of time series modeling. The coefficients and residuals of the AR model obtained from the proposed approach are selected as the main features and are fed to the proposed supervised learning classifiers, which are categorized as coefficient-based and residual-based classifiers. These classifiers compute the relative errors in the extracted features between the undamaged and damaged states. Eventually, the abilities of the proposed methods to localize and quantify single and multiple damage scenarios are verified using experimental data from a laboratory frame and a four-story steel structure. Comparative analyses are performed to validate the superiority of the proposed methods over some existing techniques. Results show that the proposed classifiers, with the aid of the extracted features, are able to locate and quantify damage; the residual-based classifiers, however, yield better results than the coefficient-based ones. Moreover, these methods are superior to some classical techniques.
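As an illustration of the two feature types, an AR(1) model fitted by least squares yields both a coefficient and a residual sequence (the paper uses higher-order AR models; this first-order sketch is only illustrative):

```python
def ar1_features(x):
    """Fit x[t] = phi * x[t-1] + e[t] by least squares and return the
    AR(1) coefficient and the residual sequence, the two feature types
    used by the coefficient-based and residual-based classifiers."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(a * a for a in x[:-1])
    phi = num / den
    residuals = [a - phi * b for a, b in zip(x[1:], x[:-1])]
    return phi, residuals
```

Damage detection then compares the coefficient (or residual statistics) between signals from the undamaged and damaged states.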
Funding: Supported by the Chiang Mai University Research Fund under contract number T-M5744.
Abstract: A method that applies clustering to reduce the number of samples in large data sets using input-output clustering is proposed. The proposed method clusters the output data into groups and clusters the input data in accordance with the groups of output data. Then, a set of prototypes is selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. Because only the prototypes are used, the method also reduces the effect of outliers. It is applied to reduce the data set in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation, comparing the root-mean-square errors of support vector regression models trained on the original data sets and on the corresponding instance-reduced data sets. In the experiments, the proposed method reduces and reconstructs the standard synthetic and real-world data sets well. The numbers of instances of the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world automobile miles-per-gallon and 1990 CA census data sets are 46% and 57%, respectively. A reduction rate of 96% is achieved for the electrocardiogram (ECG) data set thanks to the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained on the corresponding original data sets. Therefore, the regression performance of the proposed method is good while only a fraction of the data is needed in the training process.
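A highly simplified sketch of the input-output clustering idea: group samples by output value (here by sorting rather than a full clustering algorithm) and keep one mean prototype per group. This is an assumption-laden toy, not the paper's algorithm:

```python
def reduce_by_output_clustering(X, y, n_groups=2):
    """Toy instance reduction: sort samples by output value, split them
    into n_groups output groups, and keep one prototype per group
    (the mean input vector paired with the mean output)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    size = len(order) // n_groups
    prototypes = []
    for g in range(n_groups):
        idx = order[g * size:(g + 1) * size] if g < n_groups - 1 else order[g * size:]
        mean_x = [sum(X[i][d] for i in idx) / len(idx) for d in range(len(X[0]))]
        mean_y = sum(y[i] for i in idx) / len(idx)
        prototypes.append((mean_x, mean_y))
    return prototypes
```

The prototypes then replace the original samples when training the regression model, which is what makes the method robust to outliers.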
Abstract: This study proposes an architecture for predicting extremist human behaviour from projected suicide bombings. By linking 'dots' of police data comprising scattered information on people, groups, logistics, locations, communication, and spatiotemporal characteristics across different social media groups, the proposed architecture will spawn beneficial information. This information will, in turn, help the police both in predicting potential terrorist events and in investigating previous ones. Furthermore, the architecture will aid in identifying criminals and their associates and handlers. Terrorism is psychological warfare, which, in the broadest sense, can be defined as the use of deliberate violence for economic, political, or religious purposes. In this study, a supervised learning-based approach was adopted to develop the proposed architecture. The dataset was prepared from the suicide bomb blast data of Pakistan obtained from the South Asia Terrorism Portal (SATP). When the proposed architecture was simulated, the supervised learning-based naïve Bayes and Hoeffding Tree classifiers reached 72.17% accuracy. An additional benefit this study offers is the ability to predict the target audience of potential suicide bomb blasts, which may be used to eliminate future threats or at least minimise the number of casualties and other property losses.
Abstract: Owing to its dynamic nature and node mobility, assuring the security of mobile ad-hoc networks (MANET) is one of today's difficult and challenging tasks. In MANET, the intrusion detection system (IDS) is crucial because it aids in identifying and detecting malicious attacks that impair the network's regular operation. Different machine learning and deep learning methodologies are used for this purpose in conventional works to ensure increased MANET security. However, they still have significant flaws, including increased algorithmic complexity, lower system performance, and higher misclassification rates. Therefore, the goal of this paper is to create an intelligent IDS framework for significantly enhancing MANET security through deep learning models. Here, min-max normalization is applied to preprocess the given cyber-attack datasets, normalizing the attributes or fields and thereby increasing the overall intrusion detection performance of the classifier. Then, a novel Adaptive Marine Predator Optimization Algorithm (AOMA) is implemented to choose the optimal features, improving the speed and detection performance of the classifier. Moreover, the Deep Supervised Learning Classification (DSLC) mechanism is utilized to predict and categorize the type of intrusion based on proper learning and training operations. During evaluation, the performance and results of the proposed AOMA-DSLC based IDS methodology are validated and compared using various performance measures and benchmarking datasets.
Abstract: Stroke is a leading cause of disability and mortality worldwide, necessitating the development of advanced technologies to improve its diagnosis, treatment, and patient outcomes. In recent years, machine learning techniques have emerged as promising tools in stroke medicine, enabling efficient analysis of large-scale datasets and facilitating personalized and precision medicine approaches. This review provides a comprehensive overview of machine learning's applications, challenges, and future directions in stroke medicine. Recently introduced machine learning algorithms have been employed extensively across stroke medicine, and machine learning models have demonstrated remarkable accuracy in imaging analysis, diagnosing stroke subtypes, risk stratification, guiding medical treatment, and predicting patient prognosis. Despite this tremendous potential, several challenges must be addressed, including the need for standardized and interoperable data collection, robust model validation and generalization, and the ethical considerations surrounding privacy and bias. In addition, integrating machine learning models into clinical workflows and establishing regulatory frameworks are critical for ensuring their widespread adoption and impact in routine stroke care. Machine learning promises to revolutionize stroke medicine by enabling precise diagnosis, tailored treatment selection, and improved prognostication. Continued research and collaboration among clinicians, researchers, and technologists are essential for overcoming challenges and realizing the full potential of machine learning in stroke care, ultimately enhancing patient outcomes and quality of life. This review aims to summarize the current implications of machine learning in stroke diagnosis, treatment, and prognostic evaluation, and to explore the future perspectives these techniques can provide in combating this disabling disease.
Abstract: In many fields, particularly health, diagnosing diseases is a very difficult task. Early detection of diseases using artificial intelligence tools can therefore be of paramount importance in the medical field. In this study, we propose an intelligent system capable of performing diagnoses for radiologists. The support system is designed to evaluate mammographic images, classifying patients as normal or abnormal. The proposed method (DiagBC, for Breast Cancer Diagnosis) combines two intelligent unsupervised learning algorithms (C-means clustering and the Gaussian mixture model) for the segmentation of medical images with a supervised learning algorithm (a modified DenseNet) for the diagnosis of breast images. Ultimately, a prototype of the proposed system was implemented for the Magori Polyclinic in Niamey (Niger), making it possible to diagnose (or classify) breast cancer into two classes: normal and abnormal.
Funding: Supported by the National Key Laboratory of Science and Technology on Aerodynamic Design and Research Foundation, China.
Abstract: Transition prediction has always been a frontier issue in aerodynamics. In this paper, a supervised learning model with a probability interpretation for transition judgment based on experimental data was developed. It addresses the shortcomings of the point detection method used in experiments, namely that often only one transition point can be obtained and comparison of multi-point data is necessary. First, the Variable-Interval Time Average (VITA) method was used to transform the fluctuating pressure signal measured on the airfoil surface into a sequence of states described by a Markov chain model. Second, a feature vector consisting of the one-step transition matrix and its stationary distribution was extracted. Then, a Hidden Markov Model (HMM) was used to pre-classify the feature vectors labeled with the traditional root mean square (RMS) criterion. Finally, a classification model with a probability interpretation was established, and cross-validation was used for model validation. The results show that the developed model is effective and reliable, and that it generalizes well across Reynolds numbers. The model was analyzed theoretically in depth, and the effect of its parameters was studied in detail. Compared with the traditional RMS criterion, a reasonable transition zone can be obtained using the developed classification model; in addition, it does not require comparison of multi-point data. The developed supervised learning model provides new ideas for transition detection in flight experiments and other experiments.
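The feature vector described above (one-step transition matrix plus its stationary distribution) can be sketched from a discrete state sequence as follows; the stationary distribution is approximated here by power iteration rather than an eigen-solver:

```python
def markov_features(states, n_states=2):
    """Build the row-normalized one-step transition matrix from a state
    sequence and approximate its stationary distribution by iterating
    the chain from a uniform start."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        s = sum(row)
        P.append([c / s for c in row] if s else [1.0 / n_states] * n_states)
    # power iteration: pi <- pi P
    pi = [1.0 / n_states] * n_states
    for _ in range(200):
        pi = [sum(pi[i] * P[i][j] for i in range(n_states)) for j in range(n_states)]
    return P, pi
```

Concatenating the flattened matrix and the stationary distribution gives the kind of feature vector the classifier consumes.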
Funding: Supported by the National Natural Science Foundation of China (42001408, 61806097).
Abstract: Significant advancements have been achieved in road surface extraction based on high-resolution remote sensing image processing. Most current methods rely on fully supervised learning, which necessitates enormous human effort to label the images. Within this field, other research endeavors utilize weakly supervised methods, which aim to reduce annotation expenses by leveraging sparsely annotated data such as scribbles. This paper presents a novel technique called a weakly supervised network using scribble supervision and edge masks (WSSE-net). This network has a three-branch architecture, whereby each branch is equipped with a distinct decoder module dedicated to road extraction. One branch generates edge masks using edge detection algorithms and optimizes road edge details. The other two branches supervise the model's training by employing scribble labels and spreading scribble information throughout the image. To address the historical flaw of pseudo-labels that are not updated as the network trains, we use mixup to blend prediction results dynamically and continually update new pseudo-labels to steer network training. Our solution operates efficiently by simultaneously considering both edge-mask aid and dynamic pseudo-label support. The studies are conducted on three separate road datasets, consisting primarily of high-resolution remote-sensing satellite photos and drone images. The experimental findings suggest that our methodology performs better than advanced scribble-supervised approaches and certain traditional fully supervised methods.
Abstract: The COVID-19 pandemic has had a widespread negative impact globally. It shares symptoms with other respiratory illnesses such as pneumonia and influenza, making rapid and accurate diagnosis essential to treat individuals and halt further transmission. X-ray imaging of the lungs is one of the most reliable diagnostic tools. Utilizing deep learning, we can train models to recognize the signs of infection, thus aiding the identification of COVID-19 cases. For our project, we developed a deep learning model utilizing the ResNet50 architecture, pre-trained on the ImageNet and CheXNet datasets. We tackled the challenge of an imbalanced dataset, the CoronaHack Chest X-Ray dataset provided by Kaggle, through both binary and multi-class classification approaches. Additionally, we evaluated the performance impact of using focal loss versus cross-entropy loss in our model.
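Focal loss down-weights well-classified examples relative to cross-entropy through a (1 - p_t)^γ factor, which is why it is often tried on imbalanced datasets like this one. A binary sketch (the γ and α values are the common defaults, not necessarily those used in the project):

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for predicted probability p and label y."""
    return -math.log(p if y == 1 else 1 - p)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma, so
    easy (high-confidence correct) examples contribute far less."""
    p_t = p if y == 1 else 1 - p
    a_t = alpha if y == 1 else 1 - alpha
    return -a_t * (1 - p_t) ** gamma * math.log(p_t)
```

The effect is that hard examples dominate the gradient: the loss ratio between a hard and an easy example is far larger under focal loss than under plain cross-entropy.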
Abstract: N-11-azaartemisinins potentially active against Plasmodium falciparum are designed by combining molecular electrostatic potential (MEP), ligand-receptor interaction, and models built with supervised machine learning methods (PCA, HCA, KNN, SIMCA, and SDA). The optimization of molecular structures was performed using the B3LYP/6-31G* approach. MEP maps and ligand-receptor interactions were used to investigate key structural features required for biological activity and the likely interactions between N-11-azaartemisinins and heme, respectively. The supervised machine learning methods allowed the separation of the investigated compounds into two classes, cha and cla, with the properties ε<sub>LUMO+1</sub> (energy one level above the lowest unoccupied molecular orbital), d(C<sub>6</sub>-C<sub>5</sub>) (distance between the C<sub>6</sub> and C<sub>5</sub> atoms in the ligands), and TSA (total surface area) responsible for the classification. The insights extracted from this investigation, together with chemical intuition, enabled the design of sixteen new N-11-azaartemisinins (the prediction set), and the models built with supervised machine learning methods were applied to this prediction set. The result of this application revealed twelve promising new N-11-azaartemisinins for synthesis and biological evaluation.
Abstract: With the rapid growth of internet usage, a new situation has been created that enables bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background on cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive literature review was conducted to identify research gaps and effective techniques and practices for cyberbullying detection in various languages, from which it was deduced that there is significant room for improvement for the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), naive Bayes (NB), random forest (RF), logistic regression (LR), bootstrap aggregating (Bagging), gradient boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness on similar problems. Finally, the scheme was evaluated with well-known performance measures such as accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared with the state of the art.
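Of the shortlisted classifiers, naive Bayes is simple enough to sketch end to end. A multinomial NB with add-one smoothing on tokenized text (the data below is an illustrative toy, not the study's Arabic corpus):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train a multinomial naive Bayes classifier with add-one
    (Laplace) smoothing on tokenized documents; returns a predict
    function mapping a token list to the most likely label."""
    vocab = {w for d in docs for w in d}
    priors = Counter(labels)
    counts = {}
    for d, l in zip(docs, labels):
        counts.setdefault(l, Counter()).update(d)

    def predict(doc):
        best, best_lp = None, -math.inf
        for l in priors:
            total = sum(counts[l].values())
            lp = math.log(priors[l] / len(labels))          # log prior
            for w in doc:
                # smoothed log likelihood of each token under class l
                lp += math.log((counts[l][w] + 1) / (total + len(vocab)))
            if lp > best_lp:
                best, best_lp = l, lp
        return best

    return predict
```

The same fit/predict shape carries over to the boosted-tree models in the study; only the scoring function changes.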
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61672082 and 61822101), the Beijing Municipal Natural Science Foundation, China (No. 4181002), and the Beihang University Innovation and Practice Fund for Graduate, China (No. YCSJ-02-2018-05).
Abstract: Rule-based autonomous driving systems may suffer from increased complexity with large-scale intercoupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied to the design of autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system: RL training requires extensive data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address this poor initial performance before training the model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle to be further trained by RL to adapt to the real environment and continuously break the performance limit. The pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it reliably improves the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using demonstrations during a supervised pre-training stage allows significant improvements in initial performance and convergence speed in the RL training stage.
Abstract: Satellite image classification is crucial in applications such as urban planning, environmental monitoring, and land use analysis. In this study, the authors present a comparative analysis of supervised and unsupervised learning methods for satellite image classification, focusing on a case study of Casablanca using Landsat 8 imagery. The research aims to identify the most effective machine learning approach for accurately classifying land cover in an urban environment. The methodology consists of pre-processing the Landsat imagery of Casablanca, extracting relevant features and partitioning them into training and test sets, and then applying the random forest (RF), support vector machine (SVM), classification and regression tree (CART), gradient tree boost (GTB), decision tree (DT), and minimum distance (MD) algorithms. Through a series of experiments, the authors evaluate the performance of each method in terms of accuracy and Kappa coefficient. The work shows that random forest is the best-performing algorithm, with 95.42% accuracy and a Kappa coefficient of 0.94. The authors discuss the factors behind the methods' performance, including data characteristics, feature selection, and model choice.
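The two reported metrics, overall accuracy and Cohen's kappa (accuracy corrected for chance agreement), can be computed from paired label lists as follows:

```python
def accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa for paired label lists.
    po is the observed agreement; pe is the agreement expected by
    chance from the marginal label frequencies."""
    n = len(y_true)
    po = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    pe = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return po, (po - pe) / (1 - pe)
```

A kappa of 0.94 alongside 95.42% accuracy indicates the random forest's agreement with the reference labels is far above what chance alone would produce.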