The complexity of the sand-casting process, combined with interactions between process parameters, makes casting quality difficult to control and results in a high scrap rate. A data-driven strategy was proposed to reduce casting defects and improve production efficiency; it comprises a random forest (RF) classification model, feature importance analysis, and process-parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall above 90% for all categories. The Gini index was used to assess the importance of each process parameter in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process-parameter optimization for gas porosity defects, the model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection, and the scrap rate decreased from 10.16% to 6.68%.
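The Gini-based importance analysis in the abstract above rests on how much a split reduces Gini impurity. The following is a minimal sketch of that quantity only, not the authors' implementation; the defect labels and the split are hypothetical toy values, not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when `parent` is split into two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical toy labels: one defect class per casting (assumed for illustration).
parent = ["porosity", "porosity", "shrinkage", "shrinkage"]
left, right = parent[:2], parent[2:]
decrease = impurity_decrease(parent, left, right)
print(decrease)
```

In a random forest, a parameter's importance is commonly obtained by accumulating such decreases over every split that uses the parameter and averaging across trees.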
We apply stochastic seismic inversion and Bayesian facies classification for porosity modeling and igneous rock identification in the presalt interval of the Santos Basin. This integration of seismic and well-derived information enhances reservoir characterization. Stochastic inversion and Bayesian classification are powerful tools because they make it possible to address the uncertainties in the model. We used the ES-MDA algorithm to obtain the realizations equivalent to the P10, P50, and P90 percentiles of acoustic impedance, a novel method for acoustic inversion in the presalt. The facies were divided into five classes: reservoir 1, reservoir 2, tight carbonates, clayey rocks, and igneous rocks. To deal with the overlaps in the acoustic impedance values of the facies, we included geological information through an a priori probability indicating that structural highs are reservoir-dominated. To illustrate our approach, we conducted porosity modeling using facies-related rock-physics models for rock-physics inversion in an area with a well drilled in a coquina bank, and evaluated the thickness and extension of an igneous intrusion near the carbonate-salt interface. The modeled porosity and the classified seismic facies are in good agreement with those observed in the wells. Notably, the porosity of the coquina bank improves towards the top. The a priori probability model was crucial for limiting the clayey rocks to the structural lows. In Well B, the hit rate for the igneous rock in the three scenarios is higher than 60%, showing an excellent thickness-prediction capability.
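The role of the a priori probability when facies overlap in acoustic impedance can be illustrated with Bayes' rule. This is a minimal sketch assuming Gaussian impedance likelihoods; the facies means, standard deviations, and prior values are hypothetical and are not taken from the study.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, likelihoods, prior):
    """Posterior facies probabilities: likelihood times prior, normalized to sum to 1."""
    unnorm = {f: gaussian_pdf(x, m, s) * prior[f] for f, (m, s) in likelihoods.items()}
    total = sum(unnorm.values())
    return {f: v / total for f, v in unnorm.items()}


# Hypothetical impedance distributions (arbitrary units) for two overlapping facies.
likelihoods = {"reservoir": (10.0, 1.0), "clayey": (11.0, 1.0)}
x = 10.5  # equidistant from both means, so the likelihood alone cannot decide

# Hypothetical priors: reservoir-dominated on a structural high, clay-dominated in a low.
on_high = posterior(x, likelihoods, {"reservoir": 0.8, "clayey": 0.2})
in_low = posterior(x, likelihoods, {"reservoir": 0.2, "clayey": 0.8})
print(on_high, in_low)
```

With equal likelihoods the posterior reduces to the prior, which is how a structural-position prior can confine an ambiguous facies to the lows.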
Detecting brain tumours is complex because of the natural variation in their location, shape, and intensity in images. Although accurate detection and segmentation of brain tumours would be beneficial, current methods have yet to solve this problem despite the numerous available approaches. Precise analysis of Magnetic Resonance Imaging (MRI) is crucial for detecting, segmenting, and classifying brain tumours in medical diagnostics, and it requires precise, efficient, and reliable image analysis techniques. The authors developed a Deep Learning (DL) fusion model to classify brain tumours reliably. DL models require large amounts of training data to achieve good results, so the researchers used data augmentation techniques to increase the size of the training dataset. VGG16, ResNet50, and convolutional deep belief networks extracted deep features from MRI images. Softmax was used as the classifier, and the training set was supplemented with synthetically generated MRI images of brain tumours in addition to the genuine ones. The features of two DL models were combined in the proposed model to generate a fusion model, which significantly increased classification accuracy. An openly accessible dataset was used to test the model's performance, and the experimental results showed that the proposed fusion model achieved a classification accuracy of 98.98%. Finally, the results were compared with existing methods, and the proposed model outperformed them significantly.
Social media has revolutionized the dissemination of real-life information, serving as a robust platform for sharing life events. Twitter, characterized by its brevity and continuous flow of posts, has emerged as a crucial source for public health surveillance, offering valuable insights into public reactions during the COVID-19 pandemic. This study aims to leverage a range of machine learning techniques to extract pivotal themes and facilitate text classification on a dataset of COVID-19 outbreak-related tweets. Diverse topic modeling approaches were employed to extract pertinent themes and subsequently form a dataset for training text classification models. An assessment of coherence metrics revealed that the Gibbs Sampling Dirichlet Mixture Model (GSDMM), which utilizes trigram and bag-of-words (BOW) feature extraction, outperformed Non-negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and a hybrid strategy involving Bidirectional Encoder Representations from Transformers (BERT) combined with LDA and K-means to pinpoint significant themes within the dataset. Among the models assessed for text clustering, the use of LDA, either as a clustering model or for feature extraction combined with BERT for K-means, resulted in higher coherence scores, consistent with human ratings, signifying their efficacy. In particular, LDA, notably in conjunction with trigram representation and BOW, demonstrated superior performance. This underscores the suitability of LDA for topic modeling, given its proficiency in capturing intricate textual relationships. In the context of text classification, models such as Linear Support Vector Classification (LSVC), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network with BiLSTM (CNN-BiLSTM), and BERT showed outstanding performance, achieving accuracy and weighted F1 scores exceeding 80%. These results significantly surpassed those of other models, such as Multinomial Naive Bayes (MNB), Linear Support Vector Machine (LSVM), and Logistic Regression (LR), which achieved scores in the range of 60 to 70 percent.
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization relies more on semantic features than other tasks, such as dependency parsing, which relies more on syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress and comparing impacts. Here, an ensemble of pre-trained language models was adopted to classify the sentences of a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. The resulting sequences of classification labels are used to analyze the progress of the conversation and predict its pecking order. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus, and hyperparameter tuning is carried out to improve sentence classification performance. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1 score of 0.88.
During the condition monitoring of a planetary gearbox, features are extracted from raw data for fault diagnosis. However, different features have different sensitivities for identifying different fault types. Selecting a sensitive feature subset from the entire feature set while retaining as much of the class-discriminatory information as possible therefore directly affects the accuracy of the classification results. In this paper, an improved hybrid feature selection technique (IHFST) that combines a distance evaluation technique (DET), Pearson's correlation analysis, and an ad hoc technique is proposed. In IHFST, a temporary feature subset without irrelevant features is first selected according to the distance evaluation criterion of DET. Pearson's correlation analysis and the ad hoc technique are then employed to find and remove, respectively, the redundant features in the temporary feature subset, so that a sensitive feature subset without irrelevant or redundant features is selected from the entire feature set. Further, the k-means clustering method is applied to classify the different health conditions. The effectiveness of the proposed method was validated through several experiments carried out on a planetary gearbox with incipient cracks seeded in the tooth root of the sun gear, planet gear, and ring gear. The results show that the proposed method can successfully distinguish the different health conditions of a planetary gearbox and achieves better classification performance than other methods. This study proposes a sensitive feature subset selection method that achieves an obvious improvement in the accuracy of fault classification.
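The redundancy-removal step described above can be sketched with Pearson's correlation coefficient. This is only an illustration of the general idea, not the IHFST procedure itself: the feature names, values, ranking order, and the 0.95 threshold are hypothetical, and the ranking that DET would supply is assumed to be given.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(features, ranked_names, threshold=0.95):
    """Walk features in rank order; skip any whose |r| with a kept feature exceeds threshold."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values over four samples (assumed for illustration).
features = {
    "rms": [1.0, 2.0, 3.0, 4.0],
    "peak": [2.0, 4.0, 6.0, 8.0],      # a scaled copy of "rms", hence redundant
    "kurtosis": [3.0, 1.0, 4.0, 2.0],  # uncorrelated with "rms" in this toy data
}
selected = drop_redundant(features, ["rms", "peak", "kurtosis"])
print(selected)
```

Because features are visited in rank order, the more sensitive member of a correlated pair is the one that survives.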
We present an approach for automatically classifying medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Laser-induced breakdown spectroscopy (LIBS) is a new technology suitable for the classification of various materials. This paper proposes a hybrid classification scheme for coal, municipal sludge, and biomass that uses LIBS combined with the K-means and support vector machine (SVM) algorithms. In the study, 10 samples were classified into 3 groups without supervision by K-means clustering, and a further supervised classification of 6 kinds of biomass samples was then carried out by SVM. The results show that the comprehensive accuracy of the hybrid classification model is over 98%. In comparison with the single SVM classification model, the hybrid classification model saves 58.92% of the operation time while maintaining accuracy. The results demonstrate that the hybrid classification model can classify coal, municipal sludge, and biomass efficiently, quickly, and accurately. Furthermore, it is precise in detecting various kinds of biomass fuel.
The classification of the Northeast China Cold Vortex (NCCV) activity paths is an important way to analyze its characteristics in detail. Based on daily precipitation data for the northeastern China (NEC) region and six-hourly ERA-Interim atmospheric circulation and temperature field data, the NCCV processes during the early summer (June) seasons from 1979 to 2018 were objectively identified. The NCCV processes were then classified using a machine learning method (k-means) according to the characteristic parameters of the activity path information. The rationality of the classification results was verified from two aspects: (1) the atmospheric circulation configuration of the NCCV on the various paths; and (2) its influence on the climate conditions in the NEC. The results showed that the activity paths of the NCCV could be divided into four types according to characteristics such as the generation origin, movement direction, and movement velocity of the NCCV. These were the generation-eastward movement type in the east of the Mongolia Plateau (eastward movement type, or type A); the generation-southeast long-distance movement type in the upstream of the Lena River (southeast long-distance movement type, or type B); the generation-eastward less-movement type near Lake Baikal (eastward less-movement type, or type C); and the generation-southward less-movement type in eastern Siberia (southward less-movement type, or type D). Obvious differences were observed in the atmospheric circulation configuration and the climate impact of the NCCV on these four types of paths, which indicates that the classification results are reasonable.
Based on the Joint Typhoon Warning Center (JTWC) best-track dataset for 1965 to 2009 and characteristic parameters including tropical cyclone (TC) position, intensity, path length, and direction, a method for the objective classification of Northwestern Pacific tropical cyclone tracks is established using k-means clustering. The TC lifespan, energy, active season, and landfall probability of seven clusters of tropical cyclone tracks are comparatively analyzed. The characteristics of these parameters differ considerably among the track clusters. Over the past two decades, the frequency of the western recurving cluster (accounting for 21.3% of the total) increased and its lifespan lengthened slightly, which differs from the other clusters. The annual variation of the Power Dissipation Index (PDI) of most clusters depended mainly on TC intensity and frequency. However, the annual variation of the PDI in the northwestern moving then recurving cluster and the pelagic west-northwest moving cluster depended mainly on frequency.
The basic idea of multi-class classification is decomposition: a multi-class classification task is broken down into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method that combines K-means and multi-task relationship learning (MTRL). The method first uses the One-vs-Rest splitting scheme to decompose the multi-class classification task into binary classification tasks. K-means is then used to down-sample the dataset of each task, which helps prevent over-fitting of the model while reducing training costs. Finally, the sampled datasets are fed to MTRL, and the multiple binary classifiers are trained together. With the help of MTRL, the method can exploit the associations between tasks when training the model and thereby improve the classification accuracy of each binary classifier. The effectiveness of the proposed approach is demonstrated by experimental results on the Iris, Wine, Multiple Features, Wireless Indoor Localization, and Avila datasets.
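The One-vs-Rest decomposition named in the abstract above can be sketched as follows. This covers only the label-splitting and the final decision step; the K-means down-sampling and the MTRL training are not shown, and the helper names and score values are hypothetical.

```python
def one_vs_rest_tasks(labels):
    """Return {class: binary label list}, with 1 for the class and 0 for all other classes."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def predict_from_scores(scores):
    """Pick the class whose binary classifier reports the highest score."""
    return max(scores, key=scores.get)


# Toy labels using the three Iris class names.
labels = ["setosa", "versicolor", "virginica", "setosa"]
tasks = one_vs_rest_tasks(labels)
print(tasks)

# Hypothetical scores from three trained binary classifiers for one sample.
predicted = predict_from_scores({"setosa": 0.1, "versicolor": 0.7, "virginica": 0.4})
print(predicted)
```

A problem with K classes therefore yields K binary tasks, which is what lets a multi-task learner train them jointly.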
Reservoir classification is a key link in reservoir evaluation. However, traditional manual methods are inefficient and subjective, and their classification standards are not uniform. Therefore, taking the Mishrif Formation in western Iraq as an example, a new reservoir classification and discrimination method is established using the K-means clustering method and the Bayesian discrimination method. These methods are applied to non-cored wells to calculate the discrimination accuracy of the reservoir type, and the main reasons for low discrimination accuracy are thereby clarified. The results show that the discrimination accuracy of reservoir type based on K-means clustering and Bayesian stepwise discrimination is strongly related to the accuracy of the core data. Using the method that combines K-means clustering and Bayesian theory based on logging data, the discrimination accuracy for Type Ⅰ, Type Ⅱ, and Type Ⅴ reservoirs is found to be significantly higher than that for Type Ⅲ and Type Ⅳ reservoirs. Although the recognition accuracy of the new method for the Type Ⅳ reservoir is low, its average accuracy exceeds 82% across the entire study area, which lays a good foundation for the rapid and accurate discrimination of reservoir types and the fine evaluation of a reservoir.
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the flight sequencing problem in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
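A common form of fuzzy classification by transitive closure squares a fuzzy similarity matrix under max-min composition until it stops changing, then cuts it at a threshold. The sketch below assumes that standard formulation rather than the paper's exact procedure; the similarity values and the threshold 0.5 are hypothetical.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square a reflexive, symmetric relation under max-min composition until it is stable."""
    while True:
        nxt = max_min_compose(r, r)
        if nxt == r:
            return r
        r = nxt


def lambda_cut_classes(closure, lam):
    """Group indices i and j together whenever closure[i][j] >= lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes


# Hypothetical similarity between three flights (symmetric, ones on the diagonal).
r = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
closure = transitive_closure(r)
classes = lambda_cut_classes(closure, 0.5)
print(closure, classes)
```

Raising the threshold yields finer classes and lowering it merges them, so a single closure matrix supports a whole hierarchy of groupings.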
This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification for improvement in robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
To reduce the amount of data storage and improve the processing capacity of the system, this paper proposes a new data source classification method that combines the phase synchronization model used in network clustering with the cloud model. Firstly, the data source is treated as a complex network; after the topology of the network is obtained, the cloud model of the data at each node is determined by the fuzzy analytic hierarchy process (AHP). Secondly, the expectation, entropy, and hyper-entropy of the cloud model are calculated to obtain the comprehensive coupling strength, which is then taken as the edge weight of the topology. Finally, the distribution curve is obtained by iterating the phase of each node with the phase synchronization model, which completes the classification of the data source. This method not only facilitates the storage, cleaning, and compression of data, but also improves the efficiency of data analysis.
Coalbed methane has been explored in many basins worldwide for 30 years and has been developed commercially in some of them. Many researchers have systematically described the characteristics of coalbed methane geology and technology. According to these investigations, a coalbed methane reservoir can be defined as follows: 'a coal seam that contains some coalbed methane and is isolated from other fluid units is called a coalbed methane reservoir'. On the basis of the dissection, analysis, and comparison of typical coalbed methane reservoirs, coalbed methane reservoirs can be divided into two classes: hydrodynamic sealing coalbed methane reservoirs and self-sealing coalbed methane reservoirs. The former can be further divided into two sub-classes: hydrodynamic capping coalbed methane reservoirs, which can be divided into five types, and hydrodynamic driving coalbed methane reservoirs, which can be divided into three types. The latter can be divided into three types. Currently, hydrodynamic sealing reservoirs are the main target for coalbed methane exploration and development; self-sealing reservoirs are unsuitable for coalbed methane exploration and development, but they are closely related to coal mine gas hazards. Finally, a model for hydrodynamic sealing coalbed methane reservoirs is established.
A Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) has driven tremendous improvements over an acoustic model based on the Gaussian Mixture Model (GMM). However, models based on this hybrid method require a force-aligned Hidden Markov Model (HMM) state sequence obtained from the GMM-based acoustic model. Training both the GMM-based acoustic model and a deep learning-based acoustic model therefore requires a long computation time. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the force-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used small-scale training corpora. In this paper, the LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% for clean speech and 15.01% for noisy speech. This is similar to the performance of the acoustic model based on the hybrid method.
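The Word Error Rate reported above is conventionally the word-level edit distance between the reference and the recognized transcript, divided by the number of reference words. The sketch below assumes that standard definition and a non-empty reference; the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical transcripts: one substitution ("sat" -> "sit") and one deleted "the".
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(wer)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.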
The Sanjiang Plain, where nearly 20 kinds of wetlands now exist, is one of the largest wetland areas in China. Identifying each kind and extracting it separately by automatic interpretation of Landsat TM remote sensing images is extremely important. However, most wetland types cannot be separated from one another because their spectra in TM images are similar and indistinct. Special processing of the remote sensing images includes spectral enhancement of wetland information, pseudo-color compositing of different TM bands, and algebraic enhancement of TM images. In this way, some kinds of wetlands, such as Sparganium stoloniferum and Bolboschoenus maritimus, can be identified. In many cases, however, these methods are still insufficient because of noise introduced by atmospheric transmission and other factors. The physical features of wetlands that underlie the diversity of their spectral information must therefore be taken into consideration; these include the spatial and temporal characteristics of wetland distribution, the seasonal differences in wetland landscapes, the growing environment, and the vertical structure of wetland vegetation. In addition, artificial alteration of the spatial structure of wetlands, such as the exploitation of some wetland types, can also be used as an important indicator for identifying wetlands in remote sensing images. On the basis of this geographical analysis, a set of remote sensing classification models for wetlands can be established, and many types of wetlands, such as paddy field, reed swamp, peat mire, meadow, Carex marsh, and paludification meadow, can then be distinguished. The methods of geographical analysis and model establishment are described in detail in this article.
In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, the density function, and the noncentral skew chi-square distribution. Based on the EM algorithm, we discuss the maximum likelihood (ML) estimation of the unknown parameters. For the problem of testing the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, simulation results on the Type I error rates and powers of the PB approach are obtained, which show that the PB approach performs satisfactorily in terms of Type I error rates and powers, even for small samples. For illustration, our main results are applied to a real data problem.
A Fisher discriminant analysis (FDA) model for predicting the classification of rockburst in deep-buried long tunnels was established based on Fisher discriminant theory and the actual characteristics of the project. First, the major factors of rockburst, namely the maximum tangential stress of the cavern wall σθ, the uniaxial compressive strength σc, the uniaxial tensile strength σt, and the elastic energy index of rock Wet, were taken into account in the analysis. Three factors, the stress coefficient σθ/σc, the rock brittleness coefficient σc/σt, and the elastic energy index Wet, were defined as the criterion indices for rockburst prediction in the proposed model. After training and testing on 12 sets of measured data, the discriminant functions of the FDA were solved, and the misdiscrimination ratio was zero. Moreover, the proposed model was used to predict rockbursts in the Qinling tunnel along the Xi'an-Ankang railway. The three forecast results are identical to the actual situation. Therefore, the prediction accuracy of the FDA model is acceptable.
Funding (sand-casting defect study): financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
Funding (Santos Basin presalt study): Equinor, for financing the R&D project, and the Institute of Science and Technology of Petroleum Geophysics of Brazil, for supporting this research.
Funding: Ministry of Education, Youth and Sports of the Czech Republic, Grant/Award Numbers: SP2023/039, SP2023/042; the European Union under REFRESH, Grant/Award Number: CZ.10.03.01/00/22_003/0000048.
Abstract: Detecting brain tumours is complex due to the natural variation in their location, shape, and intensity in images. Although accurate detection and segmentation of brain tumours would be beneficial, current methods have yet to solve this problem despite the numerous available approaches. Precise analysis of Magnetic Resonance Imaging (MRI) is crucial for detecting, segmenting, and classifying brain tumours in medical diagnostics. MRI is a vital component in medical diagnosis, and it requires precise, careful, efficient, and reliable image analysis techniques. The authors developed a Deep Learning (DL) fusion model to classify brain tumours reliably. DL models require large amounts of training data to achieve good results, so the researchers utilised data augmentation techniques to increase the dataset size for training models. VGG16, ResNet50, and convolutional deep belief networks extracted deep features from MRI images. Softmax was used as the classifier, and the training set was supplemented with intentionally created MRI images of brain tumours in addition to the genuine ones. The features of two DL models were combined in the proposed model to generate a fusion model, which significantly increased classification accuracy. An openly accessible dataset from the internet was used to test the model's performance, and the experimental results showed that the proposed fusion model achieved a classification accuracy of 98.98%. Finally, the results were compared with existing methods, and the proposed model outperformed them significantly.
Abstract: Social media has revolutionized the dissemination of real-life information, serving as a robust platform for sharing life events. Twitter, characterized by its brevity and continuous flow of posts, has emerged as a crucial source for public health surveillance, offering valuable insights into public reactions during the COVID-19 pandemic. This study aims to leverage a range of machine learning techniques to extract pivotal themes and facilitate text classification on a dataset of COVID-19 outbreak-related tweets. Diverse topic modeling approaches have been employed to extract pertinent themes and subsequently form a dataset for training text classification models. An assessment of coherence metrics revealed that the Gibbs Sampling Dirichlet Mixture Model (GSDMM), which utilizes trigram and bag-of-words (BOW) feature extraction, outperformed Non-negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and a hybrid strategy involving Bidirectional Encoder Representations from Transformers (BERT) combined with LDA and K-means to pinpoint significant themes within the dataset. Among the models assessed for text clustering, the utilization of LDA, either as a clustering model or for feature extraction combined with BERT for K-means, resulted in higher coherence scores, consistent with human ratings, signifying their efficacy. In particular, LDA, notably in conjunction with trigram representation and BOW, demonstrated superior performance. This underscores the suitability of LDA for conducting topic modeling, given its proficiency in capturing intricate textual relationships. In the context of text classification, models such as Linear Support Vector Classification (LSVC), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network with BiLSTM (CNN-BiLSTM), and BERT have shown outstanding performance, achieving accuracy and weighted F1 scores exceeding 80%. These results significantly surpassed other models, such as Multinomial Naive Bayes (MNB), Linear Support Vector Machine (LSVM), and Logistic Regression (LR), which achieved scores in the range of 60 to 70 percent.
Abstract: Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic highlights than other tasks, such as dependence parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress and comparing impacts. An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with hyperparameters. A hyperparameter tuning approach is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1 score of 0.88.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51475053).
Abstract: During the condition monitoring of a planetary gearbox, features are extracted from raw data for fault diagnosis. However, different features have different sensitivities for identifying different fault types. Thus, selecting a sensitive feature subset from the entire feature set while retaining as much of the class discriminatory information as possible has a direct effect on the accuracy of the classification results. In this paper, an improved hybrid feature selection technique (IHFST) that combines a distance evaluation technique (DET), Pearson's correlation analysis, and an ad hoc technique is proposed. In IHFST, a temporary feature subset without irrelevant features is first selected according to the distance evaluation criterion of DET. Pearson's correlation analysis and the ad hoc technique are then employed to find and remove, respectively, redundant features in the temporary feature subset. Hence, a sensitive feature subset without irrelevant or redundant features is selected from the entire feature set. Further, the k-means clustering method is applied to classify the different kinds of health conditions. The effectiveness of the proposed method was validated through several experiments carried out on a planetary gearbox with incipient cracks seeded in the tooth root of the sun gear, planet gear, and ring gear. The results show that the proposed method can successfully distinguish the different health conditions of a planetary gearbox, and achieves a better classification performance than other methods. This study proposes a sensitive feature subset selection method that achieves an obvious improvement in the accuracy of the fault classification.
Abstract: We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Funding: Supported by the National Natural Science Foundation of China (No. 51676073); the Guangdong Province Train High-Level Personnel Special Support Program (No. 2014TQ01N334); the Science and Technology Project of Guangdong Province (No. 2015A020215005); the Guangdong Province Key Laboratory of Efficient and Clean Energy Utilization (No. 2013A061401005).
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a new technology suitable for classification of various materials. This paper proposes a hybrid classification scheme for coal, municipal sludge and biomass that uses LIBS combined with the K-means and support vector machine (SVM) algorithms. In the study, 10 samples were first classified into 3 groups without supervision by K-means clustering, and a further supervised classification of 6 kinds of biomass samples was then carried out by SVM. The results show that the comprehensive accuracy of the hybrid classification model is over 98%. In comparison with the single SVM classification model, the hybrid classification model can save 58.92% of operation time while guaranteeing the accuracy. The results demonstrate that the hybrid classification model is able to make an efficient, fast and accurate classification of coal, municipal sludge and biomass. Furthermore, it is precise in the detection of various kinds of biomass fuel.
Funding: This research was jointly supported by the National Natural Science Foundation of China (Grant No. 42005037); the Liaoning Provincial Natural Science Foundation Project (PhD Start-up Research Fund 2019-BS-214); the Special Scientific Research Project for the Forecaster (Grant No. CMAYBY2018-018); a Key Technical Project of Liaoning Meteorological Bureau (Grant No. LNGJ201903); the National Key Research and Development Project (Grant No. 2018YFC1505601); the Open Foundation Project of the Institute of Atmospheric Environment, China Meteorological Administration (Grant Nos. 2020SYIAE08 and 2020SYIAEZD5).
Abstract: The classification of the Northeast China Cold Vortex (NCCV) activity paths is an important way to analyze its characteristics in detail. Based on the daily precipitation data of the northeastern China (NEC) region, and the six-hourly atmospheric circulation field and temperature field data of ERA-Interim, the NCCV processes during the early summer (June) seasons from 1979 to 2018 were objectively identified. Then, the NCCV processes were classified using a machine learning method (k-means) according to the characteristic parameters of the activity path information. The rationality of the classification results was verified from two aspects, as follows: (1) the atmospheric circulation configuration of the NCCV on various paths; and (2) its influences on the climate conditions in the NEC. The obtained results showed that the activity paths of the NCCV could be divided into four types according to such characteristics as the generation origin, movement direction, and movement velocity of the NCCV. These included the generation-eastward movement type in the east of the Mongolia Plateau (eastward movement type or type A); the generation-southeast long-distance movement type in the upstream of the Lena River (southeast long-distance movement type or type B); the generation-eastward less-movement type near Lake Baikal (eastward less-movement type or type C); and the generation-southward less-movement type in eastern Siberia (southward less-movement type or type D). There were obvious differences observed in the atmospheric circulation configuration and the climate impact of the NCCV on the four above-mentioned types of paths, which indicated that the classification results were reasonable.
Funding: National Basic Research Program of China (973 Program) (2015CB453200, 2012CB955903); National Natural Science Foundation of China (41575083, 41575108); Jiangsu Education Science Foundation (13KJA170002).
Abstract: Based on the Joint Typhoon Warning Center (JTWC) best-track dataset between 1965 and 2009 and characteristic parameters including tropical cyclone (TC) position, intensity, path length and direction, a method for objective classification of Northwestern Pacific tropical cyclone tracks is established using k-means clustering. The TC lifespan, energy, active season and landfall probability of seven clusters of tropical cyclone tracks are comparatively analyzed. The characteristics of these parameters differ considerably among the tropical cyclone track clusters. In the trend of the past two decades, the frequency of the western recurving cluster (accounting for 21.3% of the total) increased and its lifespan lengthened slightly, which differs from the other clusters. The annual variation of the Power Dissipation Index (PDI) of most clusters mainly depended on the TC intensity and frequency. However, the annual variation of the PDI in the northwestern moving then recurving cluster and the pelagic west-northwest moving cluster mainly depended on the frequency.
Funding: Supported by the National Natural Science Foundation of China (61703131, 61703129, 61701148, 61703128).
Abstract: The basic idea of multi-class classification is disassembly: a multi-class classification task is decomposed into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method combining K-means and multi-task relationship learning (MTRL). The method first uses the One vs. Rest splitting scheme to disassemble the multi-class classification task into binary classification tasks. K-means is then used to downsample the dataset of each task, which can prevent over-fitting of the model while reducing training costs. Finally, the sampled dataset is applied to MTRL, and multiple binary classifiers are trained together. With the help of MTRL, this method can exploit the inter-task association to train the model and thereby improve the classification accuracy of each binary classifier. The effectiveness of the proposed approach is demonstrated by experimental results on the Iris dataset, Wine dataset, Multiple Features dataset, Wireless Indoor Localization dataset and Avila dataset.
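The first two steps described in the abstract above (One vs. Rest decomposition, then K-means downsampling) can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not say how samples are picked after clustering, so keeping the sample nearest each centroid is an assumption, as are the function names, the first-k-points initialization, and the toy data. The MTRL training step is not shown.

```python
import math


def one_vs_rest(labels):
    """Map each class to a binary label vector (1 = this class, 0 = rest)."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def downsample(points, k):
    """Keep the sample closest to each of the k centroids (assumed selection rule)."""
    return [min(points, key=lambda p: math.dist(p, c)) for c in kmeans(points, k)]


# Toy data, for illustration only.
binary_tasks = one_vs_rest(["a", "b", "a"])
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = kmeans(points, 2)
kept = downsample(points, 2)
```

Each entry of `binary_tasks` would then define one binary task, and each task's dataset would be reduced with `downsample` before joint training.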
Funding: Funded by the National Key Research and Development Program (Grant No. 2018YFC0807804-2).
Abstract: Reservoir classification is a key link in reservoir evaluation. However, traditional manual approaches are inefficient and subjective, and their classification standards are not uniform. Therefore, taking the Mishrif Formation of western Iraq as an example, a new reservoir classification and discrimination method is established using the K-means clustering method and the Bayesian discrimination method. These methods are applied to non-cored wells to calculate the discrimination accuracy of the reservoir type, and the main reasons for low accuracy of reservoir discrimination are thus clarified. The results show that the discrimination accuracy of reservoir type based on K-means clustering and Bayesian stepwise discrimination is strongly related to the accuracy of the core data. Using the method that combines K-means clustering and Bayesian theory based on logging data, the discrimination accuracy for Type Ⅰ, Type Ⅱ, and Type Ⅴ reservoirs is found to be significantly higher than that for Type Ⅲ and Type Ⅳ reservoirs. Although the recognition accuracy of the new methodology for the Type Ⅳ reservoir is low, its average accuracy exceeds 82% across the entire study area, which lays a good foundation for rapid and accurate discrimination of reservoir types and the fine evaluation of a reservoir.
Abstract: A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, the fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is implemented in the sequencing problems of flights in a terminal area, and results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
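For readers unfamiliar with fuzzy transitive closure, the sketch below shows the textbook version of the idea: a fuzzy similarity matrix is repeatedly composed with itself under max-min composition until it stops changing, and a lambda-cut of the result yields classes. The abstract does not give the paper's factors or similarity values, so the matrix, the threshold, and the function names here are hypothetical.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square the relation under max-min composition until it is stable."""
    while True:
        nxt = max_min_compose(r, r)
        if nxt == r:
            return r
        r = nxt


def lambda_cut_classes(closure, lam):
    """Group indices whose closure similarity is at least lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes


# Hypothetical similarity matrix for three flights (reflexive and symmetric).
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
closure = transitive_closure(similarity)
classes = lambda_cut_classes(closure, lam=0.5)
```

Lowering `lam` merges classes and raising it splits them, so the threshold controls how coarse the resulting flight classes are.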
Abstract: This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification for improvement in robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
Funding: National Natural Science Foundation of China (No. 61171057, No. 61503345); Science Foundation for North University of China (No. 110246); Specialized Research Fund for Doctoral Program of Higher Education of China (No. 20121420110004); International Office of Shanxi Province Education Department of China, and Basic Research Project in Shanxi Province (Young Foundation).
Abstract: In order to reduce the amount of data storage and improve the processing capacity of the system, this paper proposes a new data source classification method that combines the phase synchronization model used in network clustering with the cloud model. Firstly, the data source is treated as a complex network; after the topology of the network is obtained, the cloud model of each node's data is determined by the fuzzy analytic hierarchy process (AHP). Secondly, by calculating the expectation, entropy and hyper entropy of the cloud model, the comprehensive coupling strength is obtained and then used as the edge weight of the topology. Finally, the distribution curve is obtained by iterating the phase of each node with the phase synchronization model, which completes the classification of the data source. This method not only makes storage, cleaning and compression of data more convenient, but also improves the efficiency of data analysis.
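The three cloud-model characteristics named in the abstract above (expectation Ex, entropy En, hyper entropy He) are often estimated from samples with a backward cloud generator. The sketch below uses that common estimator as a stand-in; the paper itself determines the cloud model via fuzzy AHP, which is not reproduced here. The function name and sample values are hypothetical, and clamping He to zero when the variance is smaller than En squared is an assumption of this sketch.

```python
import math
from statistics import mean, variance


def backward_cloud(samples):
    """Estimate (Ex, En, He) from samples with a common backward cloud estimator."""
    ex = mean(samples)
    # En from the first-order absolute central moment.
    en = math.sqrt(math.pi / 2) * mean([abs(x - ex) for x in samples])
    # He from the gap between sample variance and En^2, clamped at zero (assumption).
    he = math.sqrt(max(variance(samples) - en ** 2, 0.0))
    return ex, en, he


# Hypothetical node data, for illustration only.
ex, en, he = backward_cloud([4.0, 5.0, 6.0])
```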
Funding: We wish to thank the Ministry of Science and Technology of China for its financial support of the "Project 973" (No. 2002CB211705) and the Science and Technology Administration of Henan Province.
Abstract: Coalbed methane has been explored in many basins worldwide for 30 years, and has been developed commercially in some of the basins. Many researchers have described the characteristics of coalbed methane geology and technology systematically. According to these investigations, a coalbed methane reservoir can be defined: 'a coal seam that contains some coalbed methane and is isolated from other fluid units is called a coalbed methane reservoir'. On the basis of dissection, analysis, and comparison of typical coalbed methane reservoirs, coalbed methane reservoirs can be divided into two classes: the hydrodynamic sealing coalbed methane reservoirs and the self-sealing coalbed methane reservoirs. The former can be further divided into two sub-classes: the hydrodynamic capping coalbed methane reservoirs, which can be divided into five types, and the hydrodynamic driving coalbed methane reservoirs, which can be divided into three types. The latter can be divided into three types. Currently, hydrodynamic sealing reservoirs are the main target for coalbed methane exploration and development; self-sealing reservoirs are unsuitable for coalbed methane exploration and development, but they are closely related to coal mine gas hazards. Finally, a model for hydrodynamic sealing coalbed methane reservoirs is established.
Funding: Supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, 'development of distant speech recognition and multi-task dialog processing technologies for in-door conversational robots').
Abstract: A Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) has driven tremendous improvements over an acoustic model based on the Gaussian Mixture Model (GMM). However, these models, which are based on a hybrid method, require a forced-aligned Hidden Markov Model (HMM) state sequence obtained from the GMM-based acoustic model. Training therefore requires a long computation time, because both the GMM-based acoustic model and a deep learning-based acoustic model must be trained. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the forced-aligned HMM state sequence. However, previous works on an LSTM RNN-based acoustic model using CTC used a small-scale training corpus. In this paper, the LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% for clean speech and 15.01% for noisy speech. This is similar to the performance of the acoustic model based on the hybrid method.
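Since the abstract above reports its results as Word Error Rate, a short sketch of how WER is conventionally computed may help: it is the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognized text, divided by the number of reference words. The function name and example sentences are hypothetical, and whitespace tokenization is an assumption; the paper's scoring tool is not specified.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One deleted word ("on") and one substituted word ("light" -> "lights").
wer = word_error_rate("turn on the kitchen light", "turn the kitchen lights")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.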
Abstract: The Sanjiang Plain, where nearly 20 kinds of wetlands now exist, is one of the largest wetland distribution areas in China. Identifying each kind and extracting it separately by automatic interpretation of Landsat TM remote sensing images is extremely important. However, most types of wetlands cannot be distinguished from each other because of the similarity and illegibility of the wetland spectra shown in TM images. Special processing of remote sensing images includes the spectral enhancement of wetland information, the pseudo-color composite of TM images of different bands, and the algebraic enhancement of TM images. In this way some kinds of wetlands, such as Sparganium stoloniferum and Bolboschoenus maritimus, can be identified. But in many cases these methods are still insufficient because of noise introduced by atmospheric transmission and other sources. The physical features of wetlands that reflect the diversity of their spectral information must therefore be taken into consideration; these include the spatial and temporal characteristics of wetland distribution, the seasonal differences in wetland landscape, the growing environment, and the vertical structure of wetland vegetation. In addition, artificial alteration of the spatial structure of wetlands, such as the exploitation of some wetland types, can also be used as an important indicator for wetland identification from remote sensing images. On the basis of the above geographical analysis, a set of remote sensing wetland classification models can be established, and many types of wetlands, such as paddy field, reed swamp, peat mire, meadow, Carex marsh and paludification meadow, can consequently be distinguished. The methods of geographical analysis and model establishment are given in detail in this article.
Funding: Supported by the Zhejiang Provincial Philosophy and Social Science Planning Zhijiang Youth Project of China (Grant No. 16ZJQN017YB); Ministry of Education of China, Humanities and Social Science Projects (Grant No. 19YJA910006); Zhejiang Provincial Natural Science Foundation of China (Grant No. LY20A010019); Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant No. GK199900299012-204); Zhejiang Provincial Statistical Science Research Base Project of China (Grant No. 19TJJD08).
Abstract: In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, the density function and the noncentral skew chi-square distribution. Based on the EM algorithm, we discuss the maximum likelihood (ML) estimation of the unknown parameters. For the problem of testing the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, some simulation results on the Type I error rates and powers of the PB approach are obtained, which show that the PB approach performs satisfactorily in terms of Type I error rates and powers, even for small samples. For illustration, our main results are applied to a real data problem.
Funding: Supported by the National 11th Five-Year Science and Technology Supporting Plan of China (2006BAB02A02); Central South University Innovation funded projects (2009ssxt230, 2009ssxt234).
Abstract: A Fisher discriminant analysis (FDA) model for predicting the classification of rockburst in deep-buried long tunnels was established based on the Fisher discriminant theory and the actual characteristics of the project. First, the major factors of rockburst, such as the maximum tangential stress of the cavern wall σθ, the uniaxial compressive strength σc, the uniaxial tensile strength σt, and the elastic energy index of rock Wet, were taken into account in the analysis. Three factors, the stress coefficient σθ/σc, the rock brittleness coefficient σc/σt, and the elastic energy index Wet, were defined as the criterion indices for rockburst prediction in the proposed model. After training and testing on 12 sets of measured data, the discriminant functions of FDA were solved, and the misdiscrimination ratio is zero. Moreover, the proposed model was used to predict rockbursts in the Qinling tunnel along the Xi'an-Ankang railway. The results show that the three forecast results are identical to the actual situation. Therefore, the prediction accuracy of the FDA model is acceptable.