We apply stochastic seismic inversion and Bayesian facies classification for porosity modeling and igneous rock identification in the presalt interval of the Santos Basin. This integration of seismic and well-derived information enhances reservoir characterization. Stochastic inversion and Bayesian classification are powerful tools because they permit addressing the uncertainties in the model. We used the ES-MDA algorithm to obtain the realizations equivalent to the percentiles P10, P50, and P90 of acoustic impedance, a novel method for acoustic inversion in the presalt. The facies were divided into five classes: reservoir 1, reservoir 2, tight carbonates, clayey rocks, and igneous rocks. To deal with the overlaps in the acoustic impedance values of the facies, we included geological information through the a priori probability, indicating that structural highs are reservoir-dominated. To illustrate our approach, we conducted porosity modeling using facies-related rock-physics models for rock-physics inversion in an area with a well drilled in a coquina bank, and we evaluated the thickness and extension of an igneous intrusion near the carbonate-salt interface. The modeled porosity and the classified seismic facies are in good agreement with those observed in the wells. Notably, the coquina bank shows an improvement in porosity towards the top. The a priori probability model was crucial for limiting the clayey rocks to the structural lows. In Well B, the hit rate of the igneous rock in the three scenarios is higher than 60%, showing an excellent thickness-prediction capability.
The complex sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency; it includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, was used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, this model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. The prediction model, when applied in the factory, greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
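A minimal sketch of the RF-plus-Gini-importance part of this pipeline, using scikit-learn on synthetic data. The parameter names and the defect rule below are invented for illustration; they are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Hypothetical process parameters: pouring temperature, mold moisture,
# compaction pressure, sand permeability (all synthetic, standardized).
X = rng.normal(size=(600, 4))
# Synthetic rule: a gas-porosity defect (class 1) driven mainly by feature 0.
y = (X[:, 0] + 0.3 * rng.normal(size=600) > 0.8).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Gini-based feature importances, as used to rank which parameters
# drive each defect type; recall is the metric reported in the paper.
importances = rf.feature_importances_
recall = recall_score(y, rf.predict(X))
print(importances, recall)
```

On this synthetic data the importance of feature 0 dominates, mirroring how the paper uses the Gini index to single out the parameters behind each defect.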
Social media has revolutionized the dissemination of real-life information, serving as a robust platform for sharing life events. Twitter, characterized by its brevity and continuous flow of posts, has emerged as a crucial source for public health surveillance, offering valuable insights into public reactions during the COVID-19 pandemic. This study aims to leverage a range of machine learning techniques to extract pivotal themes and facilitate text classification on a dataset of COVID-19 outbreak-related tweets. Diverse topic modeling approaches were employed to extract pertinent themes and subsequently form a dataset for training text classification models. An assessment of coherence metrics revealed that the Gibbs Sampling Dirichlet Mixture Model (GSDMM), which utilizes trigram and bag-of-words (BOW) feature extraction, outperformed Non-negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and a hybrid strategy involving Bidirectional Encoder Representations from Transformers (BERT) combined with LDA and K-means in pinpointing significant themes within the dataset. Among the models assessed for text clustering, the use of LDA, either as a clustering model or for feature extraction combined with BERT for K-means, resulted in higher coherence scores, consistent with human ratings, signifying their efficacy. In particular, LDA, notably in conjunction with trigram representation and BOW, demonstrated superior performance. This underscores the suitability of LDA for topic modeling, given its proficiency in capturing intricate textual relationships. For text classification, models such as Linear Support Vector Classification (LSVC), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network with BiLSTM (CNN-BiLSTM), and BERT showed outstanding performance, achieving accuracy and weighted F1-score values exceeding 80%. These results significantly surpassed other models, such as Multinomial Naive Bayes (MNB), Linear Support Vector Machine (LSVM), and Logistic Regression (LR), which achieved scores in the range of 60 to 70 percent.
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic highlights than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress, and comparing impacts. An ensemble of pre-trained language models is used here to classify sentences from a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used for analyzing the conversation progress and predicting the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus, and a hyperparameter tuning approach is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models, and the ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
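The ensemble combination step can be sketched as soft voting over per-model class probabilities. The abstract does not state the combination rule, so plain averaging is assumed here, and every probability value below is made up for illustration.

```python
import numpy as np

# Hypothetical per-model class probabilities for 3 sentences over the four
# labels (information, question, directive, commission). In practice these
# would come from the fine-tuned BERT, RoBERTa, GPT, etc. heads.
probs = {
    "bert":    np.array([[0.7, 0.1, 0.1, 0.1],
                         [0.2, 0.6, 0.1, 0.1],
                         [0.1, 0.2, 0.6, 0.1]]),
    "roberta": np.array([[0.6, 0.2, 0.1, 0.1],
                         [0.1, 0.7, 0.1, 0.1],
                         [0.2, 0.1, 0.5, 0.2]]),
    "gpt":     np.array([[0.5, 0.2, 0.2, 0.1],
                         [0.3, 0.5, 0.1, 0.1],
                         [0.1, 0.1, 0.7, 0.1]]),
}

# Soft vote: average the distributions, then take the argmax label.
avg = np.stack(list(probs.values())).mean(axis=0)
labels = avg.argmax(axis=1)
print(labels)
```

With these invented numbers the three sentences come out as information, question, and directive respectively; a weighted average or stacked meta-classifier would slot in at the same point.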
We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Net Primary Productivity (NPP) is an important parameter that is closely connected with global climate change and the global carbon balance and cycle. The study of climate-vegetation interaction is the basis for research on the responses of terrestrial ecosystems to global change and mainly comprises two important components: climate-vegetation classification and the NPP of the natural vegetation. Comparing NPP estimated from the classification indices-based model with NPP derived from measurements at 3767 sites in China indicated that the classification indices-based model was capable of estimating large-scale NPP. Annual cumulative temperature above 0°C and a moisture index, two main factors affecting NPP, were spatially plotted with the ArcGIS grid tool based on data measured at 2348 meteorological stations from 1961 to 2006. The distribution of NPP for potential vegetation classes under present climate conditions was simulated by the classification indices-based model. The model estimated the total NPP of potential terrestrial vegetation of China to fluctuate between 1.93 and 4.54 Pg C year⁻¹. It provides a reliable means for scaling up from site to regional scales, and the findings could potentially favor China's position in reducing global-warming gases as outlined in the Kyoto Protocol, in order to fulfill China's commitment to reducing greenhouse gases.
Proper waste management models using recent technologies like computer vision, machine learning (ML), and deep learning (DL) are needed to effectively handle the massive quantity of increasing waste. Waste classification is therefore a crucial topic, which helps to categorize waste into hazardous and non-hazardous classes and thereby assists the decision making of the waste management process. This study concentrates on the design of a hazardous waste detection and classification using ensemble learning (HWDC-EL) technique to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous wastes. The HWDC-EL technique involves an ensemble of three feature extractors using a model-averaging technique, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to optimally adjust the parameters involved in the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived using three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique was tested on a benchmark garbage dataset, and it obtains a maximum accuracy of 98.85%.
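The weighted voting-based ensemble can be sketched with scikit-learn's VotingClassifier. ELM has no scikit-learn implementation, so logistic regression stands in for it here; the data, weights, and estimator settings are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for extracted image features and hazard labels.
X, y = make_classification(n_samples=300, random_state=0)

# Weighted soft voting over three base learners. Logistic regression
# replaces ELM, which scikit-learn does not provide; the weights are
# invented, not the paper's.
vote = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("elm_standin", LogisticRegression(max_iter=1000)),
        ("gbt", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
    weights=[2, 1, 2],
).fit(X, y)

acc = vote.score(X, y)
print(acc)
```

Soft voting averages the weighted class probabilities of the base learners, so a confident GBT or SVM can outvote a weaker model, which is the mechanism behind this kind of weighted ensemble.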
Recently, the computer-aided diagnosis (CAD) model has become an effective tool for decision making in the healthcare sector. Advances in computer vision and artificial intelligence (AI) techniques have resulted in the effective design of CAD models, which enable the detection of diseases using various imaging modalities. Oral cancer (OC) commonly occurs in the head and neck globally. Early identification of OC improves the survival rate and reduces the mortality rate; the design of a CAD model for OC detection and classification therefore becomes essential. This study introduces a novel Computer Aided Diagnosis for OC using Sailfish Optimization with Fusion based Classification (CADOC-SFOFC) model. The proposed CADOC-SFOFC model determines the existence of OC in medical images. To accomplish this, a fusion-based feature extraction process is carried out using the VGGNet-16 and Residual Network (ResNet) models. The feature vectors are fused and passed into an extreme learning machine (ELM) model for the classification process. Moreover, the SFO algorithm is utilized for effective parameter selection of the ELM model, consequently resulting in enhanced performance. The experimental analysis of the CADOC-SFOFC model was conducted on a Kaggle dataset, and the results showed the superiority of the CADOC-SFOFC model over the compared methods, with a maximum accuracy of 98.11%. Therefore, the CADOC-SFOFC model has strong potential as an inexpensive and non-invasive tool that supports the screening process and enhances detection efficiency.
Pneumonia is a dangerous respiratory disease that makes breathing incredibly difficult and painful; thus, catching it early is crucial. Medical physicians' time is limited in outpatient settings due to the number of patients; therefore, automated systems can be a rescue. The input images from the X-ray equipment are also highly variable due to differences in radiologists' experience. Therefore, radiologists require an automated system that can swiftly and accurately detect pneumonic lungs from chest X-rays. Deep convolutional neural networks are commonly used in medical classification. This research aims to use deep pre-trained transfer learning models to accurately categorize CXR images into binary classes, i.e., Normal and Pneumonia. The proposed MDEV model is a novel ensemble approach that concatenates four heterogeneous transfer learning models: MobileNet, DenseNet-201, EfficientNet-B0, and VGG-16, which have been fine-tuned and trained on 5,856 CXR images. The evaluation metrics used in this research to contrast the different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and F1-score. The model effectively decreases training loss while increasing accuracy. The findings conclude that the proposed MDEV model outperformed cutting-edge deep transfer learning models, obtaining an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F1-score of 91.49%, with minimal data pre-processing, data augmentation, fine-tuning, and hyperparameter adjustment in classifying Normal and Pneumonia chests.
This paper builds an employee attrition classification model based on the Stacking algorithm. After data cleaning and preprocessing, an oversampling algorithm is applied to address the issue of data imbalance, and the Random forest feature importance ranking method is used to mitigate overfitting. Different algorithms are then used to establish classification models as control experiments, which are compared using R-squared indicators. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
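The oversample-then-stack workflow can be sketched with scikit-learn. The paper's exact oversampling algorithm is not specified here, so plain random oversampling of the minority class stands in, and all data and estimator choices below are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic "attrition" data: roughly 10% leavers (class 1).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

# Naive random oversampling: duplicate minority rows until classes balance.
rng = np.random.default_rng(0)
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=(y == 0).sum() - minority.size, replace=True)
Xb = np.vstack([X, X[extra]])
yb = np.concatenate([y, y[extra]])

# Stacking: base learners feed a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
).fit(Xb, yb)

pred = stack.predict(X)
```

Balancing before stacking keeps the meta-learner from simply predicting the majority "stays" class, which is the point of the oversampling step.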
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits the applicability of existing methods in this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. Our proposed feature screening method utilizes the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that our screening procedure possesses the sure screening property and ranking consistency properties. To validate the effectiveness of our approach, we conduct simulation studies and provide real data analysis examples to demonstrate its performance in finite samples. In summary, our proposed method offers a solution for effectively screening features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
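The screen-by-dependence-then-rank step can be illustrated on mixed covariates. The Maximal Information Coefficient itself (e.g. via the minepy package) is not assumed available here, so scikit-learn's mutual information stands in as the dependence measure; the data and the informative-feature rule are invented.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500

# Mixed covariates: one informative continuous variable, one irrelevant
# categorical variable, and three continuous noise columns.
x_cont = rng.normal(size=n)
y = (x_cont > 0).astype(int)          # response driven only by x_cont
x_cat = rng.integers(0, 3, size=n)    # irrelevant 3-level categorical
noise = rng.normal(size=(n, 3))
X = np.column_stack([x_cont, x_cat, noise])

# Dependence score per feature (MI here, MIC in the paper), then rank
# and keep the top-d features as the screened set.
scores = mutual_info_classif(
    X, y,
    discrete_features=np.array([False, True, False, False, False]),
    random_state=0,
)
ranking = np.argsort(scores)[::-1]
screened = ranking[:2]
print(scores, screened)
```

The sure screening property amounts to the claim that, with high probability, the truly informative feature (column 0 here) survives this ranking cut.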
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing problems of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
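The core of TCA-based fuzzy classification can be sketched directly: compute the max-min transitive closure of a fuzzy similarity matrix, then take a λ-cut to obtain equivalence classes of flights. The similarity values below are invented for illustration.

```python
import numpy as np

def transitive_closure(R):
    """Max-min transitive closure of a fuzzy similarity matrix R."""
    T = R.copy()
    while True:
        # Fuzzy composition: T2[i, j] = max_k min(T[i, k], T[k, j]).
        T2 = np.max(np.minimum(T[:, :, None], T[None, :, :]), axis=1)
        if np.allclose(T2, T):
            return T  # fixpoint reached: T is transitive
        T = T2

# Hypothetical fuzzy similarity between four flights (symmetric, diag = 1),
# derived in the paper from four factors such as delay cost.
R = np.array([
    [1.0, 0.8, 0.4, 0.5],
    [0.8, 1.0, 0.4, 0.5],
    [0.4, 0.4, 1.0, 0.8],
    [0.5, 0.5, 0.8, 1.0],
])

T = transitive_closure(R)
# λ-cut at 0.6: flights i, j fall in the same class iff T[i, j] >= λ.
classes = T >= 0.6
print(classes)
```

With these numbers the λ-cut groups flights {0, 1} and {2, 3}; raising λ produces a finer classification, which is how the number of flight classes is controlled.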
This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification for improvement in robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
In order to reduce the amount of data storage and improve the processing capacity of the system, this paper proposes a new classification method for data sources that combines a phase synchronization model from network clustering with the cloud model. Firstly, taking the data source as a complex network, after the topology of the network is obtained, the cloud model of each node's data is determined by the fuzzy analytic hierarchy process (AHP). Secondly, by calculating the expectation, entropy, and hyper-entropy of the cloud model, the comprehensive coupling strength is obtained and then regarded as the edge weight of the topology. Finally, the distribution curve is obtained by iterating the phase of each node by means of the phase synchronization model, and the classification of the data source is thus completed. This method can not only provide convenience for the storage, cleaning, and compression of data, but also improve the efficiency of data analysis.
Coalbed methane has been explored in many basins worldwide for 30 years, and has been developed commercially in some of the basins. Many researchers have described the characteristics of coalbed methane geology and technology systematically. According to these investigations, a coalbed methane reservoir can be defined: 'a coal seam that contains some coalbed methane and is isolated from other fluid units is called a coalbed methane reservoir'. On the basis of in-depth analysis and comparison of typical coalbed methane reservoirs, coalbed methane reservoirs can be divided into two classes: the hydrodynamic sealing coalbed methane reservoirs and the self-sealing coalbed methane reservoirs. The former can be further divided into two sub-classes: the hydrodynamic capping coalbed methane reservoirs, which can be divided into five types, and the hydrodynamic driving coalbed methane reservoirs, which can be divided into three types. The latter can be divided into three types. Currently, hydrodynamic sealing reservoirs are the main target for coalbed methane exploration and development; self-sealing reservoirs are unsuitable for coalbed methane exploration and development, but they are closely related to coal mine gas hazards. Finally, a model for hydrodynamic sealing coalbed methane reservoirs is established.
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) have driven tremendous improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, models based on this hybrid method require a force-aligned Hidden Markov Model (HMM) state sequence obtained from the GMM-based acoustic model, and therefore a long computation time for training both the GMM-based acoustic model and the deep learning-based acoustic model. To solve this problem, an acoustic model using the CTC algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the force-aligned HMM state sequence. However, previous works on LSTM RNN-based acoustic models using CTC used small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% for clean speech and 15.01% for noisy speech, similar to the performance of the acoustic model based on the hybrid method.
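WER, the metric quoted above, is the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. A minimal implementation:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution plus one insertion against a 3-word reference: WER = 2/3.
print(wer("the cat sat", "the bat sat on"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is a rate rather than a bounded accuracy score.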
The Sanjiang Plain, where nearly 20 kinds of wetlands exist, is one of the largest wetland distribution areas in China. Identifying each of them and extracting them separately by means of automatic interpretation of remote sensing from Landsat TM images is extremely important. However, most types of wetlands cannot be distinguished from each other due to the similarity and illegibility of the wetland spectra shown in TM images. Special processing of the remote sensing images includes spectral enhancement of the wetland information, pseudo-color compositing of TM images from different bands, and algebraic enhancement of the TM images. In this way some kinds of wetlands, such as Sparganium stoloniferum and Bolboschoenus maritimus, can be identified. But in many cases these methods are still insufficient because of noise introduced by atmospheric transmission and other factors. The physical features of wetlands reflecting the diversity of their spectral information, which include the spatial-temporal characteristics of the wetland distribution, the seasonal landscape differences of wetlands, the growing environment, and the vertical structure of the wetland vegetation, must be taken into consideration. Besides these, artificial alteration of the spatial structure of wetlands, such as the exploitation of some types of them, can also be used as an important symbol for wetland identification from remote sensing images. On the basis of the above geographic analysis, a set of remote sensing classification models for wetlands could be established, and many types of wetlands, such as paddy field, reed swamp, peat mire, meadow, Carex marsh, and paludification meadow, can consequently be distinguished. All the methods of geographical analysis and model establishment are given in detail in this article.
A Fisher discriminant analysis (FDA) model for the prediction of the classification of rockburst in deep-buried long tunnels was established based on Fisher discriminant theory and the actual characteristics of the project. First, the major factors of rockburst, such as the maximum tangential stress of the cavern wall σθ, the uniaxial compressive strength σc, the uniaxial tensile strength σt, and the elastic energy index of rock Wet, were taken into account in the analysis. Three factors, the stress coefficient σθ/σc, the rock brittleness coefficient σc/σt, and the elastic energy index Wet, were defined as the criterion indices for rockburst prediction in the proposed model. After training and testing on 12 sets of measured data, the discriminant functions of the FDA were solved, and the misdiscrimination ratio is zero. Moreover, the proposed model was used to predict rockbursts in the Qinling tunnel along the Xi'an-Ankang railway. The results show that the three forecast results are identical with the actual situation. Therefore, the prediction accuracy of the FDA model is acceptable.
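A hedged sketch of FDA on the three criterion indices, using scikit-learn's LinearDiscriminantAnalysis. The index values below are synthetic stand-ins, not the paper's 12 measured cases, and only two rockburst grades are shown.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic criterion indices per case: stress coefficient sigma_theta/sigma_c,
# brittleness coefficient sigma_c/sigma_t, elastic energy index Wet.
# Typical-looking centers for "no rockburst" vs "strong rockburst" are
# invented here with 5% relative noise.
center_none = np.array([0.2, 30.0, 2.0])
center_strong = np.array([0.6, 15.0, 7.0])
none_cases = rng.normal(center_none, 0.05 * center_none, (6, 3))
strong_cases = rng.normal(center_strong, 0.05 * center_strong, (6, 3))

X = np.vstack([none_cases, strong_cases])
y = np.array([0] * 6 + [1] * 6)  # 0 = no rockburst, 1 = strong rockburst

# Fisher discriminant: project onto the direction maximizing between-class
# versus within-class scatter, then classify by the discriminant function.
fda = LinearDiscriminantAnalysis().fit(X, y)
train_acc = fda.score(X, y)
print(train_acc)
```

On cleanly separated synthetic classes the training misdiscrimination ratio is zero, mirroring the result reported for the 12 measured data sets.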
Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment. Given the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region, a stochastic simulation method based on a classification model is used to simulate tropical cyclone tracks in this region. The simulation includes the classification method, the genesis model, the traveling model, and the lysis model. Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of their movement characteristics and steering positions. In the genesis model, Gaussian kernel probability density functions with the biased cross-validation method are used to simulate the annual occurrence number and genesis positions. The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements. The termination probability is used as the discrimination standard in the lysis model. Then, this stochastic simulation method for tropical cyclone tracks is applied and qualitatively evaluated with different diagnostics. Results show that tropical cyclone tracks in the Northwest Pacific can be satisfactorily simulated with this classification model.
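The genesis model's Gaussian-kernel density step can be sketched with SciPy. Note that scipy's gaussian_kde uses Scott's rule by default rather than the biased cross-validation bandwidth used in the paper, and the genesis positions below are synthetic, not observed tracks.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic stand-in for historical genesis positions (lon, lat) in the
# Northwest Pacific; real data would come from a best-track archive.
lon = rng.normal(140.0, 8.0, 200)
lat = rng.normal(15.0, 4.0, 200)

# Fit a 2-D Gaussian kernel density to the genesis positions, then draw
# synthetic genesis points from it for stochastic track simulation.
kde = gaussian_kde(np.vstack([lon, lat]))
simulated = kde.resample(500)  # shape (2, 500): lon row, lat row
print(simulated.mean(axis=1))
```

Each simulated genesis point would then seed the traveling model's 6 h displacement steps until the lysis model terminates the track.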
In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, density function, and noncentral skew chi-square distribution. Based on the EM algorithm, we discuss the maximum likelihood (ML) estimation of the unknown parameters. For the testing problem of the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, some simulation results on the Type I error rates and powers of the PB approach are obtained, which show that the PB approach provides satisfactory performance on the Type I error rates and powers, even for small samples. For illustration, our main results are applied to a real data problem.
Funding: Equinor, for financing the R&D project; the Institute of Science and Technology of Petroleum Geophysics of Brazil, for supporting this research.
Funding: financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
文摘The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.
Abstract: Social media has revolutionized the dissemination of real-life information, serving as a robust platform for sharing life events. Twitter, characterized by its brevity and continuous flow of posts, has emerged as a crucial source for public health surveillance, offering valuable insights into public reactions during the COVID-19 pandemic. This study leverages a range of machine learning techniques to extract pivotal themes and facilitate text classification on a dataset of COVID-19 outbreak-related tweets. Diverse topic modeling approaches were employed to extract pertinent themes and subsequently form a dataset for training text classification models. An assessment of coherence metrics revealed that the Gibbs Sampling Dirichlet Mixture Model (GSDMM), which utilizes trigram and bag-of-words (BOW) feature extraction, outperformed Non-negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and a hybrid strategy involving Bidirectional Encoder Representations from Transformers (BERT) combined with LDA and K-means in pinpointing significant themes within the dataset. Among the models assessed for text clustering, the use of LDA, either as a clustering model or for feature extraction combined with BERT for K-means, produced higher coherence scores, consistent with human ratings, signifying their efficacy. In particular, LDA, notably in conjunction with trigram representation and BOW, demonstrated superior performance. This underscores the suitability of LDA for topic modeling, given its proficiency in capturing intricate textual relationships. In the context of text classification, models such as Linear Support Vector Classification (LSVC), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network with BiLSTM (CNN-BiLSTM), and BERT showed outstanding performance, achieving accuracy and weighted F1-scores exceeding 80%. These results significantly surpassed other models, such as Multinomial Naive Bayes (MNB), Linear Support Vector Machine (LSVM), and Logistic Regression (LR), which achieved scores in the range of 60 to 70 percent.
Abstract: Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic highlights than tasks such as dependency parsing, which relies more on syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing its progress, or comparing impacts. An ensemble of pre-trained language models is adopted here to classify conversation sentences from a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used to analyze conversation progress and predict the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with tuned hyperparameters. A hyperparameter tuning approach is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with fine-tuned parameters achieved an F1-score of 0.88.
Abstract: We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Abstract: Net Primary Productivity (NPP) is an important parameter that is closely connected with global climate change and the global carbon balance and cycle. The study of climate-vegetation interaction is the basis for research on the responses of terrestrial ecosystems to global change and mainly comprises two components: climate-vegetation classification and the NPP of natural vegetation. Comparing NPP estimated from the classification indices-based model with NPP derived from measurements at 3767 sites in China indicated that the model is capable of estimating large-scale NPP. Annual cumulative temperature above 0°C and a moisture index, the two main factors affecting NPP, were spatially plotted with the ArcGIS grid tool based on data measured at 2348 meteorological stations from 1961 to 2006. The distribution of NPP for potential vegetation classes under present climate conditions was simulated by the classification indices-based model. The model estimated the total NPP of potential terrestrial vegetation of China to fluctuate between 1.93 and 4.54 Pg C per year. It provides a reliable means for scaling up from site to regional scales, and the findings could potentially strengthen China's position in reducing global-warming gases as outlined in the Kyoto Protocol, helping fulfill China's commitment to reducing greenhouse gases.
Funding: The authors thank the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number RGP 2/209/42, and Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, under Researchers Supporting Project Number PNURSP2022R136. The authors would also like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code 22UQU4210118DSR27.
Abstract: Proper waste management models using recent technologies like computer vision, machine learning (ML), and deep learning (DL) are needed to effectively handle the massive quantity of increasing waste. Waste classification therefore becomes a crucial topic, helping to categorize waste as hazardous or non-hazardous and thereby assisting decision making in the waste management process. This study concentrates on the design of a hazardous waste detection and classification using ensemble learning (HWDC-EL) technique to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous waste. The HWDC-EL technique involves an ensemble of three feature extractors using a model averaging technique, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to optimally adjust the parameters of the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived using three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique was tested on a benchmark garbage dataset, where it obtains a maximum accuracy of 98.85%.
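The weighted voting step of such an ensemble is simple to sketch: each base classifier casts a class vote weighted, for example, by its validation accuracy. The weights and predictions below are invented for illustration and stand in for the SVM, ELM, and GBT members named in the abstract.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """predictions: one predicted class label per base model."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

# Hypothetical per-model validation accuracies used as voting weights
# (standing in for SVM, ELM and GBT respectively).
weights = [0.95, 0.92, 0.90]
print(weighted_vote(["hazardous", "non-hazardous", "hazardous"], weights))
# -> hazardous (combined weight 1.85 vs 0.92)
```

Weighting by validation accuracy lets a strong member outvote two weak ones only when its margin justifies it, which is the usual motivation for weighted over majority voting.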
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number RGP 2/142/43, and to Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, under Researchers Supporting Project number PNURSP2022R151. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code 22UQU4310373DSR13. This research project was also supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.
Abstract: Recently, the computer aided diagnosis (CAD) model has become an effective tool for decision making in the healthcare sector. Advances in computer vision and artificial intelligence (AI) techniques have resulted in the effective design of CAD models, which enable the detection of diseases using various imaging modalities. Oral cancer (OC) occurs commonly in the head and neck globally. Earlier identification of OC improves the survival rate and reduces mortality. The design of a CAD model for OC detection and classification therefore becomes essential. This study introduces a novel Computer Aided Diagnosis for OC using Sailfish Optimization with Fusion based Classification (CADOC-SFOFC) model. The proposed CADOC-SFOFC model determines the existence of OC in medical images. To accomplish this, a fusion based feature extraction process is carried out using the VGGNet-16 and Residual Network (ResNet) models. The feature vectors are fused and passed into an extreme learning machine (ELM) model for the classification process. Moreover, the SFO algorithm is utilized for effective parameter selection of the ELM model, resulting in enhanced performance. The experimental analysis of the CADOC-SFOFC model was conducted on a Kaggle dataset, and the results showed the superiority of the CADOC-SFOFC model over the compared methods, with a maximum accuracy of 98.11%. The CADOC-SFOFC model therefore has strong potential as an inexpensive and non-invasive tool that supports the screening process and enhances detection efficiency.
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A1A01052299).
Abstract: Pneumonia is a dangerous respiratory disease that makes breathing incredibly difficult and painful; thus, catching it early is crucial. Physicians' time in outpatient settings is limited by the number of patients; therefore, automated systems can be a rescue. The input images from X-ray equipment are also highly variable owing to differences in radiologists' experience. Radiologists therefore require an automated system that can swiftly and accurately detect pneumonic lungs from chest X-rays. Deep convolutional neural networks are commonly used in medical classification. This research uses deep pretrained transfer learning models to accurately categorize CXR images into two binary classes, i.e., Normal and Pneumonia. MDEV is a proposed novel ensemble approach that concatenates four heterogeneous transfer learning models: MobileNet, DenseNet-201, EfficientNet-B0, and VGG-16, which were fine-tuned and trained on 5,856 CXR images. The evaluation metrics used to contrast the different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and F1-score. The model effectively decreases training loss while increasing accuracy. The findings conclude that the proposed MDEV model outperformed cutting-edge deep transfer learning models, obtaining an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F1-score of 91.49% with minimal data pre-processing, data augmentation, fine-tuning, and hyperparameter adjustment in classifying Normal and Pneumonia chests.
Abstract: This paper builds an employee attrition classification model based on the stacking algorithm. An oversampling algorithm is applied to address the issue of data imbalance, and the random forest feature importance ranking method is used to resolve the overfitting problem after data cleaning and preprocessing. Then, different algorithms are used to establish classification models as control experiments, compared using R-squared indicators. Finally, the stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
Abstract: It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits the applicability of existing methods in this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method utilizes the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. To validate the effectiveness of the approach, we conduct simulation studies and provide real data analysis examples demonstrating its finite-sample performance. In summary, the proposed method offers a solution for effectively screening features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
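The screening idea can be sketched as ranking each feature by an information-based dependence measure with the class label. The paper uses the Maximal Information Coefficient; as a stand-in, the sketch below estimates plain mutual information from a 2-D histogram, which likewise requires no model and works for a continuous feature against a categorical label. The simulated features are illustrative only.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate (in nats) of the mutual information of x and y."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000).astype(float)        # binary class label
relevant = y + 0.3 * rng.standard_normal(2000)    # continuous, informative
noise = rng.standard_normal(2000)                 # continuous, irrelevant
scores = [mutual_info(f, y) for f in (relevant, noise)]
print(scores[0] > scores[1])  # -> True: the informative feature ranks first
```

Screening then keeps the top-ranked features; the sure screening property says the truly relevant ones survive this cut with probability tending to one.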
Abstract: A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. First, fuzzy set theory and the transitive closure algorithm are introduced. Then four factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
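The transitive closure step of such fuzzy classification can be sketched directly: starting from a fuzzy similarity matrix over flights, compose the matrix with itself via max-min until it stabilizes, then cut at a level λ so that rows agreeing above λ fall into the same class. The 3×3 similarity matrix and the cut level below are made up for illustration.

```python
import numpy as np

def max_min_compose(a, b):
    # (a o b)[i, j] = max_k min(a[i, k], b[k, j])
    return np.max(np.minimum(a[:, :, None], b[None, :, :]), axis=1)

def transitive_closure(r):
    """Iterate R -> R o R until the fuzzy relation is transitive."""
    while True:
        r2 = max_min_compose(r, r)
        if np.allclose(r2, r):
            return r
        r = r2

R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
T = transitive_closure(R)
classes = (T >= 0.8).astype(int)   # lambda-cut at 0.8
print(classes)                     # flights 0 and 1 end up in the same class
```

Varying λ produces a hierarchy of partitions, which is how a delay-cost level can then be attached to each resulting class.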
Abstract: This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification for improved robustness, correctness, and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
Funding: National Natural Science Foundation of China (No. 61171057, No. 61503345); Science Foundation for North University of China (No. 110246); Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20121420110004); International Office of Shanxi Province Education Department of China; and Basic Research Project in Shanxi Province (Young Foundation).
Abstract: In order to reduce the amount of data storage and improve the processing capacity of the system, this paper proposes a new method for classifying data sources by combining a phase synchronization model from network clustering with the cloud model. First, taking the data source as a complex network, after the topology of the network is obtained, the cloud model of each node's data is determined by the fuzzy analytic hierarchy process (AHP). Second, by calculating the expectation, entropy, and hyper-entropy of the cloud model, a comprehensive coupling strength is obtained and regarded as the edge weight of the topology. Finally, a distribution curve is obtained by iterating the phase of each node by means of the phase synchronization model, completing the classification of the data source. This method not only provides convenience for the storage, cleaning, and compression of data, but also improves the efficiency of data analysis.
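The phase-iteration step can be illustrated with a minimal Kuramoto-type sketch, used here only as a generic phase synchronization model: coupling strengths play the role of the cloud-model edge weights, and strongly coupled nodes lock phases and end up in the same class. The network, weights, and initial phases below are invented; the paper's specific update rule may differ.

```python
import numpy as np

def iterate_phases(theta, K, dt=0.1, steps=2000):
    """Euler iteration of dtheta_i/dt = sum_j K[i, j] * sin(theta_j - theta_i)."""
    for _ in range(steps):
        theta = theta + dt * np.sum(K * np.sin(theta[None, :] - theta[:, None]), axis=1)
    return np.mod(theta, 2 * np.pi)

K = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])      # nodes 0 and 1 coupled; node 2 isolated
theta0 = np.array([0.3, 2.0, 1.0])
theta = iterate_phases(theta0, K)
print(abs(theta[0] - theta[1]) < 1e-3)  # coupled pair synchronizes -> True
```

Reading off which nodes share a final phase (the plateaus of the phase distribution curve) yields the classification of the data sources.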
Funding: We wish to thank the Ministry of Science and Technology of China for its financial support of "Project 973" (No. 2002CB211705), and the Science and Technology Administration of Henan Province.
Abstract: Coalbed methane has been explored in many basins worldwide for 30 years and has been developed commercially in some of them. Many researchers have systematically described the characteristics of coalbed methane geology and technology. According to these investigations, a coalbed methane reservoir can be defined as a coal seam that contains some coalbed methane and is isolated from other fluid units. On the basis of detailed examination, analysis, and comparison of typical coalbed methane reservoirs, they can be divided into two classes: hydrodynamic sealing reservoirs and self-sealing reservoirs. The former can be further divided into two sub-classes: hydrodynamic capping reservoirs, comprising five types, and hydrodynamic driving reservoirs, comprising three types. The latter can be divided into three types. Currently, hydrodynamic sealing reservoirs are the main target for coalbed methane exploration and development; self-sealing reservoirs are unsuitable for exploration and development, but they are closely related to coal mine gas hazards. Finally, a model for hydrodynamic sealing coalbed methane reservoirs is established.
Funding: Supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, "Development of distant speech recognition and multi-task dialog processing technologies for in-door conversational robots").
Abstract: The Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) has driven tremendous improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, models based on this hybrid method require a force-aligned Hidden Markov Model (HMM) state sequence obtained from the GMM-based acoustic model, and therefore a long computation time to train both the GMM-based acoustic model and the deep learning-based acoustic model. To solve this problem, an acoustic model using the CTC algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the force-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% on clean speech and 15.01% on noisy speech, similar to the performance of the acoustic model based on the hybrid method.
Abstract: The Sanjiang Plain, where nearly 20 kinds of wetlands exist, is one of the largest wetland distribution areas in China. Identifying each of them and extracting them separately by automatic interpretation of Landsat TM images is extremely important. However, most types of wetlands cannot be distinguished from each other owing to the similarity and illegibility of the wetland spectra shown in TM images. Special processing of remote sensing images includes spectral enhancement of wetland information, pseudo-color composites of TM images of different bands, and algebraic enhancement of TM images. In this way some kinds of wetlands, such as Sparganium stoloniferum and Bolboschoenus maritimus, can be identified. But in many cases these methods are still insufficient because of noise introduced by atmospheric transport and other factors. The physical features of wetlands reflecting the diversification of wetland spectral information, including the spatial-temporal characteristics of wetland distribution, the landscape differences of wetlands from season to season, the growing environment, and the vertical structure of wetland vegetation, must be taken into consideration. Besides these, artificial alteration of the spatial structure of wetlands, such as the exploitation of some types, can also be used as an important indicator for wetland identification from remote sensing images. On the basis of such geographic analysis, a set of remote sensing wetland classification models can be established, and many types of wetlands, such as paddy field, reed swamp, peat mire, meadow, Carex marsh, and paludification meadow, can consequently be distinguished. All the methods of geographic analysis and model establishment are given in detail in this article.
Funding: Supported by the National 11th Five-Year Science and Technology Supporting Plan of China (2006BAB02A02) and Central South University Innovation funded projects (2009ssxt230, 2009ssxt234).
Abstract: A Fisher discriminant analysis (FDA) model for predicting the classification of rockburst in a deep-buried long tunnel was established based on Fisher discriminant theory and the actual characteristics of the project. First, the major factors in rockburst, such as the maximum tangential stress of the cavern wall σθ, the uniaxial compressive strength σc, the uniaxial tensile strength σt, and the elastic energy index of rock Wet, were taken into account in the analysis. Three factors, the stress coefficient σθ/σc, the rock brittleness coefficient σc/σt, and the elastic energy index Wet, were defined as the criterion indices for rockburst prediction in the proposed model. After training and testing on 12 sets of measured data, the discriminant functions of the FDA were solved, with a misdiscrimination ratio of zero. Moreover, the proposed model was used to predict rockbursts in the Qinling tunnel along the Xi'an-Ankang railway. The results show that the three forecast results are identical with the actual situation. Therefore, the prediction accuracy of the FDA model is acceptable.
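A two-class Fisher discriminant sketch makes the idea concrete (the paper's model handles several rockburst grades; two classes keep the mechanics visible). The projection direction is w = Sw⁻¹(m₁ − m₀), where Sw is the pooled within-class scatter. The synthetic samples below mimic the three criterion indices (stress coefficient, brittleness coefficient, Wet) but are invented, not the paper's 12 measured sets.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher projection w = Sw^{-1} (m1 - m0) for two classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    return np.linalg.solve(Sw, m1 - m0)

rng = np.random.default_rng(1)
# Hypothetical (stress coefficient, brittleness coefficient, Wet) samples.
no_burst = rng.normal([0.2, 25.0, 2.0], 0.1, size=(20, 3))
burst = rng.normal([0.6, 15.0, 7.0], 0.1, size=(20, 3))
w = fisher_direction(no_burst, burst)
threshold = (no_burst @ w).mean() / 2 + (burst @ w).mean() / 2
print(np.mean(burst @ w > threshold))  # well-separated classes -> 1.0
```

Classifying a new tunnel section then amounts to projecting its three indices onto w and comparing against the threshold, which is how the discriminant functions are applied in prediction.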
Funding: National Natural Science Foundation of China (51408174) and Provincial Undergraduate Innovation and Entrepreneurship Training Program of Hefei University of Technology (S201910359302).
Abstract: Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment. Given the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region, a stochastic simulation method based on a classification model is used to simulate tropical cyclone tracks in this region. The simulation comprises the classification method, the genesis model, the traveling model, and the lysis model. Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of their movement characteristics and steering positions. In the genesis model, Gaussian kernel probability density functions with the biased cross validation method are used to simulate the annual occurrence number and genesis positions. The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements. The termination probability is used as the discrimination standard in the lysis model. The stochastic simulation method is then applied and qualitatively evaluated with different diagnostics. Results show that tropical cyclone tracks in the Northwest Pacific can be satisfactorily simulated with this classification model.
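The genesis step can be sketched as Gaussian-kernel smoothing of historical genesis positions: resample a historical point, then add Gaussian kernel noise. The bandwidth and the toy "historical" points below are illustrative; the paper selects the bandwidth by biased cross validation rather than fixing it by hand.

```python
import numpy as np

def sample_genesis(historical, bandwidth, n, rng):
    """Draw n synthetic genesis positions from a Gaussian KDE of history."""
    idx = rng.integers(0, len(historical), n)            # resample history
    return historical[idx] + rng.normal(0.0, bandwidth, (n, 2))

rng = np.random.default_rng(42)
# Hypothetical (lon, lat) genesis points in the Northwest Pacific.
hist = np.array([[135.0, 12.0], [140.0, 15.0], [150.0, 18.0], [145.0, 10.0]])
synthetic = sample_genesis(hist, bandwidth=1.5, n=1000, rng=rng)
print(synthetic.shape)  # -> (1000, 2)
```

Each synthetic genesis point then seeds the traveling model, which steps the track forward using the per-category 6 h displacement statistics until the lysis model terminates it.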
Funding: Supported by the Zhejiang Provincial Philosophy and Social Science Planning Zhijiang Youth Project of China (Grant No. 16ZJQN017YB); Ministry of Education of China, Humanities and Social Science Projects (Grant No. 19YJA910006); Zhejiang Provincial Natural Science Foundation of China (Grant No. LY20A010019); Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant No. GK199900299012-204); and Zhejiang Provincial Statistical Science Research Base Project of China (Grant No. 19TJJD08).
Abstract: In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, density function, and noncentral skew chi-square distribution. Based on the EM algorithm, we discuss maximum likelihood (ML) estimation of the unknown parameters. For the testing problem of the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, simulation results on the Type I error rates and powers of the PB approach are obtained, showing that the PB approach performs satisfactorily on both Type I error rates and powers, even for small samples. For illustration, the main results are applied to a real data problem.
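The parametric bootstrap test for the fixed effect can be sketched in simplified form. For clarity the errors here are plain normal rather than skew-normal: fit the model under H0 (equal group means), regenerate data repeatedly from the fitted null model, and compare the observed F-type statistic with its bootstrap distribution. All numbers are illustrative.

```python
import numpy as np

def f_stat(groups):
    """Between-group over within-group sum of squares (monotone in F)."""
    grand = np.concatenate(groups).mean()
    between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return between / within

def pb_pvalue(groups, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.concatenate(groups).mean()                        # fit under H0
    sigma = np.concatenate([g - g.mean() for g in groups]).std()
    t_obs = f_stat(groups)
    t_boot = [f_stat([mu + sigma * rng.standard_normal(len(g)) for g in groups])
              for _ in range(n_boot)]
    return np.mean([t >= t_obs for t in t_boot])

rng = np.random.default_rng(7)
groups = [rng.normal(m, 1.0, 25) for m in (0.0, 0.0, 2.0)]    # shifted third mean
print(pb_pvalue(groups) < 0.05)  # clear fixed effect -> True
```

The appeal of the PB approach, as the abstract notes, is that the null distribution is generated from the fitted model itself, so the test keeps its Type I error rate even when small samples make asymptotic approximations unreliable.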