This paper aims to build an employee attrition classification model based on the Stacking algorithm. An oversampling algorithm is applied to address data imbalance, and, after data cleaning and preprocessing, the random forest feature importance ranking method is used to mitigate overfitting. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used to compare them. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
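The abstract does not say which oversampling algorithm is used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class rows until every class matches the largest one), the balancing step could look like this. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equally frequent.

    Assumption: plain random oversampling; the paper may use a different variant.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Rows of this class in the original data are the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: four employees who stayed, one who left.
rows = [[1], [2], [3], [4], [5]]
labels = ["stay", "stay", "stay", "stay", "leave"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.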
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. First, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to flight sequencing problems in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
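The abstract gives no formulas, so the following is only a sketch of how a fuzzy transitive closure is commonly computed: a reflexive, symmetric similarity matrix is repeatedly squared under max-min composition until it stops changing, and a threshold (lambda-cut) then splits the items into classes. The similarity values and the threshold of 0.7 are hypothetical.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations given as nested lists."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation repeatedly until it is stable (assumes r is reflexive)."""
    current = r
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


def classes_at_level(closure, level):
    """Group indices whose closure membership is at least `level` (a lambda-cut)."""
    groups, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= level]
        seen.update(group)
        groups.append(group)
    return groups


# Hypothetical similarity matrix for four flights (reflexive and symmetric).
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
closure = transitive_closure(similarity)
groups = classes_at_level(closure, 0.7)
```

Because the closure is a fuzzy equivalence relation, every threshold yields a proper partition; lowering the threshold merges classes.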
Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment. Given the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region, a stochastic simulation method based on a classification model is used to simulate tropical cyclone tracks in this region. The simulation includes the classification method, the genesis model, the traveling model, and the lysis model. Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of their movement characteristics and steering positions. In the genesis model, Gaussian kernel probability density functions with the biased cross-validation method are used to simulate the annual occurrence number and genesis positions. The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements. The termination probability is used as the discrimination standard in the lysis model. This stochastic simulation method is then applied and qualitatively evaluated with different diagnostics. Results show that tropical cyclone tracks in the Northwest Pacific can be satisfactorily simulated with this classification model.
Early detection of hepatocellular carcinoma (HCC) is critical for effective treatment. The alpha-fetoprotein (AFP) serum level is currently used for HCC screening, but the cutoff of the AFP test has limited sensitivity (~50%), indicating a high false negative rate. We have successfully demonstrated that cancer-derived DNA biomarkers can be detected in the urine of patients with cancer and can be used for the early detection of cancer (Jain et al., 2015; Lin et al., 2011; Song et al., 2012; Su, Lin, Song, & Jain, 2014; Su, Wang, Norton, Brenner, & Block, 2008). By combining urine biomarker (uBMK) values and the serum AFP (sAFP) level, a new classification model is proposed for more efficient HCC screening. Several criteria are discussed for optimizing the cutoffs for the uBMK score and the sAFP score. A joint distribution of sAFP and uBMK with point mass is fitted using the maximum likelihood method. Numerical results show that the sAFP and uBMK data are very well described by the proposed model. A tree-structured sequential test can be optimized by selecting the cutoffs. Bootstrap simulations also show robust classification results with the optimal cutoffs.
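The abstract does not spell out the structure of the sequential test or the optimization criterion. The sketch below assumes a two-step rule (positive on sAFP alone, otherwise decided by the uBMK score) and uses Youden's J as one possible criterion; the function names, the grids, and every data value are hypothetical and are not taken from the study.

```python
def sequential_test(safp, ubmk, safp_cutoff, ubmk_cutoff):
    """Assumed two-step rule: positive on sAFP alone, otherwise fall back to uBMK."""
    if safp >= safp_cutoff:
        return True
    return ubmk >= ubmk_cutoff


def sensitivity_specificity(samples, safp_cutoff, ubmk_cutoff):
    """samples: list of (safp, ubmk, has_hcc); needs at least one case of each kind."""
    tp = fn = tn = fp = 0
    for safp, ubmk, has_hcc in samples:
        predicted = sequential_test(safp, ubmk, safp_cutoff, ubmk_cutoff)
        if has_hcc:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoffs(samples, safp_grid, ubmk_grid):
    """Grid search for the cutoff pair maximizing Youden's J (an assumed criterion)."""
    best = None
    for a in safp_grid:
        for u in ubmk_grid:
            sens, spec = sensitivity_specificity(samples, a, u)
            score = sens + spec - 1.0
            if best is None or score > best[0]:
                best = (score, a, u)
    return best


# Made-up illustration data: (sAFP, uBMK score, has HCC).
samples = [
    (25, 0.9, True), (5, 0.8, True), (30, 0.2, True), (4, 0.7, True),
    (6, 0.1, False), (3, 0.3, False), (8, 0.2, False), (22, 0.4, False),
]
score, safp_cutoff, ubmk_cutoff = best_cutoffs(samples, [10, 20, 40], [0.5, 0.6])
```

The second step is what lets the combined test catch cases that the sAFP cutoff alone would miss, at the price of extra false positives if the uBMK cutoff is set too low.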
A new classification model for host intrusion detection based on unidentified short sequences and the RIPPER algorithm is proposed. The concepts of different short sequences in system call traces are strictly defined on the basis of an in-depth analysis of the completeness and correctness of pattern databases. Labels of short sequences are predicted by the learned RIPPER rule set, and the nature of the unidentified short sequences is confirmed by a statistical method. Experimental results indicate that the classification model clearly increases the deviation between attack and normal traces and improves detection capability against known and unknown attacks.
The complex sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model is proposed to reduce casting defects and improve production efficiency. It includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, which include four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
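Gini-based feature importance in a random forest is usually built from the impurity decrease of each split, accumulated per feature over all trees. The sketch below shows that building block for a single split only; the parameter name (pouring temperature), the threshold, and the data are hypothetical and do not come from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(labels, feature_values, threshold):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical castings: pouring temperature (deg C) and observed outcome.
temperatures = [1380, 1390, 1420, 1430]
outcomes = ["porosity", "porosity", "ok", "ok"]
decrease = impurity_decrease(outcomes, temperatures, threshold=1400)
```

A parameter whose splits produce large decreases across many trees ranks as important; a parameter that is never useful for splitting ends up with an importance near zero.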
Early decay of citrus can cause economic losses and serious food safety issues. The early decayed area has no obvious visual characteristics, making effective detection of this damage very difficult for the citrus industry. This study constructed a new detection system based on visible light-emitting diode (LED) structured-illumination imaging and proposed an effective methodology combined with a spiral phase transform (SPT) algorithm for the early detection of decayed oranges. Three phase-shifted pattern images were acquired for each sample, with phase shifts of −2π/3, 0, and 2π/3 at a spatial frequency of 0.25 cycles/mm. Three strategies (the conventional three-phase-shifting method, 2-phase SPT, and 1-phase SPT) were used to demodulate the original patterned images and recover the direct component (DC) and amplitude component (AC) images. Partial least squares discriminant analysis (PLS-DA) and least squares support vector machine (LS-SVM) classification models were established based on the texture features of the DC, AC, and RT (the ratio of AC to DC) images. Then, the random frog (RF) algorithm was used to simplify the optimal full-featured model. Finally, the LS-SVM model constructed using 7 texture features from the RT image obtained an average classification accuracy of 95.1% for all tested samples. This study indicates that the proposed structured-illumination imaging technique, combined with 2-phase SPT and a feature-based classification model, can achieve fast identification of early decayed oranges.
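For the conventional three-phase-shifting strategy, a widely used demodulation takes the pixel-wise mean of the three images as DC and a scaled root of the squared pairwise differences as AC. The abstract does not state its exact formulas, so the sketch below is an assumption based on that standard approach, written for a single pixel; `simulate_pixel` and the numeric values are hypothetical.

```python
import math


def demodulate_three_phase(i1, i2, i3):
    """Recover (DC, AC) for one pixel from three patterns shifted by 2*pi/3."""
    dc = (i1 + i2 + i3) / 3.0
    ac = (math.sqrt(2.0) / 3.0) * math.sqrt(
        (i1 - i2) ** 2 + (i2 - i3) ** 2 + (i3 - i1) ** 2
    )
    return dc, ac


def simulate_pixel(dc, ac, phase):
    """Hypothetical forward model: intensity = DC + AC * cos(phase + shift)."""
    shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)
    return tuple(dc + ac * math.cos(phase + shift) for shift in shifts)


# Round trip on made-up values: simulate three intensities, then demodulate.
i1, i2, i3 = simulate_pixel(dc=0.6, ac=0.25, phase=1.1)
dc, ac = demodulate_three_phase(i1, i2, i3)
rt = ac / dc  # ratio image value, as defined in the abstract (AC divided by DC)
```

The recovered values do not depend on the local pattern phase, which is why the three shifted images suffice without knowing where each fringe falls on the fruit.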
Direction-of-arrival (DoA) estimation is one of the hot research areas in signal processing. To overcome the challenge of DoA estimation without prior information about the number of signal sources and the number of multipaths in a millimeter-wave system, a multi-task deep residual shrinkage network (MTDRSN) and a transfer learning-based convolutional neural network (TCNN), together called MDTCNet, are proposed. The sampling covariance matrix of the received signal is used as the input to the proposed network. A DRSN-based multi-task classification model is first introduced to estimate the number of signal sources and the number of multipaths simultaneously. Then, the DoAs with multiple signals and multipath are estimated by the regression model. The proposed CNN is applied to DoA estimation with the predicted numbers of signal sources and paths. Furthermore, model-based transfer learning is introduced into the regression model: the TCNN inherits part of the network parameters of the already optimized model obtained by the CNN. A series of experimental results show that the MDTCNet-based DoA estimation method can accurately predict the number of signal sources and the number of multipaths under a range of signal-to-noise ratios. Remarkably, the proposed method achieves a lower root mean square error than some existing deep learning-based and traditional methods.
Class Title: Radiological imaging methods, a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnostic capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field that aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine for assisting medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Cardiac diseases are one of the greatest global health challenges. Due to their high annual mortality rates, cardiac diseases have attracted the attention of numerous researchers in recent years. This article proposes a hybrid fuzzy fusion classification model for cardiac arrhythmia diseases. The fusion model is used to optimally select the highest-ranked features generated by a variety of well-known feature-selection algorithms. An ensemble of classifiers is then applied to the fusion's results. The proposed model classifies the arrhythmia dataset from the University of California, Irvine into normal/abnormal classes as well as 16 classes of arrhythmia. In the preprocessing step, missing values were filled with the average value within the same class for linear attributes and with the most frequent value for nominal attributes. To ensure model optimality, all attributes with zero or constant values, which might bias the results of the classifiers, were eliminated. Preprocessing left 161 of the 279 attributes (features). Thereafter, a fuzzy-based feature-selection fusion method is applied to fuse the high-ranked features obtained from different heuristic feature-selection algorithms. In short, the study comprises three main blocks: (1) sensing data and preprocessing; (2) feature queuing, selection, and extraction; and (3) the predictive model. The proposed method improves classification performance in terms of accuracy, F1-measure, recall, and precision when compared to state-of-the-art techniques. It achieves 98.5% accuracy in binary class mode and 98.9% accuracy in categorized class mode.
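The preprocessing described above can be sketched per attribute column as follows. This is a minimal illustration, assuming missing values are represented as `None`; the function names and the toy columns are hypothetical, and the numeric imputation requires each class to have at least one observed value.

```python
from collections import Counter
from statistics import mean


def impute_numeric_by_class(values, labels):
    """Replace None with the mean of the observed values in the same class."""
    observed = {}
    for value, label in zip(values, labels):
        if value is not None:
            observed.setdefault(label, []).append(value)
    class_mean = {label: mean(vals) for label, vals in observed.items()}
    return [class_mean[label] if value is None else value
            for value, label in zip(values, labels)]


def impute_nominal(values):
    """Replace None with the most frequent observed value of the column."""
    most_common = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def is_constant(values):
    """True when every entry is identical, so the attribute carries no information."""
    return len(set(values)) <= 1


# Hypothetical columns from a small labelled table.
labels = ["normal", "normal", "normal", "abnormal", "abnormal"]
heart_rate = impute_numeric_by_class([60.0, None, 70.0, 110.0, None], labels)
sex = impute_nominal(["m", None, "m", "f", "m"])
drop_column = is_constant([0, 0, 0, 0, 0])
```

Class-wise means use the label, so in a train/test setting they should be computed on the training split only.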
In agriculture, rice plant disease diagnosis has become a challenging issue, and early identification of disease can avoid the huge losses incurred from reduced crop productivity. Recently developed computer vision and Deep Learning (DL) approaches are commonly employed in designing effective models for rice plant disease detection and classification. With this motivation, the current work devises an Efficient Deep Learning based Fusion Model for Rice Plant Disease (EDLFM-RPD) detection and classification. The aim of the proposed EDLFM-RPD technique is to detect and classify different kinds of rice plant diseases proficiently. The EDLFM-RPD technique involves median filtering-based preprocessing and K-means segmentation to determine the infected portions. The study also uses a fusion of handcrafted Gray Level Co-occurrence Matrix (GLCM) features and Inception-based deep features. Finally, a Salp Swarm Optimization with Fuzzy Support Vector Machine (FSVM) model is used for classification. To validate the outcomes of the EDLFM-RPD technique, a series of simulations was conducted and the results were assessed under different measures. The obtained values indicate improved performance of the EDLFM-RPD technique over recent approaches, with a maximum accuracy of 96.170%.
In this paper, we explore the multi-class classification problem of acupuncture acupoints based on the BERT model. That is, we try to recommend the best main acupuncture point for treating a disease by classifying and predicting the main acupuncture point for the disease, and we further explore its acupuncture point grouping, in order to provide the medical practitioner with the optimal solution for treating the disease and to improve clinical decision-making. The Bert-Chinese-Acupoint model was constructed by retraining on the basis of the BERT model; semantic features of acupuncture points were added to the acupuncture point corpus during fine-tuning, and the model was compared with machine learning methods. The results show that the Bert-Chinese-Acupoint model proposed in this paper achieves a 3% improvement in accuracy over the best-performing model among the machine learning approaches.
Complex proteins are needed for many biological activities. Folding amino acid chains reveals their properties and functions. They support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative identification of proteins and their functions. Despite technical advances and the exploration of protein sequence data, the "basic structure" problem in bioinformatics (the automatic deduction of a protein's properties from its amino acid sequence) remains unsolved. Inferring protein function from amino acid sequences is the main biological data challenge. This study analyzes whether raw sequencing can characterize biological facts. A massive corpus of protein sequences and the related protein families of the Globin-like superfamily are used to generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. Bispectral analysis converts the encoded numerical protein sequences into images for better discrimination between protein sequences and families. Training and validation used 70% of the dataset, while 30% was used for testing. This paper examines the performance of multistage deep learning models for differentiating between sixteen protein families after encoding each sequence and representing it as a higher-order spectral image (bispectrum). Cascading minimized false positive and false negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes almost accurately separated in the first stage and decreased the overlapping cases between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% ± 22.8% accuracy, 63.3% ± 17.1% precision, and a 63.2% ± 19.4% F1-score. The two-stage technique yielded 92.2% ± 4.9% accuracy, 92.7% ± 7.0% precision, and a 92.3% ± 5.0% F1-score. This work provides balanced, reliable, and precise predictions for all families in all measures. It ensured that the new model was resilient to family variances and provided high-scoring results.
Proteins are essential for many biological functions. For example, folding amino acid chains reveals their functionalities by maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein's properties from its amino acid sequence is called "basic structure". It remains a critical unsolved challenge in bioinformatics, despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequencing to explain biological facts, using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. Subsequently, the encoded numerical protein sequences are transformed into images using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families. A deep Convolutional Neural Network (CNN) classifies the resulting images, and non-normalized and normalized encoding techniques were developed. Initially, the dataset was split 70/30 for training and testing. It was then split into 70% training, 15% validation, and 15% testing. The suggested methods are evaluated using accuracy, precision, and recall. The non-normalized method had 70% accuracy, 72% precision, and 71% recall without validation, and 68% accuracy, 67% precision, and 67% recall with validation. Meanwhile, the normalized approach without validation had 92.4% accuracy, 94.3% precision, and 91.1% recall. With validation, it showed 90% accuracy, 91.2% precision, and 89.7% recall. Note that both algorithms outperform the rest. The paper shows that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architectures. They offered the best inference performance, as the proposed approach improves categorization and prediction. Several instances show successful multi-class prediction in the massive data of molecular biology.
Question-answering (QA) models find answers to a given question. The need to find answers automatically is increasing, because doing so over large-scale QA datasets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach search the entire dataset or the datasets within a category that the question writer specifies manually. In contrast, we aim to automatically find the category to which the question belongs by employing a text classification model and to find the answer corresponding to the question within that category. Thanks to the text classification model, we can effectively reduce the search space for finding the answers to a given question. Therefore, the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which improves performance compared to the individual embedding models. We evaluate the performance of the proposed QA matching model using real-world QA datasets. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61 to 5.07 times faster than the existing models due to the effective reduction of the search space by the text classification model.
Glass is precious material evidence of trade along the early Silk Road. Ancient glass was easily affected by the environment and by weathering, and the resulting change in composition ratios affected the correct judgment of its category. In this paper, mathematical models and methods such as the Chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model, and grey correlation analysis were used comprehensively to analyze the data of sample glass products combined with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22, and so on.
Objective: Debris flows are cohesive sediment gravity flows which occur in both subaerial and subaqueous settings. Compared to subaerial debris flows, which have been well studied as a geological hazard, subaqueous debris flows, which show complicated sediment composition and sedimentary processes, are poorly understood. The main objective of this work is to establish a classification scheme and facies sequence models of subaqueous debris flows in order to better understand their sedimentary processes and depositional characteristics.
Prognosis is a key technology for improving the reliability, safety, and maintainability of products, and many researchers have devoted themselves to it. However, improving the prediction accuracy of the remaining life of products has been difficult. To predict the lifetime of pneumatic cylinders with high reliability and long lifetime from small specimens, this paper puts forward a prognosis algorithm based on the path classification and estimation (PACE) model. The PACE model is based entirely on failure data instead of a failure threshold. The failure mechanism of pneumatic cylinders is normally wear and tear. Since the minimum working pressure increases with the number of working cycles, the minimum working pressure is chosen as the degradation signal. The PACE model is fundamentally composed of two operations: path classification and remaining useful life (RUL) estimation. Path classification classifies a current degradation path as belonging to one or more previously collected exemplary degradation paths. RUL estimation uses the resulting memberships to estimate the remaining useful life. To verify and validate the PACE prognostic method, six pneumatic cylinders were tested, and the test data were analyzed with PACE prognostics. The PACE-based prognosis method was found to have higher prediction accuracy and smaller variance, and the PACE model significantly outperforms population-based prognostics, especially under small-specimen conditions. The PACE-based method addresses the problem of prediction accuracy in the prognosis of small-specimen pneumatic cylinders.
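The two PACE operations can be illustrated with a small sketch. The abstract does not give the membership function, so the inverse squared distance weighting used here is an assumption, and the exemplar paths, failure times, and units (one sample per inspection interval) are made up for illustration.

```python
def path_memberships(current, exemplars):
    """Path classification: weight each exemplar by closeness to the observed path.

    Assumption: inverse squared distance over the observed portion, normalized to 1.
    """
    weights = []
    for path in exemplars:
        overlap = path[:len(current)]
        distance = sum((c - p) ** 2 for c, p in zip(current, overlap))
        weights.append(1.0 / (distance + 1e-9))  # small constant avoids division by zero
    total = sum(weights)
    return [w / total for w in weights]


def estimate_rul(current, exemplars, failure_times):
    """RUL estimation: membership-weighted remaining life of the exemplar paths."""
    memberships = path_memberships(current, exemplars)
    elapsed = len(current)
    return sum(m * (t - elapsed) for m, t in zip(memberships, failure_times))


# Made-up minimum working pressure paths (MPa), one value per inspection interval.
exemplars = [
    [0.10, 0.11, 0.12, 0.14, 0.16, 0.19],  # slow degradation, failed at interval 6
    [0.10, 0.13, 0.16, 0.20],              # fast degradation, failed at interval 4
]
failure_times = [6, 4]
current = [0.10, 0.11, 0.12]  # cylinder under test, observed for 3 intervals
memberships = path_memberships(current, exemplars)
rul = estimate_rul(current, exemplars, failure_times)
```

Because the estimate is anchored to observed failure times of similar paths, no failure threshold on the pressure signal is needed, which matches the abstract's description of PACE as based on failure data.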
A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated during coding from the previous frame of the same type. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. The experimental results show that the technique is effective and easy to implement. The method presented in the paper can be a good approach to MPEG and H.264 rate control.
The paper proposes a new MPEG-2 rate control method based on model classification. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated during coding from the previous frame of the same type. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. The experimental results show that the technique is effective and easy to implement. The method presented in the paper can be a good approach to MPEG rate control.
Funding: National Natural Science Foundation of China (51408174); Provincial Undergraduate Innovation and Entrepreneurship Training Program of Hefei University of Technology (S201910359302).
文摘Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment.Against the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region,stochastic simulation method based on classification model is used to simulate tropical cyclone tracks in this region.Such simulation includes the classification method,the genesis model,the traveling model,and the lysis model.Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of its movement characteristics and steering positions.In the genesis model,Gaussian kernel probability density functions with the biased cross validation method are used to simulate the annual occurrence number and genesis positions.The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements.The termination probability is used as the discrimination standard in the lysis model.Then,this stochastic simulation method of tropical cyclone tracks is applied and qualitatively evaluated with different diagnostics.Results show that the tropical cyclone tracks in Northwest Pacific can be satisfactorily simulated with this classification model.
文摘Early detection of hepatocellular carcinoma (HCC) is critical for the effective treatment. Alpha fetoprotein (AFP) serum level is currently used for HCC screening, but the cutoff of the AFP test has limited sensitivity (-50%), indicating a high false negative rate. We have successfully demonstrated that cancer derived DNA biomarkers can be detected in urine of patients with cancer and can be used for the early detection of cancer (Jain et al., 2015; Lin et al., 2011; Song et al., 2012; Su, Lin, Song, & Jain, 2014; Su, Wang, Norton, Brenner, & Block, 2008). By combining urine biomarkers (uBMK) values and serum AFP (sAFP) level, a new classification model has been proposed for more efficient HCC screening. Several criterions have been discussed to optimal the cutoff for uBMK score and sAFP score. A joint distribution of sAFP and uBMK with point mass has been fitted using maximum likelihood method. Numerical results show that the sAFP data and uBMK data are very well described by proposed model. A tree-structured sequential test can be optimized by selecting the cutoffs. Bootstrap simulations also show the robust classification results with the optimal cuto~..
Abstract: A new classification model for host intrusion detection based on unidentified short sequences and the RIPPER algorithm is proposed. The concepts of the different short sequences in system call traces are strictly defined on the basis of an in-depth analysis of the completeness and correctness of pattern databases. Labels of short sequences are predicted by the learned RIPPER rule set, and the nature of the unidentified short sequences is confirmed by a statistical method. Experimental results indicate that the classification model clearly increases the deviation between attack traces and normal traces and improves detection capability against known and unknown attacks.
Funding: Financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
Abstract: The complexity of the sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency. It includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, which include four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
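As a rough illustration of the Gini-based importance mentioned above, the sketch below computes the Gini impurity of a set of defect labels and the impurity decrease produced by one split. Random forest importance scores are commonly built by accumulating such decreases per feature. This is a minimal sketch, not the authors' implementation: the function names and the labels `gas_porosity` and `no_defect` are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical labels: a split that separates the two classes perfectly
# removes all of the parent's impurity.
parent = ["gas_porosity", "gas_porosity", "no_defect", "no_defect"]
left, right = parent[:2], parent[2:]
decrease = gini_decrease(parent, left, right)
```

A process parameter whose splits yield large decreases across many trees would rank as more important for that defect type.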
Funding: Supported by the Outstanding Scientist Cultivation Project of Beijing Academy of Agriculture and Forestry Sciences (Grant No. JKZX202405), the Jiangsu Province and Education Ministry Co-sponsored Synergistic Innovation Center of Modern Agricultural Equipment (Grant No. XTCX2001), the National Natural Science Foundation of China (Grant Nos. 31972152 and 32260622), and the Natural Science Foundation of Jiangxi Province, China (Grant No. 20232ACB205026).
Abstract: The early decay of citrus can cause economic losses and serious food safety issues. The early decayed area has no obvious visual characteristics, which makes effective detection of this damage very difficult for the citrus industry. This study constructed a new detection system based on visible light-emitting diode (LED) structured-illumination imaging and proposed an effective methodology combined with a spiral phase transform (SPT) algorithm for the early detection of decayed oranges. Three phase-shifted pattern images were acquired for each sample, with phase shifts of −2π/3, 0, and 2π/3 at a spatial frequency of 0.25 cycles/mm. Three strategies (i.e., the conventional three-phase-shifting method, 2-phase SPT, and 1-phase SPT) were used to demodulate the original patterned images to recover the direct component (DC) and amplitude component (AC) images. Partial least squares discriminant analysis (PLS-DA) and least squares support vector machine (LS-SVM) classification models were established based on the texture features of the DC, AC, and RT (i.e., the ratio of AC to DC) images. Then, the random frog (RF) algorithm was used to simplify the optimal full-featured model. Finally, the LS-SVM model constructed using 7 texture features from the RT image obtained an average classification accuracy of 95.1% for all tested samples. This study indicates that the proposed structured-illumination imaging technique combined with 2-phase SPT and a feature-based classification model can achieve fast identification of early decayed oranges.
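The conventional three-phase-shifting demodulation named above can be sketched per pixel as follows. The sketch assumes each pattern image follows I_k = DC + AC·cos(φ + δ_k) with δ_k = −2π/3, 0, 2π/3, which gives the widely used closed forms DC = (I1 + I2 + I3)/3 and AC = (√2/3)·√((I1 − I2)² + (I2 − I3)² + (I1 − I3)²). The function names and the plain nested-list image representation are illustrative assumptions, and the SPT variants are not covered.

```python
import math


def demodulate_pixel(i1, i2, i3):
    """Recover (DC, AC) for one pixel from three phase-shifted intensities."""
    dc = (i1 + i2 + i3) / 3.0
    ac = (math.sqrt(2.0) / 3.0) * math.sqrt(
        (i1 - i2) ** 2 + (i2 - i3) ** 2 + (i1 - i3) ** 2
    )
    return dc, ac


def demodulate_images(img1, img2, img3):
    """Apply the pixel formula to three equally sized 2D images (lists of rows).

    Returns the DC, AC, and RT (AC divided by DC) images.
    """
    dc_img, ac_img, rt_img = [], [], []
    for row1, row2, row3 in zip(img1, img2, img3):
        dc_row, ac_row, rt_row = [], [], []
        for p1, p2, p3 in zip(row1, row2, row3):
            dc, ac = demodulate_pixel(p1, p2, p3)
            dc_row.append(dc)
            ac_row.append(ac)
            # Guard against division by zero in completely dark pixels.
            rt_row.append(ac / dc if dc != 0 else 0.0)
        dc_img.append(dc_row)
        ac_img.append(ac_row)
        rt_img.append(rt_row)
    return dc_img, ac_img, rt_img
```

Texture features for the classifiers would then be computed on the returned RT image; that step is outside this sketch.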
Funding: Funded by the Beijing University of Posts and Telecommunications-China Mobile Research Institute Joint Innovation Center.
Abstract: Direction-of-arrival (DoA) estimation is one of the hot research areas in signal processing. To overcome the challenge of DoA estimation without prior information about the number of signal sources and the number of multipaths in millimeter-wave systems, the multi-task deep residual shrinkage network (MTDRSN) and the transfer learning-based convolutional neural network (TCNN), together named MDTCNet, are proposed. The sampling covariance matrix of the received signal is used as the input to the proposed network. A DRSN-based multi-task classification model is first introduced to estimate the number of signal sources and the number of multipaths simultaneously. Then, the DoAs with multiple signals and multipaths are estimated by the regression model. The proposed CNN is applied to DoA estimation with the predicted numbers of signal sources and paths. Furthermore, model-based transfer learning is introduced into the regression model: the TCNN inherits part of the network parameters of the already trained model obtained by the CNN. A series of experimental results show that the MDTCNet-based DoA estimation method can accurately predict the number of signal sources and the number of multipaths under a range of signal-to-noise ratios. Remarkably, the proposed method achieves a lower root mean square error than some existing deep learning-based and traditional methods.
Abstract: Class Title: Radiological imaging methods, a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnostic capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field that aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine for assisting medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Abstract: Cardiac diseases are one of the greatest global health challenges. Due to their high annual mortality rates, cardiac diseases have attracted the attention of numerous researchers in recent years. This article proposes a hybrid fuzzy fusion classification model for cardiac arrhythmia diseases. The fusion model is utilized to optimally select the highest-ranked features generated by a variety of well-known feature-selection algorithms. An ensemble of classifiers is then applied to the fusion's results. The proposed model classifies the arrhythmia dataset from the University of California, Irvine into normal/abnormal classes as well as 16 classes of arrhythmia. In the preprocessing steps, for attributes with missing values, we used the average value within the same class for linear attributes and the most frequent value for nominal attributes. In addition, to ensure model optimality, we eliminated all attributes with zero or constant values, which might bias the results of the classifiers used. The preprocessing step retained 161 out of 279 attributes (features). Thereafter, a fuzzy-based feature-selection fusion method is applied to fuse the high-ranked features obtained from different heuristic feature-selection algorithms. In short, our study comprises three main blocks: (1) sensing data and preprocessing; (2) feature queuing, selection, and extraction; and (3) the predictive model. Our proposed method improves classification performance in terms of accuracy, F1-measure, recall, and precision when compared to state-of-the-art techniques. It achieves 98.5% accuracy in binary class mode and 98.9% accuracy in categorized class mode.
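The preprocessing rules described above (class-wise mean for numeric attributes, most frequent value for nominal attributes, removal of constant attributes) can be sketched as below. This is a minimal sketch under stated assumptions: missing values are represented as `None`, the nominal mode is taken over all records rather than per class (the abstract does not say which), and a class with no observed values falls back to the overall mean. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def impute_numeric_by_class(values, labels):
    """Replace None with the mean of the observed values in the same class."""
    observed_by_class = defaultdict(list)
    for value, label in zip(values, labels):
        if value is not None:
            observed_by_class[label].append(value)
    overall = mean(v for v in values if v is not None)
    # Assumption: a class with no observed value falls back to the overall mean.
    class_mean = {label: mean(vs) for label, vs in observed_by_class.items()}
    return [
        class_mean.get(label, overall) if value is None else value
        for value, label in zip(values, labels)
    ]


def impute_nominal_by_mode(values):
    """Replace None with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def is_constant(values):
    """True when an attribute carries a single value (including all zeros)."""
    return len(set(values)) <= 1
```

Attributes for which `is_constant` returns true would be dropped before feature selection.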
Abstract: In agriculture, rice plant disease diagnosis has become a challenging issue, and early identification of disease can avoid the huge losses incurred from reduced crop productivity. Recently developed computer vision and Deep Learning (DL) approaches can be employed to design effective models for rice plant disease detection and classification. With this motivation, the current research work devises an Efficient Deep Learning based Fusion Model for Rice Plant Disease (EDLFM-RPD) detection and classification. The aim of the proposed EDLFM-RPD technique is to detect and classify different kinds of rice plant diseases in a proficient manner. The EDLFM-RPD technique involves median filtering-based preprocessing and K-means segmentation to determine the infected portions. The study also uses a fusion of handcrafted Gray Level Co-occurrence Matrix (GLCM) features and Inception-based deep features. Finally, a Salp Swarm Optimization with Fuzzy Support Vector Machine (FSVM) model is utilized for classification. To validate the outcomes of the EDLFM-RPD technique, a series of simulations was conducted, and the results were assessed under different measures. The obtained values indicate improved performance of the EDLFM-RPD technique over recent approaches, with a maximum accuracy of 96.170%.
Abstract: In this paper, we explore the multi-classification problem of acupuncture acupoints based on the BERT model, i.e., we try to recommend the best main acupuncture point for treating a disease by classifying and predicting the main acupuncture point for the disease, and we further explore its acupuncture point grouping to provide the medical practitioner with the optimal solution for treating the disease and to improve clinical decision-making ability. The Bert-Chinese-Acupoint model was constructed by retraining on the basis of the BERT model; semantic features of acupuncture points were added to the acupuncture point corpus in the fine-tuning process, and the model was compared with machine learning methods. The results show that the Bert-Chinese-Acupoint model proposed in this paper has a 3% improvement in accuracy compared to the best-performing model in the machine learning approach.
Abstract: Complex proteins are needed for many biological activities. Folding amino acid chains reveals their properties and functions. They support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative protein identification and function. Despite technical advances and protein sequence data exploration, bioinformatics’ “basic structure” problem (the automatic deduction of a protein’s properties from its amino acid sequence) remains unsolved. Protein function inference from amino acid sequences is the main biological data challenge. This study analyzes whether raw sequencing can characterize biological facts. A massive corpus of protein sequences and the Globin-like superfamily’s related protein families generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. A bispectral analysis converts encoded protein numerical sequences into images for better protein sequence and family discrimination. Training and validation employed 70% of the dataset, while 30% was used for testing. This paper examined the performance of multistage deep learning models for differentiating between sixteen protein families after encoding and representing each encoded sequence by a higher spectral representation image (Bispectrum). Cascading minimized false positive and negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes almost accurately separated in the first stage and decreased the overlapping cases between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% +/- 22.8% accuracy, 63.3% +/- 17.1% precision, and a 63.2% +/- 19.4% F1-score. The two-stage technique yielded 92.2% +/- 4.9% accuracy, 92.7% +/- 7.0% precision, and a 92.3% +/- 5.0% F1-score. This work provides balanced, reliable, and precise forecasts for all families in all measures. It ensured that the new model was resilient to family variances and provided high-scoring results.
Abstract: Proteins are essential for many biological functions. For example, folding amino acid chains reveals their functionalities by maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein’s properties from its amino acid sequence is called the “basic structure” problem. It remains a critical unsolved challenge in bioinformatics, despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequencing to explain biological facts, using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. Subsequently, the encoded protein numerical sequences are transformed into images using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families. A deep Convolutional Neural Network (CNN) classifies the resulting images, and non-normalized and normalized encoding techniques were developed. Initially, the dataset was split 70/30 for training and testing. In a second configuration, it was split into 70% training, 15% validation, and 15% testing. The suggested methods are evaluated using accuracy, precision, and recall. Without validation, the non-normalized method had 70% accuracy, 72% precision, and 71% recall; with validation, it had 68% accuracy, 67% precision, and 67% recall. Meanwhile, the normalized approach without validation had 92.4% accuracy, 94.3% precision, and 91.1% recall; with validation, it showed 90% accuracy, 91.2% precision, and 89.7% recall. Note that both algorithms outperform the rest. The paper shows that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architecture. They offered the best inference performance, as the proposed approach improves categorization and prediction. Several instances show successful multi-class prediction in molecular biology’s massive data.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1067008) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A1A03032119).
Abstract: Question-answering (QA) models find answers to a given question. The need to find answers automatically is growing, because doing so over large-scale QA data sets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate on the entire dataset or on datasets within a category that the question writer specifies manually. In contrast, we aim to automatically find the category to which the question belongs by employing a text classification model and to find the answer corresponding to the question within that category. With the text classification model, we can effectively reduce the search space for finding the answers to a given question. Therefore, the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which improves performance compared to the individual embedding models. Using real-world QA data sets, we evaluate the performance of the proposed QA matching model. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of model inference speed, our model is 2.61 to 5.07 times faster than the existing models owing to the effective reduction of the search space by the text classification model.
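The search-space reduction described above can be sketched as a two-step lookup: classify the incoming question, then compare it only against stored questions in the predicted category. The sketch below is illustrative only. A bag-of-words cosine similarity stands in for the paper's sentence embeddings, the classifier is passed in as a plain function, and the ensemble embedding step is not modeled. All names and the record layout are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_answer(question, qa_pairs, classify):
    """Return the answer of the most similar stored question in the predicted category.

    qa_pairs: list of dicts with "category", "question", and "answer" keys.
    classify: function mapping a question string to a category label.
    """
    category = classify(question)
    # Only pairs in the predicted category are scored, which shrinks the search space.
    candidates = [pair for pair in qa_pairs if pair["category"] == category]
    if not candidates:
        return None
    query_vec = embed(question)
    best = max(candidates, key=lambda pair: cosine(query_vec, embed(pair["question"])))
    return best["answer"]
```

Because similarity is computed only within one category, the number of comparisons drops roughly in proportion to the category's share of the data, which is consistent with the inference speedup the abstract attributes to the classifier.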
Abstract: Glass is precious material evidence of trade along the early Silk Road. Ancient glass was easily affected by the environment and by weathering, and the resulting change in composition ratios hampers correct judgment of its category. In this paper, mathematical models and methods such as the Chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model, and grey correlation analysis were used together to analyze the data of sample glass products combined with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
Funding: Jointly funded by the National Natural Science Foundation of China (grants No. 41172104, 41202078 and 41372117) and the Major National S&T Program of China (grant No. 2011ZX05009-002).
Abstract: Objective: Debris flows are cohesive sediment gravity flows that occur in both subaerial and subaqueous settings. Compared with subaerial debris flows, which have been well studied as a geological hazard, subaqueous debris flows, with their complicated sediment composition and sedimentary processes, are poorly understood. The main objective of this work is to establish a classification scheme and facies sequence models of subaqueous debris flows in order to better understand their sedimentary processes and depositional characteristics.
Funding: Supported by the Laboratory of Aviation Safety Technical Analysis and Appraisal of the China Academy of Civil Aviation Science and Technology (Grant No. 2009-02).
Abstract: Prognosis is a key technology for improving the reliability, safety, and maintainability of products, and many researchers have devoted themselves to it. However, improving the prediction accuracy of the remaining life of products has been difficult. To predict the lifetime of pneumatic cylinders with high reliability and long lifetime from small samples, this paper puts forward a prognosis algorithm based on the path classification and estimation (PACE) model. The PACE model is based entirely on failure data instead of a failure threshold. Pneumatic cylinders are normally characterized by a wear-and-tear failure mechanism. Since the minimum working pressure increases with the number of working cycles, the minimum working pressure is chosen as the degradation signal. The PACE model is fundamentally composed of two operations: path classification and remaining useful life (RUL) estimation. Path classification classifies a current degradation path as belonging to one or more previously collected exemplary degradation paths. RUL estimation uses the resulting memberships to estimate the remaining useful life. To verify and validate the PACE prognostic method, six pneumatic cylinders were tested, and the test data were analyzed by PACE prognostics. The PACE-based prognosis method was found to have higher prediction accuracy and smaller variance, and the PACE model significantly outperforms population-based prognostics, especially under small-sample conditions. The PACE model-based method addresses the problem of prediction accuracy in the prognosis of pneumatic cylinders with small samples.
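The two PACE operations described above can be sketched as follows. The abstract does not specify how memberships are computed, so this sketch makes loud assumptions: each exemplar path is a list of degradation-signal samples (for example, minimum working pressure) recorded at equal cycle intervals until failure, memberships are normalized inverse Euclidean distances over the portion observed so far, and the RUL estimate is the membership-weighted average of the exemplars' remaining lives. All names are hypothetical.

```python
def path_memberships(current, exemplars):
    """Path classification: soft membership of `current` in each exemplar path.

    Assumption: membership is the normalized inverse Euclidean distance over
    the first len(current) samples of each exemplar.
    """
    n = len(current)
    weights = []
    for path in exemplars:
        distance = sum((c - p) ** 2 for c, p in zip(current, path[:n])) ** 0.5
        weights.append(1.0 / (distance + 1e-9))  # small constant avoids division by zero
    total = sum(weights)
    return [w / total for w in weights]


def estimate_rul(current, exemplars):
    """RUL estimation: membership-weighted mean of the exemplars' remaining lives.

    Lives are counted in samples; an exemplar's remaining life is its total
    length minus the number of samples observed so far.
    """
    n = len(current)
    # Only exemplars that survived at least as long as the current path are usable.
    usable = [path for path in exemplars if len(path) >= n]
    if not usable:
        raise ValueError("no exemplar path is as long as the current path")
    memberships = path_memberships(current, usable)
    return sum(m * (len(path) - n) for m, path in zip(memberships, usable))
```

For instance, with exemplars `[1, 2, 3, 4, 5, 6]` and `[1, 3, 5, 7]`, a current path `[1, 2, 3]` coincides with the first exemplar, so the estimate is dominated by that exemplar's three remaining samples.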
Funding: This project was supported by the High Technology Research and Development Program of China (2002AA103087).
Abstract: A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve the quality. The experimental results show that the technique is effective and easy to implement. The method presented in the paper is a promising approach to MPEG and H.264 rate control.
Funding: The High Technology Research and Development Program of China (No. 2002AA103087).
Abstract: This paper proposes a new MPEG-2 rate control method based on model classification. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve the quality. The experimental results show that the technique is effective and easy to implement. The method presented in the paper is a promising approach to MPEG rate control.