Early detection of hepatocellular carcinoma (HCC) is critical for effective treatment. The serum alpha-fetoprotein (AFP) level is currently used for HCC screening, but at the usual cutoff the AFP test has limited sensitivity (~50%), indicating a high false-negative rate. We have successfully demonstrated that cancer-derived DNA biomarkers can be detected in the urine of patients with cancer and can be used for the early detection of cancer (Jain et al., 2015; Lin et al., 2011; Song et al., 2012; Su, Lin, Song, & Jain, 2014; Su, Wang, Norton, Brenner, & Block, 2008). By combining urine biomarker (uBMK) values and the serum AFP (sAFP) level, a new classification model is proposed for more efficient HCC screening. Several criteria are discussed for optimizing the cutoffs for the uBMK score and the sAFP score. A joint distribution of sAFP and uBMK with a point mass is fitted by maximum likelihood. Numerical results show that the sAFP and uBMK data are very well described by the proposed model. A tree-structured sequential test can be optimized by selecting the cutoffs. Bootstrap simulations also show robust classification results with the optimal cuto...
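The idea of a tree-structured sequential test with two cutoffs can be illustrated with a minimal sketch. It is not the authors' fitted model: the decision order (sAFP first, then uBMK), the use of Youden's J as the selection criterion, the function names, and the sample rows are all assumptions made for illustration.

```python
from itertools import product


def sequential_test(safp, ubmk, safp_cut, ubmk_cut):
    """Two-stage rule (assumed order): positive on sAFP, otherwise fall back to uBMK."""
    if safp >= safp_cut:
        return True
    return ubmk >= ubmk_cut


def sensitivity_specificity(samples, safp_cut, ubmk_cut):
    """Return (sensitivity, specificity) of the sequential rule on labelled samples."""
    tp = fn = tn = fp = 0
    for safp, ubmk, has_hcc in samples:
        pred = sequential_test(safp, ubmk, safp_cut, ubmk_cut)
        if has_hcc:
            tp += pred
            fn += not pred
        else:
            fp += pred
            tn += not pred
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoffs(samples, safp_grid, ubmk_grid):
    """Grid search maximising Youden's J = sensitivity + specificity - 1 (an assumed criterion)."""
    def youden(cuts):
        sens, spec = sensitivity_specificity(samples, *cuts)
        return sens + spec - 1
    return max(product(safp_grid, ubmk_grid), key=youden)


# Made-up (sAFP, uBMK score, has_hcc) rows, for illustration only.
SAMPLES = [
    (250.0, 0.2, True), (8.0, 0.9, True), (30.0, 0.7, True), (5.0, 0.1, True),
    (6.0, 0.1, False), (15.0, 0.3, False), (25.0, 0.2, False), (4.0, 0.6, False),
]

if __name__ == "__main__":
    print(sensitivity_specificity(SAMPLES, 20.0, 0.5))
    print(best_cutoffs(SAMPLES, [20.0, 100.0], [0.5, 0.65]))
```

A bootstrap check of robustness, as mentioned in the abstract, would resample the rows with replacement and repeat the grid search on each resample.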
The complexity of the sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model is proposed to reduce casting defects and improve production efficiency; it comprises a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
Direction-of-arrival (DoA) estimation is one of the hot research areas in signal processing. To overcome the challenge of DoA estimation without prior information about the number of signal sources and the number of multipaths in millimeter-wave systems, a multi-task deep residual shrinkage network (MTDRSN) and a transfer learning-based convolutional neural network (TCNN), together named MDTCNet, are proposed. The sample covariance matrix of the received signal is used as the input to the proposed network. A DRSN-based multi-task classification model is first introduced to estimate the number of signal sources and the number of multipaths simultaneously. Then, the DoAs with multiple signals and multipath are estimated by a regression model: the proposed CNN is applied for DoA estimation with the predicted numbers of signal sources and paths. Furthermore, model-based transfer learning is introduced into the regression model. The TCNN inherits part of the network parameters of the already optimized model obtained by the CNN. A series of experimental results show that the MDTCNet-based DoA estimation method can accurately predict the number of signal sources and the number of multipaths under a range of signal-to-noise ratios. Notably, the proposed method achieves a lower root mean square error than some existing deep learning-based and traditional methods.
Complex proteins are needed for many biological activities. Folding of amino acid chains reveals their properties and functions. They support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative identification of proteins and their functions. Despite technical advances and the exploration of protein sequence data, the "basic structure" problem in bioinformatics (the automatic deduction of a protein's properties from its amino acid sequence) remains unsolved. Inferring protein function from amino acid sequences is the main biological data challenge. This study analyzes whether raw sequences can characterize biological facts. A massive corpus of protein sequences and the related protein families of the Globin-like superfamily are used to generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. A bispectral analysis converts the encoded numerical protein sequences into images for better discrimination of protein sequences and families. Training and validation employed 70% of the dataset, while 30% was used for testing. This paper examines the performance of multistage deep learning models for differentiating between sixteen protein families after encoding each sequence and representing it by a higher-order spectral image (bispectrum). Cascading minimized false positive and false negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes almost accurately separated in the first stage and decreased the overlapping cases between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% ± 22.8% accuracy, 63.3% ± 17.1% precision, and a 63.2% ± 19.4% F1-score. The two-stage technique yielded 92.2% ± 4.9% accuracy, 92.7% ± 7.0% precision, and a 92.3% ± 5.0% F1-score. This work provides balanced, reliable, and precise predictions for all families on all measures, and the new model proved resilient to family variances while providing high-scoring results.
Proteins are essential for many biological functions. For example, the folding of amino acid chains reveals their functionalities in maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein's properties from its amino acid sequence is called the "basic structure" problem. It remains a critical unsolved challenge in bioinformatics, despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequences to explain biological facts, using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. Subsequently, the encoded numerical protein sequences are transformed into images using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families. A deep convolutional neural network (CNN) classifies the resulting images, and non-normalized and normalized encoding techniques were developed. Initially, the dataset was split 70/30 for training and testing. In a second configuration, the dataset was split into 70% training, 15% validation, and 15% testing. The suggested methods are evaluated using accuracy, precision, and recall. The non-normalized method had 70% accuracy, 72% precision, and 71% recall without validation, and 68% accuracy, 67% precision, and 67% recall after validation. Meanwhile, the normalized approach without validation had 92.4% accuracy, 94.3% precision, and 91.1% recall. Validation showed 90% accuracy, 91.2% precision, and 89.7% recall. Note that both algorithms outperform the rest. The paper shows that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architectures. They offered the best inference performance, as the proposed approach improves categorization and prediction. Several instances show successful multi-class prediction on the massive data of molecular biology.
Question-answering (QA) models find answers to a given question. The need to find answers automatically is increasing, because doing so on large-scale QA data sets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate on the entire dataset or on the datasets within a category that the question writer specifies manually. In contrast, we aim to find the category to which the question belongs automatically by employing a text classification model, and then to find the answer corresponding to the question within that category. Thanks to the text classification model, we can effectively reduce the search space for finding the answers to a given question. Therefore, the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which improves on the individual embedding models. Using real-world QA data sets, we evaluate the performance of the proposed QA matching model. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61 to 5.07 times faster than the existing models owing to the effective reduction of the search space by the text classification model.
This paper aims to build an employee attrition classification model based on the Stacking algorithm. After data cleaning and preprocessing, an oversampling algorithm is applied to address data imbalance, and the random forest feature importance ranking method is used to mitigate overfitting. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used to compare them. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
Glass is precious material evidence of trade along the early Silk Road. Ancient glass was easily affected by its environment and by weathering, and the resulting change in composition ratios affects the correct judgment of its category. In this paper, mathematical models and methods such as the chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model, and grey correlation analysis were used comprehensively to analyze the data of sample glass products together with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
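The transitive closure step of fuzzy clustering can be sketched as follows. This is a generic max-min closure of a reflexive, symmetric fuzzy similarity matrix followed by a lambda-cut, not the paper's specific four-factor model; the similarity values, the threshold, and the function names are assumptions for illustration.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square a reflexive fuzzy relation repeatedly until it stops changing."""
    while True:
        nxt = max_min_compose(r, r)
        if nxt == r:
            return r
        r = nxt


def lambda_cut_classes(closure, lam):
    """Group items whose closure similarity is at least lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes


# Made-up pairwise similarities between four flights (illustrative only).
SIMILARITY = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]

if __name__ == "__main__":
    closure = transitive_closure(SIMILARITY)
    print(lambda_cut_classes(closure, 0.7))
    print(lambda_cut_classes(closure, 0.5))
```

Because the closure is a fuzzy equivalence relation, lowering the threshold only merges classes, so a single closure yields a whole hierarchy of flight classes.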
Deep Neural Networks (DNNs) are integral to various aspects of modern life, enhancing work efficiency. Nonetheless, their susceptibility to diverse attack methods, including backdoor attacks, raises security concerns. We aim to investigate backdoor attack methods for image categorization tasks in order to promote the development of DNNs towards higher security. Research on backdoor attacks currently faces significant challenges: the distinct and abnormal data patterns of malicious samples and the meticulous data screening by developers hinder practical attack implementation. To overcome these challenges, this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation (GN-TUAP) algorithm. This approach restricts the direction of perturbations and normalizes abnormal pixel values, ensuring that perturbations progress as much as possible in a direction perpendicular to the decision hyperplane in linear problems. This limits anomalies within the perturbations, improves their visual stealthiness, and makes them more challenging for defense methods to detect. To verify the effectiveness, stealthiness, and robustness of GN-TUAP, we propose a comprehensive threat model. Based on this model, extensive experiments were conducted on the CIFAR-10, CIFAR-100, GTSRB, and MNIST datasets, comparing our method with existing state-of-the-art attack methods. We also tested our perturbation triggers against various defense methods and further examined the robustness of the triggers against noise-filtering techniques. The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated by our algorithm exhibit cross-model attack effectiveness and superior stealthiness. Furthermore, they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.
Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and the wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
Title: Radiological imaging methods, a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnostic capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field that aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine for assisting medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment. In view of the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region, a stochastic simulation method based on a classification model is used to simulate tropical cyclone tracks in this region. The simulation comprises the classification method, the genesis model, the traveling model, and the lysis model. Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of their movement characteristics and steering positions. In the genesis model, Gaussian kernel probability density functions with the biased cross-validation method are used to simulate the annual occurrence number and genesis positions. The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements. The termination probability is used as the discrimination standard in the lysis model. This stochastic simulation method for tropical cyclone tracks is then applied and qualitatively evaluated with different diagnostics. Results show that tropical cyclone tracks in the Northwest Pacific can be satisfactorily simulated with this classification model.
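The interplay of the traveling and lysis models can be illustrated with a minimal sketch. It assumes Gaussian 6 h displacements with a fixed mean and standard deviation and a constant per-step termination probability; the abstract does not state these distributional details, and all names and numbers below are hypothetical.

```python
import random


def simulate_track(start, mean_step, std_step, p_lysis, rng, max_steps=200):
    """Simulate one track as a list of (lat, lon) points at 6 h intervals.

    start      -- genesis position (lat, lon) in degrees
    mean_step  -- assumed mean 6 h displacement (dlat, dlon)
    std_step   -- assumed standard deviation of the 6 h displacement (dlat, dlon)
    p_lysis    -- assumed constant probability that the cyclone ends at each step
    rng        -- a random.Random instance, so runs are reproducible
    """
    lat, lon = start
    track = [(lat, lon)]
    for _ in range(max_steps):
        if rng.random() < p_lysis:      # lysis model: terminate the track
            break
        lat += rng.gauss(mean_step[0], std_step[0])   # traveling model
        lon += rng.gauss(mean_step[1], std_step[1])
        track.append((lat, lon))
    return track


if __name__ == "__main__":
    rng = random.Random(42)
    track = simulate_track((15.0, 130.0), (0.5, -0.75), (0.3, 0.4), 0.05, rng)
    print(len(track), track[-1])
```

In a fuller model the step statistics and the termination probability would depend on the track category and the current position rather than being constants.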
Objective: Debris flows are cohesive sediment gravity flows that occur in both subaerial and subaqueous settings. Compared with subaerial debris flows, which have been well studied as a geological hazard, subaqueous debris flows, which show complicated sediment composition and sedimentary processes, are poorly understood. The main objective of this work is to establish a classification scheme and facies sequence models of subaqueous debris flows in order to better understand their sedimentary processes and depositional characteristics.
Prognosis is a key technology for improving the reliability, safety, and maintainability of products, and many researchers have devoted themselves to it. However, improving the accuracy of remaining-life prediction has been difficult. To predict the lifetime of pneumatic cylinders with high reliability, long lifetime, and small sample size, this paper puts forward a prognosis algorithm based on the path classification and estimation (PACE) model. The PACE model is based entirely on failure data instead of a failure threshold. The failure mechanism of pneumatic cylinders is normally characterized by wear and tear. Since the minimum working pressure increases with the number of working cycles, the minimum working pressure is chosen as the degradation signal. The PACE model is fundamentally composed of two operations: path classification and remaining useful life (RUL) estimation. Path classification classifies a current degradation path as belonging to one or more previously collected exemplary degradation paths. RUL estimation uses the resulting memberships to estimate the remaining useful life. To verify and validate the PACE prognostic method, six pneumatic cylinders were tested, and the test data were analyzed by PACE prognostics. It is found that the PACE-based prognosis method has higher prediction accuracy and smaller variance, and that the PACE model significantly outperforms population-based prognostics, especially under small-sample conditions. The PACE-based method addresses the problem of prediction accuracy in the prognosis of pneumatic cylinders with small samples.
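The two PACE operations described above can be sketched in a few lines. This is only an illustration: the inverse-squared-distance membership function, the use of the inspection index as the time axis, the function names, and the pressure values are assumptions, not the paper's formulation.

```python
def memberships(current, exemplars):
    """Path classification: weight each exemplar path by closeness to the current path.

    Uses inverse squared distance over the observed points (an assumed choice).
    Exemplars that failed before the current age get zero weight.
    """
    t = len(current)
    weights = []
    for path in exemplars:
        if len(path) < t:
            weights.append(0.0)
            continue
        d2 = sum((c - p) ** 2 for c, p in zip(current, path))
        weights.append(1.0 / (d2 + 1e-9))   # small constant avoids division by zero
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no exemplar path is long enough to compare")
    return [w / total for w in weights]


def estimate_rul(current, exemplars):
    """RUL estimation: membership-weighted remaining length of the exemplar paths."""
    t = len(current)
    w = memberships(current, exemplars)
    return sum(wi * (len(path) - t) for wi, path in zip(w, exemplars))


# Made-up minimum working pressure histories (one value per inspection),
# each ending at the inspection where the cylinder failed.
EXEMPLARS = [
    [0.10, 0.12, 0.14, 0.18, 0.25, 0.40],
    [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.17, 0.20, 0.27, 0.40],
]

if __name__ == "__main__":
    print(estimate_rul([0.10, 0.12, 0.14], EXEMPLARS))
    print(estimate_rul([0.10, 0.115, 0.13], EXEMPLARS))
```

A current path that matches the first exemplar yields an estimate close to that exemplar's remaining three inspections, while a path lying between the two exemplars yields an intermediate value.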
A new classification model for host intrusion detection based on unidentified short sequences and the RIPPER algorithm is proposed. The concepts of the different short sequences in system call traces are strictly defined on the basis of an in-depth analysis of the completeness and correctness of pattern databases. Labels of short sequences are predicted by the learned RIPPER rule set, and the nature of the unidentified short sequences is confirmed by a statistical method. Experimental results indicate that the classification model clearly increases the deviation between attack and normal traces and improves the detection capability against known and unknown attacks.
In agriculture, rice plant disease diagnosis has become a challenging issue, and early identification of disease can avoid the heavy losses caused by reduced crop productivity. Recently developed computer vision and Deep Learning (DL) approaches can be employed to design effective models for rice plant disease detection and classification. With this motivation, the current research work devises an Efficient Deep Learning based Fusion Model for Rice Plant Disease (EDLFM-RPD) detection and classification. The aim of the proposed EDLFM-RPD technique is to detect and classify different kinds of rice plant diseases proficiently. The EDLFM-RPD technique involves median filtering-based preprocessing and K-means segmentation to determine the infected portions. The study also uses a fusion of handcrafted Gray Level Co-occurrence Matrix (GLCM) features and Inception-based deep features. Finally, a Salp Swarm Optimization with Fuzzy Support Vector Machine (FSVM) model is utilized for classification. To validate the outcomes of the EDLFM-RPD technique, a series of simulations was conducted, and the results were assessed under different measures. The obtained values indicate the improved performance of the EDLFM-RPD technique over recent approaches, with a maximum accuracy of 96.170%.
Extreme Learning Machine (ELM) is popular in batch learning, sequential learning, and progressive learning due to its speed, easy integration, and generalization ability. However, traditional ELM cannot train on massive data rapidly and efficiently because of its memory residence and high time and space complexity. In ELM, the hidden layer typically requires a huge number of nodes. Furthermore, there is no certainty that the arrangement of weights and biases within the hidden layer is optimal. To solve this problem, the traditional ELM has been hybridized with swarm intelligence optimization techniques. This paper presents five proposed hybrid algorithms: Salp Swarm Algorithm (SSA-ELM), Grasshopper Algorithm (GOA-ELM), Grey Wolf Algorithm (GWO-ELM), Whale Optimization Algorithm (WOA-ELM), and Moth Flame Optimization (MFO-ELM). These five optimizers are hybridized with the standard ELM methodology to solve tumor type classification using gene expression data. The proposed models are also applied to the prediction of electricity load data describing the energy use of a single residence over a four-year period. In the hidden layer, the swarm algorithms are used to pick a smaller number of nodes to speed up the execution of ELM. The best weights and biases for the hidden layer were calculated by these algorithms. Experimental results demonstrate that the proposed MFO-ELM achieved 98.13% accuracy, the highest among the models for tumor type classification on gene expression data, while in prediction the proposed GOA-ELM achieved an RMSE of 0.397, the lowest among the models.
This paper proposes a new MPEG-2 rate control method based on model classification. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. The experimental results show that the technique is effective and can be implemented easily. The method presented in the paper can be a good approach to MPEG rate control.
A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. The experimental results show that the technique is effective and can be implemented easily. The method presented in the paper can be a good approach to MPEG and H.264 rate control.
Funding: Financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
Funding: Funded by the Beijing University of Posts and Telecommunications-China Mobile Research Institute Joint Innovation Center.
Abstract: Complex proteins are needed for many biological activities. Folding amino acid chains reveals their properties and functions. They support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative protein identification and function. Despite technical advances and protein sequence data exploration, the “basic structure” problem of bioinformatics (the automatic deduction of a protein’s properties from its amino acid sequence) remains unsolved. Protein function inference from amino acid sequences is the main biological data challenge. This study analyzes whether raw sequencing can characterize biological facts. A massive corpus of protein sequences and the Globin-like superfamily’s related protein families are used to generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. A bispectral analysis converts the encoded numerical protein sequences into images for better discrimination between protein sequences and families. Training and validation employed 70% of the dataset, while 30% was used for testing. This paper examined the performance of multistage deep learning models for differentiating between sixteen protein families after encoding and representing each encoded sequence by a higher spectral representation image (bispectrum). Cascading minimized false positive and false negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes almost accurately separated in the first stage and decreased the overlapping cases between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% +/- 22.8% accuracy, 63.3% +/- 17.1% precision, and a 63.2% +/- 19.4% F1-score. The two-stage technique yielded 92.2% +/- 4.9% accuracy, 92.7% +/- 7.0% precision, and a 92.3% +/- 5.0% F1-score. This work provides balanced, reliable, and precise forecasts for all families in all measures. It ensured that the new model was resilient to family variances and provided high-scoring results.
Abstract: Proteins are essential for many biological functions. For example, folding amino acid chains reveals their functionalities by maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein’s properties from its amino acid sequence is called the “basic structure” problem. It remains a critical unsolved challenge in bioinformatics, despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequencing to explain biological facts, using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. Subsequently, the encoded numerical protein sequences are transformed into images using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families. A deep Convolutional Neural Network (CNN) classifies the resulting images, and non-normalized and normalized encoding techniques were developed. Initially, the dataset was split 70/30 for training and testing. It was then split into 70% training, 15% validation, and 15% testing. The suggested methods are evaluated using accuracy, precision, and recall. The non-normalized method had 70% accuracy, 72% precision, and 71% recall, and 68% accuracy, 67% precision, and 67% recall after validation. Meanwhile, the normalized approach without validation had 92.4% accuracy, 94.3% precision, and 91.1% recall; validation showed 90% accuracy, 91.2% precision, and 89.7% recall. Both algorithms outperform the rest. The paper shows that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architecture. They offered the best inference performance, as the proposed approach improves categorization and prediction. Several instances show successful multi-class prediction in molecular biology’s massive data.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1067008) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A1A03032119).
Abstract: Question-answering (QA) models find answers to a given question. The need to find answers automatically is growing, because doing so over large-scale QA data sets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate on the entire dataset or on datasets within a category that the question writer specifies manually. In contrast, we aim to find the category to which the question belongs automatically by employing a text classification model, and then to find the answer corresponding to the question within that category. The text classification model effectively reduces the search space for finding the answers to a given question. As a result, the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which performs better than the individual embedding models. We evaluate the performance of the proposed QA matching model using real-world QA data sets. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61 to 5.07 times faster than the existing models because the text classification model effectively reduces the search space.
Abstract: This paper aims to build an employee attrition classification model based on the Stacking algorithm. An oversampling algorithm is applied to address data imbalance, and the random forest feature importance ranking method is used to resolve the overfitting problem after data cleaning and preprocessing. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used to compare them. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
Abstract: Glass is precious material evidence of trade along the early Silk Road. Ancient glass was easily affected by its environment and by weathering, and the resulting changes in composition ratios hinder the correct judgment of its category. In this paper, mathematical models and methods such as the chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model, and grey correlation analysis were used together to analyze the data of sample glass products in combination with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
Abstract: A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. First, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
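The transitive closure step used in fuzzy clustering can be sketched as follows: a reflexive, symmetric fuzzy similarity matrix is repeatedly squared under max-min composition until it stops changing, and a lambda-cut of the closed matrix then yields the classes. This is a minimal sketch of the general technique only. The similarity values and the threshold are hypothetical, and the abstract does not state how the four flight factors are turned into similarities.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n))
             for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square r under max-min composition until a fixed point is reached."""
    current = r
    while True:
        nxt = max_min_compose(current, current)
        if nxt == current:
            return current
        current = nxt


def lambda_cut_classes(t, lam):
    """Group indices whose closed similarity is at least lam."""
    n = len(t)
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [j for j in range(n) if t[i][j] >= lam]
        assigned.update(group)
        classes.append(group)
    return classes


# Hypothetical similarity between three flights (reflexive and symmetric).
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
closure = transitive_closure(similarity)
classes = lambda_cut_classes(closure, 0.8)
```

Lowering the threshold merges classes: at 0.5 all three flights in this toy example fall into a single class.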
Funding: Funded by the National Natural Science Foundation of China under Grant No. 61806171; the Sichuan University of Science & Engineering Talent Project under Grant No. 2021RC15; the Sichuan University of Science & Engineering Graduate Student Innovation Fund under Grant No. Y2023115; and the Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No. SUSE652A006.
Abstract: Deep Neural Networks (DNNs) are integral to various aspects of modern life, enhancing work efficiency. Nonetheless, their susceptibility to diverse attack methods, including backdoor attacks, raises security concerns. We investigate backdoor attack methods for image categorization tasks in order to promote the development of more secure DNNs. Research on backdoor attacks currently faces significant challenges: the distinct and abnormal data patterns of malicious samples, together with meticulous data screening by developers, hinder practical attack implementation. To overcome these challenges, this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation (GN-TUAP) algorithm. This approach restricts the direction of perturbations and normalizes abnormal pixel values, ensuring that perturbations progress as far as possible in a direction perpendicular to the decision hyperplane in linear problems. This limits anomalies within the perturbations, improves their visual stealthiness, and makes them more challenging for defense methods to detect. To verify the effectiveness, stealthiness, and robustness of GN-TUAP, we propose a comprehensive threat model. Based on this model, extensive experiments were conducted using the CIFAR-10, CIFAR-100, GTSRB, and MNIST datasets, comparing our method with existing state-of-the-art attack methods. We also tested our perturbation triggers against various defense methods and further examined the robustness of the triggers against noise filtering techniques. The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated by our algorithm exhibit cross-model attack effectiveness and superior stealthiness. Furthermore, they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.
Abstract: Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and a wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
Abstract: Class Title: Radiological imaging method, a comprehensive overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnostic capabilities they offer, as well as recent advances in the field. Materials and Methods: This paper provides an overview of conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Additionally, recent advances in radiological imaging are discussed, such as imaging diagnosis and modern computer-aided diagnosis systems. Results: This paper details the differences between the imaging techniques, the benefits of each, and the current advances in the field to aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool in modern medicine to assist in medical diagnosis. This work provides an overview of the types of imaging techniques used, the recent advances made, and their potential applications.
Funding: National Natural Science Foundation of China (51408174); Provincial Undergraduate Innovation and Entrepreneurship Training Program of Hefei University of Technology (S201910359302).
Abstract: Accurate simulation of tropical cyclone tracks is a prerequisite for tropical cyclone risk assessment. Given the spatial characteristics of tropical cyclone tracks in the Northwest Pacific region, a stochastic simulation method based on a classification model is used to simulate tropical cyclone tracks in this region. The simulation comprises the classification method, the genesis model, the traveling model, and the lysis model. Tropical cyclone tracks in the Northwest Pacific region are classified into five categories on the basis of their movement characteristics and steering positions. In the genesis model, Gaussian kernel probability density functions with the biased cross-validation method are used to simulate the annual occurrence number and genesis positions. The traveling model is established on the basis of the mean and mean square error of the historical 6 h latitude and longitude displacements. The termination probability is used as the discrimination standard in the lysis model. This stochastic simulation method is then applied and qualitatively evaluated with different diagnostics. Results show that tropical cyclone tracks in the Northwest Pacific can be satisfactorily simulated with this classification model.
Funding: Jointly funded by the National Natural Science Foundation of China (grants No. 41172104, 41202078 and 41372117) and the Major National S&T Program of China (grant No. 2011ZX05009-002).
Abstract: Objective: Debris flows are cohesive sediment gravity flows that occur in both subaerial and subaqueous settings. Compared with subaerial debris flows, which have been well studied as a geological hazard, subaqueous debris flows, with their complicated sediment composition and sedimentary processes, remain poorly understood. The main objective of this work is to establish a classification scheme and facies sequence models of subaqueous debris flows in order to better understand their sedimentary processes and depositional characteristics.
Funding: Supported by the Laboratory of Aviation Safety Technical Analysis and Appraisal of China Academy of Civil Aviation Science and Technology (Grant No. 2009-02).
Abstract: Prognosis is a key technology for improving the reliability, safety, and maintainability of products, and many researchers have devoted themselves to it. Improving the accuracy of remaining-life prediction, however, has been difficult. To predict the lifetime of pneumatic cylinders with high reliability and long lifetime from small samples, this paper puts forward a prognosis algorithm based on the path classification and estimation (PACE) model. The PACE model is based entirely on failure data instead of a failure threshold. The failure mechanism of pneumatic cylinders is normally wear and tear. Since the minimum working pressure increases with the number of working cycles, the minimum working pressure is chosen as the degradation signal. The PACE model is fundamentally composed of two operations: path classification and remaining useful life (RUL) estimation. Path classification classifies a current degradation path as belonging to one or more previously collected exemplary degradation paths. RUL estimation uses the resulting memberships to estimate the remaining useful life. To verify and validate the PACE prognostic method, six pneumatic cylinders were tested, and the test data were analyzed by PACE prognostics. The PACE-based prognosis method is found to have higher prediction accuracy and smaller variance, and the PACE model significantly outperforms population-based prognostics, especially under small-sample conditions. The PACE model-based method addresses the problem of prediction accuracy in the prognosis of small-sample pneumatic cylinders.
Abstract: A new classification model for host intrusion detection based on unidentified short sequences and the RIPPER algorithm is proposed. The concepts of different short sequences in system call traces are strictly defined on the basis of an in-depth analysis of the completeness and correctness of pattern databases. Labels of short sequences are predicted by the learned RIPPER rule set, and the nature of the unidentified short sequences is confirmed by a statistical method. Experimental results indicate that the classification model clearly increases the deviation between attack traces and normal traces and improves detection capability against known and unknown attacks.
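The notion of short sequences in system call traces can be illustrated with a sliding window: windows seen in normal traces form a pattern database, and windows of a new trace that are absent from it are treated here as unidentified. This is a simplified sketch under assumed definitions. It does not implement RIPPER rule learning or the paper's statistical confirmation step, and the traces, the window length, and the function names are hypothetical.

```python
def short_sequences(trace, k):
    """All contiguous length-k windows of a system-call trace, in order."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_pattern_db(normal_traces, k):
    """Collect every length-k window observed in the normal traces."""
    db = set()
    for trace in normal_traces:
        db.update(short_sequences(trace, k))
    return db


def unidentified_ratio(trace, db, k):
    """Fraction of the trace's windows that are missing from the database."""
    windows = short_sequences(trace, k)
    if not windows:
        return 0.0
    unknown = [w for w in windows if w not in db]
    return len(unknown) / len(windows)


# Hypothetical traces of system-call names.
normal_traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
]
db = build_pattern_db(normal_traces, k=3)

seen_ratio = unidentified_ratio(["open", "read", "write", "close"], db, k=3)
odd_ratio = unidentified_ratio(["open", "exec", "write", "close"], db, k=3)
```

A trace with a high ratio of unidentified windows deviates more from the normal profile, which is the kind of deviation the abstract reports as increasing between attack and normal traces.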
Abstract: In agriculture, rice plant disease diagnosis has become a challenging issue, and early identification of disease can avoid the huge losses caused by reduced crop productivity. Recently developed computer vision and Deep Learning (DL) approaches are commonly employed to design effective models for rice plant disease detection and classification. With this motivation, the current research work devises an Efficient Deep Learning based Fusion Model for Rice Plant Disease (EDLFM-RPD) detection and classification. The aim of the proposed EDLFM-RPD technique is to detect and classify different kinds of rice plant diseases proficiently. The EDLFM-RPD technique involves median filtering-based preprocessing and K-means segmentation to determine the infected portions. The study also uses a fusion of handcrafted Gray Level Co-occurrence Matrix (GLCM) features and Inception-based deep features. Finally, Salp Swarm Optimization with a Fuzzy Support Vector Machine (FSVM) model is utilized for classification. To validate the outcomes of the EDLFM-RPD technique, a series of simulations was conducted, and the results were assessed under different measures. The obtained values indicate that the EDLFM-RPD technique performs better than recent approaches, achieving a maximum accuracy of 96.170%.
Abstract: Extreme Learning Machine (ELM) is popular in batch learning, sequential learning, and progressive learning due to its speed, easy integration, and generalization ability. However, traditional ELM cannot train on massive data rapidly and efficiently because of its memory residence and high time and space complexity. In ELM, the hidden layer typically requires a huge number of nodes. Furthermore, there is no certainty that the arrangement of weights and biases within the hidden layer is optimal. To solve this problem, traditional ELM has been hybridized with swarm intelligence optimization techniques. This paper presents five proposed hybrid algorithms: Salp Swarm Algorithm (SSA-ELM), Grasshopper Algorithm (GOA-ELM), Grey Wolf Algorithm (GWO-ELM), Whale Optimization Algorithm (WOA-ELM), and Moth Flame Optimization (MFO-ELM). These five optimizers are hybridized with the standard ELM methodology for tumor type classification using gene expression data. The proposed models are also applied to the prediction of electricity loading data that describe the energy use of a single residence over a four-year period. In the hidden layer, the swarm algorithms are used to pick a smaller number of nodes to speed up the execution of ELM. These algorithms also calculate the best weights and biases for the hidden layer. Experimental results demonstrate that the proposed MFO-ELM achieved 98.13% accuracy, the highest among the models for tumor type classification on gene expression data, while in prediction the proposed GOA-ELM achieved an RMSE of 0.397, the lowest among the models.
Funding: The High Technology Research and Development Program of China (No. 2002AA103087)
Abstract: This paper proposes a new MPEG-2 rate control method based on model classification. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The different model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. Experimental results show that the technique is effective and easy to implement. The method presented in the paper is a promising approach to MPEG rate control.
Funding: This project was supported by the High Technology Research and Development Program of China (2002AA103087).
Abstract: A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The different model parameters are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization-based quantization adjustment and smoothing of the quantization of adjacent macroblocks, are used to improve quality. Experimental results show that the technique is effective and easy to implement. The method presented in the paper is a promising approach to MPEG and H.264 rate control.