93 articles found
Incorporating empirical knowledge into data-driven variable selection for quantitative analysis of coal ash content by laser-induced breakdown spectroscopy
1
Authors: 吕一涵, 宋惟然, 侯宗余, 王哲. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2024, No. 7, pp. 148-156 (9 pages)
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based only on empirical knowledge and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than the values of 1.959, 3.718 and 2.181 obtained from multiple linear regression using only 12 empirical variables. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
Keywords: laser-induced breakdown spectroscopy (LIBS); coal ash content; quantitative analysis; variable selection; empirical knowledge; partial least squares regression (PLSR)
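The filter-then-merge idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Pearson filter (the abstract also names mutual information, LASSO and random forest), and the function names, the toy channel data and the empirical index are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(columns, target, top_k):
    """Indices of the top_k columns ranked by |r| with the target."""
    scores = [abs(pearson(col, target)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:top_k]

def merge_with_empirical(selected, empirical):
    """Union of data-driven and expert-chosen indices, order preserved."""
    merged = list(selected)
    for j in empirical:
        if j not in merged:
            merged.append(j)
    return merged

# Toy data (hypothetical): three spectral channels and an ash-content target.
channels = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with the target
    [2.0, 1.0, 2.0, 1.0, 2.0],  # nearly unrelated to the target
    [5.0, 4.1, 2.9, 2.0, 1.2],  # falls as the target rises
]
ash = [10.0, 12.1, 13.9, 16.0, 18.2]

picked = rank_by_correlation(channels, ash, top_k=2)
final = merge_with_empirical(picked, empirical=[1])  # channel 1 kept on expert grounds
```

In a full pipeline the merged index list would select the columns passed to the regression model; that step is omitted here.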
Exploration of the Impact Mechanism of Government Credibility Based on Variable Screening Method
2
Authors: Jiajun Wu, Yuxiang Ma, Helin Zou, Chun Zhang, Ran Yan. Journal of Data Analysis and Information Processing, 2024, No. 3, pp. 479-494 (16 pages)
Government credibility is an important asset of contemporary national governance, an important criterion for evaluating government legitimacy, and a key factor in measuring the effectiveness of government governance. In recent years, research on government credibility has mostly focused on exploring theories and mechanisms, with little empirical work on the topic. This article applies variable selection models from the field of social statistics to the issue of government credibility, in order to study it empirically and explore its core influencing factors from a statistical perspective. Specifically, it uses four regression-analysis-based methods and three random-forest-based methods to study the influencing factors of government credibility in various provinces in China, and compares the performance of these seven variable selection methods in different dimensions. The results show that the variable selection methods differ in simplicity, accuracy, and variable importance ranking, and therefore differ in their usefulness for studying government credibility. This study provides a methodological reference for variable selection models in social science research, and also offers a multidimensional comparative perspective for analyzing the influencing factors of government credibility.
Keywords: Government Credibility; Variable Selection Models; Social Statistics; Regression Based Approach; Method Based on Random Forest
A Two-Step Algorithm to Estimate Variable Importance for Multi-State Data:An Application to COVID-19
3
Authors: Behnaz Alafchi, Leili Tapak, Hassan Doosti, Christophe Chesneau, Ghodratollah Roshanaei. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 6, pp. 2047-2064 (18 pages)
Survival data with a multi-state structure are frequently observed in follow-up studies. An analytic approach based on a multi-state model (MSM) should be used in longitudinal health studies in which a patient experiences a sequence of clinical progression events. One main objective in the MSM framework is variable selection, where attempts are made to identify the risk factors associated with the transition hazard rates or probabilities of disease progression. The usual variable selection methods, including stepwise and penalized methods, do not provide information about the importance of variables. In this context, we present a two-step algorithm to evaluate the importance of variables for multi-state data. Three different machine learning approaches (random forest, gradient boosting, and neural network), as the most widely used methods, are considered to estimate the variable importance in order to identify the factors affecting disease progression and rank these factors according to their importance. The performance of our proposed methods is validated by simulation and applied to the COVID-19 data set. The results revealed that the proposed two-stage method has promising performance for estimating variable importance.
Keywords: Multi-state data; deviance residual; martingale residual; gradient boosting; random forest; neural network; variable importance; variable selection
An Adaptive Sequential Replacement Method for Variable Selection in Linear Regression Analysis
4
Authors: Jixiang Wu, Johnie N. Jenkins, Jack C. McCarty Jr. Open Journal of Statistics, 2023, No. 5, pp. 746-760 (15 pages)
With the rapid development of DNA technologies, high throughput genomic data have become a powerful tool for locating desirable genetic loci associated with traits of importance in various crop species. However, current genetic association mapping analyses are focused on identifying individual QTLs. This study aimed to identify a set of QTLs or genetic markers that can capture genetic variability for marker-assisted selection. Selecting a set of k loci that maximizes genetic variation from high throughput genomic data is a challenging problem. In this study, we proposed an adaptive sequential replacement (ASR) method, which is a variant of the sequential replacement (SR) method. Through Monte Carlo simulation and comparison with four other selection methods (the exhaustive, SR, forward, and backward methods), we found that the ASR method yields consistent and repeatable results comparable to the exhaustive method at much reduced computational cost.
Keywords: Adaptive Sequential Replacement; Association Mapping; Exhaustive Method; Global Optimal Solution; Sequential Replacement; Variable Selection
Nonparametric Feature Screening via the Variance of the Regression Function
5
Authors: Won Chul Song, Michael G. Akritas. Open Journal of Statistics, 2024, No. 4, pp. 413-438 (26 pages)
This article develops a procedure for screening variables, in ultrahigh-dimensional settings, based on their predictive significance. This is achieved by ranking the variables according to the variance of their respective marginal regression functions (RV-SIS). We show that, under some mild technical conditions, the RV-SIS possesses the sure screening property defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS has competitive performance compared to other screening procedures, and outperforms them in many different model settings.
Keywords: Sure Independence Screening; Nonparametric Regression; Ultrahigh-Dimensional Data; Variable Selection
Quantitative analysis of the content of nitrogen and sulfur in coal based on laser-induced breakdown spectroscopy: effects of variable selection (cited by: 5)
6
Authors: Fan DENG, Yu DING, Yujuan CHEN, Shaonong ZHU, Feifan CHEN. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2020, No. 7, pp. 36-43 (8 pages)
Coal is a crucial fossil energy source in today's society, and the detection of sulfur (S) and nitrogen (N) in coal is essential for the evaluation of coal quality. Therefore, an efficient method is needed to quantitatively analyze N and S content in coal, in support of the clean utilization of coal. This study applied laser-induced breakdown spectroscopy (LIBS) to test coal quality, and combined two variable selection algorithms, competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA), to establish the corresponding partial least squares (PLS) models. The results of the experiment were as follows. The PLS model built on the full spectrum of 27,620 variables has poor accuracy: the coefficient of determination of the test set (R^2_P) and the root mean square error of the test set (RMSEP) for nitrogen were 0.5172 and 0.2263, respectively, and those for sulfur were 0.5784 and 0.5811, respectively. CARS-PLS screened 37 and 25 variables for the detection of N and S, respectively, but the prediction ability of the model did not improve significantly. SPA-PLS finally screened 14 and 11 variables, respectively, through successive projections, and obtained the best prediction among the three methods. The R^2_P and RMSEP for nitrogen were 0.9873 and 0.0208, respectively, and those for sulfur were 0.9451 and 0.2082, respectively. In general, the predictive results for the two elements improved by about 90% for RMSEP and 60% for R^2_P compared with PLS. The results show that LIBS combined with SPA-PLS has good potential for detecting N and S content in coal, and is a very promising technology for industrial application.
Keywords: variable selection; LIBS; coal; CARS and SPA
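For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows one common way to compute them and to express a change between two reported values as a relative figure. The function names are hypothetical, and the R^2 definition used (1 minus residual over total sum of squares) is an assumption; the paper may define R^2_P differently. The before/after numbers are the ones reported in the abstract.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_change(before, after):
    """Signed relative change from `before` to `after`."""
    return (after - before) / before

# Relative changes between the full-spectrum PLS and SPA-PLS values
# reported in the abstract (negative RMSEP change means lower error).
changes = {
    "N RMSEP": relative_change(0.2263, 0.0208),
    "S RMSEP": relative_change(0.5811, 0.2082),
    "N R2_P": relative_change(0.5172, 0.9873),
    "S R2_P": relative_change(0.5784, 0.9451),
}
for name, value in changes.items():
    print(f"{name}: {value:+.1%}")
```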
Accelerated Recursive Feature Elimination Based on Support Vector Machine for Key Variable Identification (cited by: 4)
7
Authors: 毛勇, 皮道映, 刘育明, 孙优贤. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2006, No. 1, pp. 65-72 (8 pages)
Key variable identification for classification is related to many troubleshooting problems in the process industries. Recursive feature elimination based on support vector machine (SVM-RFE) has recently been proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection using accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational speed and algorithm effectiveness into a consistent framework. It not only identifies the key variables correctly, but also runs very fast. In comparison with contribution charts combined with principal component analysis (PCA) and two other SVM-RFE algorithms, A-SVM-RFE performs better. It is better suited to industrial application.
Keywords: variable selection; support vector machine; recursive feature elimination; fault diagnosis
Estimation of leaf color variances of Cotinus coggygria based on geographic and environmental variables (cited by: 3)
8
Authors: Xing Tan, Jiaojiao Wu, Yun Liu, Shixia Huang, Lan Gao, Wen Zhang. Journal of Forestry Research (SCIE, CAS, CSCD), 2021, No. 2, pp. 609-622 (14 pages)
Capturing leaf color variances over space is important for diagnosing plant nutrient and health status, estimating water availability, and improving the ornamental and tourism value of plants. In this study, leaf color variances of the Eurasian smoke tree, Cotinus coggygria, were estimated based on geographic and climate variables in a shrub community using generalized elastic net (GELnet) and support vector machine (SVM) algorithms. Results reveal that leaf color varied over space, and that the variances resulted from geography through its effect on the solar radiation, temperature, illumination and moisture of the shrub environment, whereas the influence of climate was not obvious. The SVM and GELnet models performed similarly in estimating leaf color indices from geographic variables, demonstrating that both techniques have the potential to estimate leaf color variances of C. coggygria in a shrubbery with a complex geographical environment in the absence of human activity.
Keywords: RGB color space; Digital elevation model; Variable selection; SVM algorithm; GELnet algorithm
Variable Selection of Partially Linear Single-index Models (cited by: 1)
9
Authors: LU Yi-qiang, HU Bin. Chinese Quarterly Journal of Mathematics (CSCD), 2014, No. 3, pp. 392-399 (8 pages)
In this article, we study variable selection for the partially linear single-index model (PLSIM). Based on minimized average variance estimation, variable selection for PLSIM is carried out by minimizing the average variance with an adaptive L1 penalty. An implementation algorithm is given. Under some regularity conditions, we demonstrate the oracle properties of the adaptive LASSO (aLASSO) procedure for PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection in PLSIM.
Keywords: variable selection; adaptive LASSO; minimized average variance estimation (MAVE); partially linear single-index model
Spectroscopic Multicomponent Analysis Using Multi-objective Optimization for Variable Selection (cited by: 1)
10
Authors: Anderson da Silva Soares, Telma Woerle de Lima, Daniel Vitor de Lucena, Rogerio Lopes Salvini, Gustavo Teodoro Laureano, Clarimar Jose Coelho. Computer Technology and Application, 2013, No. 9, pp. 466-475 (10 pages)
The simultaneous determination of multiple chemical properties is a classical problem in analytical chemistry. The main difficulty is to find the subset of variables that best represents the compounds. These variables are obtained with a spectrophotometer, which measures hundreds of correlated variables related to physicochemical properties that can be used to estimate the component of interest. The problem is to select a subset of informative and uncorrelated variables that helps minimize the prediction error. Classical algorithms select a separate subset of variables for each compound considered. In this work we propose the use of SPEA-II (strength Pareto evolutionary algorithm II). We aim to show that the variable selection algorithm can select a single subset that serves multiple determinations using multiple linear regression. The case study uses wheat data obtained by NIR (near-infrared) spectrometry, where the objective is to determine a variable subgroup with information about E protein content (%), test weight (kg/hl), WKT (wheat kernel texture) (%) and farinograph water absorption (%). The results of traditional multivariate calibration techniques, namely SPA (successive projections algorithm), PLS (partial least squares) and a mono-objective genetic algorithm, are presented for comparison. For NIR spectral analysis of protein concentration in wheat, the SPEA-II algorithm reduced the number of variables from 775 spectral variables to just 10. The prediction error decreased from 0.2 in the classical methods to 0.09 in the proposed approach, a reduction of 37%. The model using variables selected by SPEA-II had better prediction performance than the classical algorithms and full-spectrum partial least squares.
Keywords: Multi-objective algorithms; variable selection; linear regression
Bayesian Variable Selection for Mixture Process Variable Design Experiment (cited by: 1)
11
Authors: Sadiah M. A. Aljeddani. Open Journal of Modelling and Simulation, 2022, No. 4, pp. 391-416 (26 pages)
This paper discussed Bayesian variable selection methods for models from split-plot mixture designs using samples from the Metropolis-Hastings within Gibbs sampling algorithm. Bayesian variable selection is easy to implement thanks to improvements in computing via MCMC sampling. We described the Bayesian methodology by introducing the Bayesian framework and explaining Markov Chain Monte Carlo (MCMC) sampling. Metropolis-Hastings within Gibbs sampling was used to draw dependent samples from the full conditional distributions, which were also explained. In mixture experiments with process variables, the response depends not only on the proportions of the mixture components but also on the effects of the process variables. In many such mixture-process variable experiments, constraints such as time or cost prohibit the selection of treatments completely at random. In these situations, restrictions on the randomisation force the level combinations of one group of factors to be fixed while the combinations of the other group of factors are run. Then a new level of the first factor group is set and the combinations of the other factors are run again. We discussed the computational algorithm for Stochastic Search Variable Selection (SSVS) in linear mixed models. We extended the computational algorithm of SSVS to fit models from split-plot mixture designs by introducing the algorithm of Stochastic Search Variable Selection for Split-plot Design (SSVS-SPD). The motivation for this extension is that the split-plot mixture design has two different levels of experimental units, one for the whole plots and the other for the subplots.
Keywords: Variable Selection; Bayesian Analysis; Mixture Experiment; Split-Plot Design
Variable Selection in Randomized Block Design Experiment (cited by: 1)
12
Authors: Sadiah Mohammed Aljeddani. American Journal of Computational Mathematics, 2022, No. 2, pp. 216-231 (16 pages)
In experimental work, researchers very often need to select the best subset model and obtain the best model estimates simultaneously. Selecting the best subset of variables improves prediction accuracy because noninformative variables are removed. A model with high prediction accuracy allows researchers to use it for future forecasting. In this paper, we investigate the differences between various variable selection methods. The aim is to compare a frequentist method (backward elimination), a penalised shrinkage method (the Adaptive LASSO) and Least Angle Regression (LARS) for selecting the active variables in data produced by a blocked design experiment. The results of the comparative study support the use of the LARS method for statistical analysis of data from blocked experiments.
Keywords: Variable Selection; Shrinkage Methods; Linear Mixed Model; Blocked Designs
Variable selection for skew-normal mixture of joint location and scale models
13
Authors: WU Liu-cang, YANG Song-qin, TAO Ye. Applied Mathematics (A Journal of Chinese Universities) (SCIE, CSCD), 2021, No. 4, pp. 475-491 (17 pages)
Although there are many papers on variable selection methods based on the mean model in finite mixtures of regression models, little work has been done on how to select significant explanatory variables in the modeling of the variance parameter. In this paper, we propose and study a novel class of models, a skew-normal mixture of joint location and scale models, to analyze heteroscedastic skew-normal data coming from a heterogeneous population. The problem of variable selection for the proposed models is considered. In particular, a modified Expectation-Maximization (EM) algorithm for estimating the model parameters is developed. The consistency and the oracle property of the penalized estimators are established. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies. An example illustrates the proposed methodologies.
Keywords: heterogeneous population; skew-normal (SN) distribution; mixture of joint location and scale models; variable selection; EM algorithm
Online quantitative analysis of soluble solids content in navel oranges using visible-near infrared spectroscopy and variable selection methods
14
Authors: Yande Liu, Yanrui Zhou, Yuanyuan Pan. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2014, No. 6, pp. 1-8 (8 pages)
Variable selection is widely applied in visible-near infrared (Vis-NIR) spectroscopy analysis of internal quality in fruits. Different spectral variable selection methods were compared for online quantitative analysis of soluble solids content (SSC) in navel oranges. Moving window partial least squares (MW-PLS), Monte Carlo uninformative variables elimination (MC-UVE) and wavelet transform (WT) combined with the MC-UVE method were used to select the spectral variables and develop calibration models for online analysis of SSC in navel oranges. The performances of these methods were compared for modeling the Vis-NIR data sets of navel orange samples. Results show that the WT-MC-UVE method gave better calibration models, with a higher correlation coefficient (r) of 0.89 and a lower root mean square error of prediction (RMSEP) of 0.54 at 5 fruits per second. It is concluded that Vis-NIR spectroscopy coupled with WT-MC-UVE may be a fast and effective tool for online quantitative analysis of SSC in navel oranges.
Keywords: Vis-NIR spectroscopy; variable selection; soluble solids content; wavelet transform; moving window partial least squares; Monte Carlo uninformative variables elimination
Comparative Study of Variable Selection Using Genetic Algorithm with Various Types of Chromosomes
15
Authors: 陈国华, 陆瑶, 夏之宁. Chinese Journal of Structural Chemistry (SCIE, CAS, CSCD), 2010, No. 9, pp. 1431-1437 (7 pages)
In this study, different variable selection methods for multilinear stepwise regression (MLR) and support vector regression (SVR) were compared, using genetic algorithms (GAs) with various types of chromosomes. The first method is a GA with a binary chromosome (GA-BC) and the other is a GA with a fixed-length character chromosome (GA-FCC). The overall prediction accuracy for the training set was tested by means of 7-fold cross-validation. All the regression models were evaluated on the test set. The poor prediction for the test set illustrates that the forward stepwise regression (FSR) model is more prone to overfitting the training set. The results using SVR methods showed that the overfitting could be overcome. Further, overfitting occurred more readily with the GA-BC-SVR method because too many variables were quickly introduced into the model. The final optimal model, with good predictive ability (R^2 = 0.885, S = 0.469, R^2_cv = 0.700, S_cv = 0.757, R^2_ex = 0.692, S_ex = 0.675), was obtained using the GA-FCC-SVR method. Our investigation indicates that the variable selection method using GA-FCC is the most appropriate for the MLR and SVR methods.
Keywords: support vector regression; genetic algorithm; variable selection; quantitative structure activity relationship; multiple linear regression
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
16
Authors: 王清河, 周勇. Acta Mathematica Scientia (SCIE, CSCD), 2006, No. 3, pp. 469-476 (8 pages)
A simple but efficient method is proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those of the nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Keywords: Heteroscedastic regression models; variable selection; wavelets
Fuzzy identification of nonlinear dynamic system based on selection of important input variables
17
Authors: LYU Jinfeng, LIU Fucai, REN Yaxue. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 3, pp. 737-747 (11 pages)
Input variable selection (IVS) has proved pivotal in nonlinear dynamic system modeling. In order to optimize the model of a nonlinear dynamic system, a fuzzy modeling method that determines the premise structure by selecting important inputs of the system is studied. Firstly, a simplified two-stage fuzzy curves method is proposed, which is employed to sort all possible inputs by their relevance to the outputs, select the important input variables of the system and identify the structure. Secondly, in order to reduce the complexity of the model, the standard fuzzy c-means clustering algorithm and the recursive least squares algorithm are used to identify the premise parameters and conclusion parameters, respectively. Then, the effectiveness of IVS is verified on two well-known problems. Finally, the proposed identification method is applied to a realistic variable load pneumatic system. The simulation experiments indicate that the IVS method in this paper has a positive influence on the approximation performance of Takagi-Sugeno (T-S) fuzzy modeling.
Keywords: Takagi-Sugeno (T-S) fuzzy modeling; input variable selection (IVS); fuzzy identification; fuzzy c-means clustering algorithm
Ensemble Variable Selection for Naive Bayes to Improve Customer Behaviour Analysis
18
Authors: R. Siva Subramanian, D. Prabha. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 4, pp. 339-355 (17 pages)
Executing customer analysis in a systematic way is one of the possible solutions for an enterprise to understand consumer behavior patterns in an efficient and in-depth manner. Further investigation of customer patterns helps the firm to make efficient decisions and, in turn, helps to optimize the enterprise's business and maximize consumer satisfaction. To conduct an effective assessment of customers, Naive Bayes (also called Simple Bayes), a machine learning model, is utilized. However, the efficacy of the Naive Bayes model relies entirely on the consumer data used, and uncertain and redundant attributes in the consumer data lead the model to poor predictions because of its assumption regarding the attributes. In practice, the NB premise does not hold in consumer data, and the presence of these redundant attributes leads to poor prediction results. In this work, an ensemble attribute selection methodology is applied to overcome this problem and to pick a stable, uncorrelated attribute set to model with the NB classifier. In ensemble variable selection, two different strategies are applied: one is based on data perturbation (a homogeneous ensemble, in which the same feature selector is applied to different subsamples derived from the same learning set) and the other is based on function perturbation (a heterogeneous ensemble, in which different feature selectors are applied to the same learning set). Furthermore, the feature sets obtained from both ensemble strategies are applied to NB individually and the outcomes are computed. Finally, the experimental outcomes show that the proposed ensemble strategies perform efficiently in choosing a stable attribute set and improve NB classification performance.
Keywords: Naive Bayes or Simple Bayes; variable selection; homogeneous ensemble; heterogeneous ensemble; customer prediction
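The data-perturbation (homogeneous ensemble) strategy described in this abstract can be sketched as follows: the same selector is run on several random subsamples and only the attributes chosen in most rounds are kept. This is an illustration under stated assumptions, not the authors' procedure. The base selector (top-k by absolute Pearson correlation), the voting threshold, all function names and the toy data are hypothetical.

```python
import math
import random

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))

def select_top_k(rows, labels, k):
    """Base selector: the k attribute indices most correlated with the label."""
    n_features = len(rows[0])
    scores = [abs_corr([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]

def homogeneous_ensemble(rows, labels, k, n_rounds=25, frac=0.7,
                         min_share=0.6, seed=0):
    """Run one selector on random subsamples; keep attributes that are
    selected in at least `min_share` of the rounds."""
    rng = random.Random(seed)
    n = len(rows)
    size = max(2, int(frac * n))
    votes = [0] * len(rows[0])
    for _ in range(n_rounds):
        idx = rng.sample(range(n), size)  # subsample without replacement
        chosen = select_top_k([rows[i] for i in idx],
                              [labels[i] for i in idx], k)
        for j in chosen:
            votes[j] += 1
    return [j for j, v in enumerate(votes) if v / n_rounds >= min_share]

# Toy data (hypothetical): attribute 0 copies the label, attribute 1 is
# random noise, attribute 2 is constant.
noise = random.Random(1)
labels = [float(i % 2) for i in range(20)]
rows = [[lab, noise.random(), 1.0] for lab in labels]
stable = homogeneous_ensemble(rows, labels, k=1)
```

A heterogeneous ensemble would instead keep the data fixed and vote across several different selectors; the voting step stays the same.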
Clustering of the Values of a Response Variable and Simultaneous Covariate Selection Using a Stepwise Algorithm
19
Authors: Olivier Collignon, Jean-Marie Monnez. Applied Mathematics, 2016, No. 15, pp. 1639-1648 (10 pages)
In supervised learning, the number of values of a response variable can be very high. Grouping these values into a few clusters can be useful for performing accurate supervised classification analyses. On the other hand, selecting relevant covariates is a crucial step in building robust and efficient prediction models. We propose in this paper an algorithm that simultaneously groups the values of a response variable into a limited number of clusters and selects stepwise the best covariates that discriminate this clustering. These objectives are achieved by alternate optimization of a user-defined model selection criterion. This process extends a former version of the algorithm to a more general framework. Moreover, possible further developments are discussed in detail.
Keywords: Classification; Variable Selection; Supervised Learning; Akaike Information Criterion; Wilks' Lambda
A Comparison of Variable Selection by Tabu Search and Stepwise Regression with Multicollinearity Problem
20
Authors: Kannat Na Bangchang. Journal of Statistical Science and Application, 2015, No. 1, pp. 16-24 (9 pages)
This paper compares variable selection methods for multiple linear regression models that have both relative and non-relative variables in the full model when the predictor variables are highly correlated (0.999). In this study, the two objective functions used in the Tabu Search are the mean square error (MSE) and the mean absolute error (MAE). The results of Tabu Search are compared with the results obtained by the stepwise regression method based on the hit percentage criterion. The simulations cover both cases, without and with multicollinearity problems. For each situation, 1,000 iterations are examined with sample sizes n = 25 and 100 at the 0.05 level of significance. Without multicollinearity, the hit percentages of the stepwise regression method and Tabu Search using the MSE objective function are almost the same, and slightly higher than that of Tabu Search using the MAE objective function. With multicollinearity, however, the hit percentages of Tabu Search using both objective functions are higher than the hit percentage of the stepwise regression method.
Keywords: Stepwise regression; Tabu Search; variable selection