The logistic regression model has become commonly used to study the association between a binary response variable and explanatory variables; its widespread application rests on its easy application and interpretation. The assessment of goodness-of-fit in the logistic regression model has attracted the attention of many scientists and researchers. Goodness-of-fit tests are methods to determine the suitability of the fitted model. Many methods have been proposed and discussed for assessing goodness-of-fit in the logistic regression model; however, the asymptotic distributions of goodness-of-fit statistics have been examined less and need more investigation. This work focuses on assessing the behavior of the asymptotic distributions of goodness-of-fit tests, compares global goodness-of-fit tests, and evaluates them by simulation.
Binary logistic regression models are commonly used to assess the association between outcomes and covariates. Many covariates are inherently continuous and have a variety of distributions, including those that are heavily skewed to the left or right. Existing theoretical formulas, criteria, and simulation programs cannot accurately estimate the sample size and power for non-standard distributions. Therefore, we have developed a simulation program that uses Monte Carlo methods to estimate the exact power of a binary logistic regression model. This power calculation can be used for distributions of any shape and covariates of any type (continuous, ordinal, and nominal), and can account for nonlinear relationships between covariates and outcomes. For illustrative purposes, this simulation program is applied to real data obtained from a study on the influence of smoking on 90-day outcomes after acute atherothrombotic stroke. Our program is applicable to all effect sizes and, with some modifications, makes it possible to apply various statistical methods, logistic regression and related simulations such as Bayesian inference.
Multinomial logistic regression (MNL) is an attractive statistical approach for modeling vehicle crash severity because it does not require the assumptions of normality, linearity, or homoscedasticity, unlike other approaches such as discriminant analysis, which requires these assumptions to be met. Moreover, it produces sound estimates by changing the probability range between 0.0 and 1.0 to log odds ranging from negative infinity to positive infinity, as it transforms the dependent variable into a continuous variable. The estimates are asymptotically consistent with the requirements of the nonlinear regression process. The results of MNL can be interpreted through the regression coefficient estimates and/or the odds ratios (the exponentiated coefficients). In addition, MNL can be used to improve the fitted model by comparing the full model, which includes all predictors, to a chosen restricted model that excludes the non-significant predictors. As such, this paper presents a detailed step-by-step overview of incorporating MNL in crash severity modeling, using vehicle crash data for Interstate I-70 in the State of Missouri, USA, for the years 2013-2015.
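The transformation from probabilities to log odds, and the reading of exponentiated coefficients as odds ratios, can be illustrated with a minimal sketch. The function names below are illustrative assumptions, not taken from the paper; in an MNL model the log odds are formed relative to a reference category, which this sketch does not model.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to log odds in (-inf, +inf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient):
    """Exponentiate a regression coefficient to obtain an odds ratio."""
    return math.exp(coefficient)
```

A coefficient of 0 corresponds to an odds ratio of 1 (no change in the odds), and a coefficient of ln 2 corresponds to a doubling of the odds.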
Bangladesh has a subtropical monsoon climate characterized by wide seasonal variations in rainfall, moderately warm temperatures, and high humidity. Rainfall is the main source of irrigation water throughout Bangladesh, where the inhabitants derive their income primarily from farming. Stochastic rainfall models, which address the occurrence of wet days and the depth of rainfall, have been used to model the daily occurrence of rainfall in different regions and have achieved satisfactory results around the world. In connection with Markov chains of different orders, logistic regression is used to show the dependence of current rainfall upon the rainfall of the previous two time periods. It is shown that wet days in the previous two time periods, compared with dry days in the previous two time periods, positively influence a wet day in the current time period; that is, the occurrence of rain in the rainy season from April to September in the study area depends on the preceding dry-wet spell. Daily rainfall data for Dhaka station covering about 26 years (January 1985 to August 2011) were collected from the meteorological department to conduct the study. The test results show that the occurrence of rainfall follows a second-order Markov chain, and logistic regression also indicates that dry followed by dry and wet followed by wet are more likely for rainfall at Dhaka station; the model could perform adequately for many applications of rainfall data.
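The dependence of today's rainfall on the previous two days can be sketched by estimating empirical second-order transition probabilities from a wet/dry sequence. This is only an illustration under stated assumptions: the 0/1 coding (0 = dry, 1 = wet) and the function name are hypothetical, and the study's own order test and logistic regression are not reproduced here.

```python
from collections import defaultdict


def second_order_wet_probabilities(days):
    """Estimate P(wet today | state of the previous two days).

    days: sequence of 0 (dry) and 1 (wet) values in time order.
    Returns a dict mapping (day_t_minus_2, day_t_minus_1) to the
    observed fraction of wet days that followed that pair.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [transitions, wet days]
    for t in range(2, len(days)):
        state = (days[t - 2], days[t - 1])
        counts[state][0] += 1
        counts[state][1] += days[t]
    return {state: wet / total for state, (total, wet) in counts.items()}
```

Pairs that never occur in the sequence are simply absent from the result, so a caller should check for missing states before comparing probabilities.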
Abstract: Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model. Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison. Findings: The choice of method depends on how much the users emphasize sensitivity against specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal. Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed. Practical implications: Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function. Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
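How a selected covariate set might be scored against the true set in a simulation can be sketched as follows. The abstract does not give the definitions of its two kinds of sensitivity or the form of its loss function, so this sketch assumes a single sensitivity (share of true covariates selected), a specificity (share of null covariates excluded), and a hypothetical weighted loss with weight `w`; all names are illustrative.

```python
def selection_accuracy(true_covariates, selected, candidates):
    """Return (sensitivity, specificity) of a covariate selection.

    sensitivity: fraction of true covariates that were selected.
    specificity: fraction of null (non-true) candidates that were excluded.
    """
    true_set = set(true_covariates)
    selected_set = set(selected)
    null_set = set(candidates) - true_set
    sensitivity = len(selected_set & true_set) / len(true_set) if true_set else 1.0
    specificity = len(null_set - selected_set) / len(null_set) if null_set else 1.0
    return sensitivity, specificity


def weighted_loss(sensitivity, specificity, w=0.5):
    """Hypothetical loss: w weights missed true covariates,
    (1 - w) weights wrongly included null covariates."""
    return w * (1.0 - sensitivity) + (1.0 - w) * (1.0 - specificity)
```

Raising `w` toward 1 penalizes missed covariates more heavily, which mirrors the point that the preferred method depends on how much sensitivity is emphasized against specificity.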
Abstract: In this paper, a logistic regression (LR) statistical analysis is presented for a set of variables used in experimental measurements of a phenomenon in reversed field pinch (RFP) machines commonly known as the “slinky mode” (SM), which is observed to travel around the torus in the Madison Symmetric Torus (MST). The LR analysis is used with the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked modes cease to travel around the torus, while unlocked modes keep traveling without a change in energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodynamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of the slinky mode, as shown by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
Abstract: For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule and other methods were used together to build logistic regression, cluster analysis, hyper-parameter test and other models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the subclassification under different chemical compositions, the hyper-parameter K value test and the rationality analysis. The research can provide theoretical support for the protection and restoration of ancient glass relics.
Abstract: This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. By addressing gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and related to COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and the VEC (variable effect characteristics) analysis on the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
Abstract: In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depends on a weight function that is continuously adaptable using Mahalanobis distances for predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated, and its finite-sample properties are also investigated via simulation. In simulation studies and real data sets, it is observed that the newly proposed technique demonstrates the best performance among all estimators compared.
Funding: supported by the Project of the 12th Five-year National Sci-Tech Support Plan of China (2011BAK12B09) and the China Special Project of Basic Work of Science and Technology (2011FY110100-2)
Abstract: The Bailongjiang watershed in southern Gansu province, China, is one of the most landslide-prone regions in China, characterized by a very high frequency of landslide occurrence. In order to predict landslide occurrence, a comprehensive map of landslide susceptibility is required, which may be significantly helpful in reducing loss of property and human life. In this study, an integrated model of the information value method and logistic regression is proposed that makes maximum use of their merits and overcomes their weaknesses, which may enhance the precision and accuracy of landslide susceptibility assessment. A detailed and reliable landslide inventory with 1587 landslides was prepared and randomly divided into two groups, (i) a training dataset and (ii) a testing dataset. Eight distinct landslide conditioning factors, including lithology, slope gradient, aspect, elevation, distance to drainages, distance to faults, distance to roads and vegetation coverage, were selected for landslide susceptibility mapping. The produced landslide susceptibility maps were validated by the success rate and prediction rate curves. The validation results show that the success rate and the prediction rate of the integrated model are 81.7% and 84.6%, respectively, which indicates that the proposed integrated method is reliable for producing an accurate landslide susceptibility map and that the results may be used for landslide management and mitigation.
Funding: This paper was financially supported by NSC96-2628-E-366-004-MY2 and NSC96-2628-E-132-001-MY2
Abstract: Internal solitary wave propagation over a submarine ridge results in energy dissipation, in which the hydrodynamic interaction between a wave and ridge affects the marine environment. This study analyzes the effects of ridge height and potential energy during wave-ridge interaction with binary and cumulative logistic regression models. In testing the global null hypothesis, all p-values are < 0.001 under three statistical tests: likelihood ratio, score, and Wald. Comparing the two kinds of models, the test values obtained by the cumulative logistic regression models are better than those obtained by the binary logistic regression models. Although this study employed a cumulative logistic regression model, three probability functions, p^1, p^2 and p^3, are utilized for investigating the weighted influence of factors on wave reflection. Deviance and Pearson tests are applied to check the goodness-of-fit of the proposed model. The analytical results demonstrate that both ridge height (X1) and potential energy (X2) significantly impact (p < 0.0001) the amplitude-based reflected rate; the p-values for the deviance and Pearson tests are all > 0.05 (0.2839 and 0.3438, respectively). That is, the goodness-of-fit between ridge height (X1) and potential energy (X2) can further predict parameters under the scenario of the best parsimonious model. Investigation of six measures of predictive power (R^2, Max-rescaled R^2, Somers' D, Gamma, Tau-a, and c) indicates that these predictive estimates of the proposed model have better predictive ability than ridge height alone, and are very similar to the interaction of ridge height and potential energy. It can be concluded that the goodness-of-fit and prediction ability of the cumulative logistic regression model are better than those of the binary logistic regression model.
Funding: supported by the National Key Natural Science Foundation of China (Grant No. 50635010)
Abstract: The currently prevalent machine performance degradation assessment techniques involve estimating a machine's current condition based upon the recognition of indications of failure features, which entail complete data collected in different conditions. However, failure data are always hard to acquire, thus making those techniques hard to apply. In this paper, a novel method which does not need failure history data is introduced. Wavelet packet decomposition (WPD) is used to extract features from raw signals, principal component analysis (PCA) is utilized to reduce feature dimensions, and a Gaussian mixture model (GMM) is then applied to approximate the feature space distributions. The single-channel confidence value (SCV) is calculated from the overlap between the GMM of the monitoring condition and that of the normal condition, and can indicate the performance of a single channel. Furthermore, the multi-channel confidence value (MCV), which can be deemed the overall performance index of the multiple channels, is calculated via logistic regression (LR), which also completes the task of decision-level sensor fusion. Both SCV and MCV can serve as the basis on which proactive maintenance measures can be taken, thus preventing machine breakdown. The method has been adopted to assess the performance of the turbine of a centrifugal compressor in a factory of Petro-China, and the result shows that it can effectively complete this task. The proposed method has engineering significance for machine performance degradation assessment.
Abstract: This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 which employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selecting dependent and independent variables, model fitting, reporting and interpreting were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was sufficiently small to call into question the accuracy of the regression model. Also, most studies did not report on validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers should be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
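The events-per-variable ratio mentioned above can be computed with a short sketch. It assumes a binary outcome coded 0/1 and, following a common convention that the abstract itself does not spell out, counts events in the less frequent outcome category; the function name is illustrative.

```python
def events_per_variable(outcomes, n_predictors):
    """Return events per variable (EPV) for a binary outcome.

    outcomes: iterable of 0/1 values, one per subject.
    n_predictors: number of predictor variables in the model.
    Assumption: 'events' means the less frequent outcome category.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    outcomes = list(outcomes)
    events = sum(outcomes)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_predictors
```

For example, 30 events among 100 subjects with 5 predictors gives an EPV of 6.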
Funding: New Technology Generalization Project of the China Meteorological Administration (CMATG2004M05)
Abstract: An information model is adopted to integrate factors from various geosciences and estimate the susceptibility to geological hazards. Combined with dynamic rainfall observations, logistic regression is then used to model the probabilities of geological hazard occurrences, from which hierarchical warnings for rainfall-induced geological hazards are produced. The forecasting and warning model takes numerical precipitation forecasts on grid points as its dynamic input, forecasts the probabilities of geological hazard occurrences on the same grid, and translates the results into likelihoods in the form of a 5-level hierarchy. Validation of the model with observational data for the year 2004 shows that 80% of the geological hazards of that year were identified as "likely enough to release warning messages". The model can satisfy the requirements of an operational warning system and is thus an effective way to improve meteorological warnings for geological hazards.
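The chain from grid-point inputs to a 5-level warning can be sketched as follows. This is only an illustration of the general idea: the coefficients, the two predictors, and the probability thresholds are hypothetical placeholders, since the abstract does not report the fitted model or the level boundaries.

```python
import math
from bisect import bisect_right

# Hypothetical cut points separating the five warning levels (not from the paper).
LEVEL_THRESHOLDS = (0.1, 0.3, 0.5, 0.7)


def hazard_probability(susceptibility, rainfall_mm, coef=(-4.0, 3.0, 0.02)):
    """Logistic model for hazard occurrence; coef = (intercept, b_susc, b_rain) is hypothetical."""
    b0, b_susc, b_rain = coef
    eta = b0 + b_susc * susceptibility + b_rain * rainfall_mm
    return 1.0 / (1.0 + math.exp(-eta))


def warning_level(probability):
    """Map a probability in [0, 1] to a warning level from 1 (lowest) to 5 (highest)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return bisect_right(LEVEL_THRESHOLDS, probability) + 1
```

Applying `warning_level(hazard_probability(s, r))` to every grid point would yield a warning map on the same grid as the precipitation forecast.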
Abstract: On the basis of newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was set up and used to analyze the prognosis of patients with acute lymphatic leukemia.
Abstract: Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion, and environment, which are useful information about users or relevant to the recommender application. Recent context-aware CF therefore takes advantage of such information to improve the quality of recommendation. There are three main context-aware approaches: contextual pre-filtering, contextual post-filtering, and contextual modeling. Each approach has its own strengths and drawbacks, but all require a steady and fast inference model that supports the context-aware recommendation process. This paper proposes a new approach that discovers a multivariate logistic regression model by mining both traditional rating data and contextual data. The logistic model is an optimal inference model for the binary question "whether or not a user prefers a list of recommendations with regard to contextual condition". Consequently, this regression model is used as a filter to remove irrelevant items from the recommendations. The final list contains the best recommendations to give to users under the contextual information. Moreover, the item search space of the logistic model is reduced to a smaller set of items, the so-called general user pattern (GUP). GUP helps the logistic model respond faster in real time.
Abstract: The logistic regression model is commonly used to study the association between a binary response variable and a set of explanatory variables; its widespread use rests on its easy application and interpretation. The assessment of goodness-of-fit in the logistic regression model has attracted the attention of many scientists and researchers. Goodness-of-fit tests are methods for determining the suitability of the fitted model. Many methods have been proposed and discussed for assessing goodness-of-fit in the logistic regression model; however, the asymptotic distributions of the goodness-of-fit statistics have been less examined and need further investigation. This work focuses on assessing the behavior of the asymptotic distributions of goodness-of-fit tests, compares global goodness-of-fit tests, and evaluates them by simulation.
Abstract: Binary logistic regression models are commonly used to assess the association between outcomes and covariates. Many covariates are inherently continuous and have a variety of distributions, including those that are heavily skewed to the left or right. Existing theoretical formulas, criteria, and simulation programs cannot accurately estimate the sample size and power for non-standard distributions. Therefore, we have developed a simulation program that uses Monte Carlo methods to estimate the exact power of a binary logistic regression model. This power calculation can be used for distributions of any shape and covariates of any type (continuous, ordinal, and nominal), and can account for nonlinear relationships between covariates and outcomes. For illustrative purposes, the simulation program is applied to real data obtained from a study on the influence of smoking on 90-day outcomes after acute atherothrombotic stroke. Our program is applicable to all effect sizes and, with some modifications, can be extended to other statistical methods and related simulations, such as Bayesian inference.
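The Monte Carlo approach to power can be sketched in a few lines: simulate many datasets under an assumed effect, fit the model to each, and count how often the covariate's Wald test rejects. The following is a minimal sketch and not the authors' program; the single lognormal covariate, the parameter values, and the function names are assumptions chosen to illustrate a skewed distribution.

```python
import math
import random
from statistics import NormalDist


def fit_logistic(x, y, max_iter=50, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b1, se_b1) or None."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # clip to avoid overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:             # (near-)singular information, e.g. separation
            return None
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 * score
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            return b1, math.sqrt(h00 / det)  # slope and its Wald standard error
    return None                      # did not converge


def simulate_power(n, b0, b1, n_sims=500, alpha=0.05, seed=0):
    """Share of simulated datasets in which the Wald test rejects H0: b1 = 0.

    Fits that fail to converge are counted as non-rejections.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]  # right-skewed covariate
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) else 0
             for xi in x]
        fit = fit_logistic(x, y)
        if fit is not None and abs(fit[0] / fit[1]) > z_crit:
            rejections += 1
    return rejections / n_sims
```

Calling `simulate_power(n=200, b0=-1.0, b1=1.0)` estimates the power at that sample size; repeating the call over a grid of `n` values gives a power curve, and replacing the `lognormvariate` draw with resampling from observed covariate values would adapt the sketch to real data.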
Abstract: Multinomial logistic regression (MNL) is an attractive statistical approach for modeling vehicle crash severity because it does not require the assumptions of normality, linearity, or homoscedasticity, unlike other approaches such as discriminant analysis, which requires these assumptions to be met. Moreover, it produces sound estimates by transforming the dependent variable into a continuous variable: probabilities ranging between 0.0 and 1.0 are converted to log odds ranging from negative infinity to positive infinity. The estimates are asymptotically consistent with the requirements of the nonlinear regression process. The results of MNL can be interpreted through the regression coefficient estimates, the odds ratios (the exponentiated coefficients), or both. In addition, MNL can be used to improve the fitted model by comparing the full model, which includes all predictors, with a restricted model that excludes the non-significant predictors. This paper therefore presents a detailed step-by-step overview of incorporating MNL in crash severity modeling, using vehicle crash data from Interstate I-70 in the State of Missouri, USA, for the years 2013-2015.
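The transformation and the model comparison described in this abstract can be written out in standard notation. The symbols below (severity categories 1 to K with K as the reference category, predictor vector x, coefficients beta) are notation introduced here for illustration and are not taken from the paper.

```latex
% Log odds (logit) of severity level k relative to the reference level K
\[
  \log\frac{P(Y = k \mid \mathbf{x})}{P(Y = K \mid \mathbf{x})}
    = \beta_{k0} + \boldsymbol{\beta}_k^{\top}\mathbf{x},
  \qquad k = 1, \dots, K-1 .
\]

% Odds ratio for a one-unit increase in predictor x_j (exponentiated coefficient)
\[
  \mathrm{OR}_{kj} = \exp(\beta_{kj}) .
\]

% Likelihood ratio test of the restricted model against the full model
\[
  G = -2\left(\ell_{\text{restricted}} - \ell_{\text{full}}\right)
  \;\sim\; \chi^{2}_{\Delta p} \quad \text{(asymptotically)},
\]
% where \ell denotes the maximized log-likelihood and \Delta p is the
% number of parameters dropped from the full model.
```

A large value of G relative to the chi-square reference indicates that the excluded predictors contribute to the fit and should be retained.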
Abstract: Bangladesh has a subtropical monsoon climate characterized by wide seasonal variations in rainfall, moderately warm temperatures, and high humidity. Rainfall is the main source of irrigation water throughout Bangladesh, where the inhabitants derive their income primarily from farming. Stochastic rainfall models, which describe the occurrence of wet days and the depth of rainfall, have been used to model daily rainfall occurrence in different regions and have achieved satisfactory results around the world. In connection with Markov chains of different orders, logistic regression is conducted to examine the dependence of current rainfall on the rainfall of the previous two time periods. It is shown that wet days in the previous two time periods, compared with dry days in the previous two time periods, positively influence a wet day in the current time period; that is, the occurrence of rain depends on the preceding dry-wet spell in the rainy season from April to September in the study area. Daily rainfall data for the Dhaka station, covering about 26 years from January 1985 to August 2011, were collected from the meteorological department to conduct the study. The test results show that the occurrence of rainfall follows a second-order Markov chain, and the logistic regression also indicates that dry followed by dry and wet followed by wet are more likely for the rainfall of the Dhaka station. The model could also perform adequately for many applications of rainfall data.
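The dependence of today's rainfall on the previous two days can be illustrated by estimating second-order transition probabilities from a binary wet/dry series. This is a minimal sketch, not the study's procedure: the function name and input encoding (1 for a wet day, 0 for a dry day) are assumptions. The estimated proportions coincide with the fitted probabilities of a saturated logistic regression on the two lagged indicators and their interaction.

```python
from collections import defaultdict


def second_order_transitions(wet):
    """Estimate P(wet today | state two days ago, state yesterday) from a 0/1 series.

    Returns a dict mapping (two_days_ago, yesterday) to the observed share of wet days.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [dry count, wet count]
    for t in range(2, len(wet)):
        counts[(wet[t - 2], wet[t - 1])][wet[t]] += 1
    return {state: c[1] / (c[0] + c[1]) for state, c in counts.items()}
```

If the probability for the state `(1, 1)` is clearly higher than for `(0, 0)`, the series shows the wet-follows-wet and dry-follows-dry persistence that the abstract reports for the Dhaka station.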