Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/appr...Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.展开更多
The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate ...The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate estimation of cropland burned area is both crucial and challenging,especially for the small and fragmented burned scars in China.Here we developed an automated burned area mapping algorithm that was implemented using Sentinel-2 Multi Spectral Instrument(MSI)data and its effectiveness was tested taking Songnen Plain,Northeast China as a case using satellite image of 2020.We employed a logistic regression method for integrating multiple spectral data into a synthetic indicator,and compared the results with manually interpreted burned area reference maps and the Moderate-Resolution Imaging Spectroradiometer(MODIS)MCD64A1 burned area product.The overall accuracy of the single variable logistic regression was 77.38%to 86.90%and 73.47%to 97.14%for the 52TCQ and 51TYM cases,respectively.In comparison,the accuracy of the burned area map was improved to 87.14%and 98.33%for the 52TCQ and 51TYM cases,respectively by multiple variable logistic regression of Sentind-2 images.The balance of omission error and commission error was also improved.The integration of multiple spectral data combined with a logistic regression method proves to be effective for burned area detection,offering a highly automated process with an automatic threshold determination mechanism.This method exhibits excellent extensibility and flexibility taking the image tile as the operating unit.It is suitable for burned area detection at a regional scale and can also be implemented with other satellite data.展开更多
In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluste...In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluster analysis, hyper-parameter test and other models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, sub classification under different chemical compositions, hyper-parameter K value test and rationality analysis. Research can provide theoretical support for the protection and restoration of ancient glass relics.展开更多
Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection ...Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children.Parents can seek professional help for a better prognosis of the child’s therapy when ASD is diagnosed under five years.This research study aims to develop an automated tool for diagnosing autism in children.The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition,feature selection,and classification phases.The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification.The Imperialistic competitive algorithm(ICA)based on empires conquering colonies performs feature selection in this study.The performance of Logistic Regression(LR),Decision tree,K-Nearest Neighbor(KNN),and Random Forest(RF)classifiers are experimentally studied in this research work.The experimental results prove that the Logistic regression classifier exhibits the highest accuracy for the self-acquired dataset.The ASD detection is evaluated experimentally with the Least Absolute Shrinkage and Selection Operator(LASSO)feature selection method and different classifiers.The Exploratory Data Analysis(EDA)phase has uncovered crucial facts about the data,like the correlation of the features in the dataset with the class variable.展开更多
Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for deliveri...Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for delivering the services to their customers,clients and citizens.But,the interaction is success-ful only based on the trust that each device has on another.Thus trust is very much essential for a social network.As Internet of Things have access over sen-sitive information,it urges to many threats that lead data management to risk.This issue is addressed by trust management that help to take decision about trust-worthiness of requestor and provider before communication and sharing.Several trust-based systems are existing for different domain using Dynamic weight meth-od,Fuzzy classification,Bayes inference and very few Regression analysis for IoT.The proposed algorithm is based on Logistic Regression,which provide strong statistical background to trust prediction.To make our stand strong on regression support to trust,we have compared the performance with equivalent sound Bayes analysis using Beta distribution.The performance is studied in simu-lated IoT setup with Quality of Service(QoS)and Social parameters for the nodes.The proposed model performs better in terms of various metrics.An IoT connects heterogeneous devices such as tags and sensor devices for sharing of information and avail different application services.The most salient features of IoT system is to design it with scalability,extendibility,compatibility and resiliency against attack.The existing worksfinds a way to integrate direct and indirect trust to con-verge quickly and estimate the bias due to attacks in addition to the above features.展开更多
This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. ...This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train the datasets. The models demonstrate a remarkably high classification accuracy of 89.3% in predicting ozone levels on a given day. Evaluation metrics reveal that both the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the higher the model’s accuracy or performance. Our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as “Solar radiation”, “Std. Dev. Wind Direction”, “outdoor temperature”, “dew point temperature”, and “PM10” contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.展开更多
In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for pr...In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated and properties of finite-sample are also investigated via simulation. In simulation studies and real data sets, it is observed that the newly proposed technique demonstrated the greatest performance among all estimators compared.展开更多
This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,w...This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.展开更多
This study aimed to assess the potential of in-situ measured soil and vegetation characteristics in landslide susceptibility analyses.First,data for eight independent variables,i.e.,soil moisture content,soil organic ...This study aimed to assess the potential of in-situ measured soil and vegetation characteristics in landslide susceptibility analyses.First,data for eight independent variables,i.e.,soil moisture content,soil organic content,compaction of soil(soil toughness),plant root strength,crop biomass,tree diameter at knee height,Shannon Wiener Index(SWI)for trees and herbs was assembled from field tests at two historic landslide locations:Aranayaka and Kurukudegama,Sri Lanka.An economical,finer resolution database was obtained as the field tests were not cost-prohibitive.The logistic regression(LR)analysis showed that soil moisture content,compaction of soil,SWI for trees and herbs were statistically significant at P<0.05.The variance inflation factors(VIFs)were computed to test for multicollinearity.VIF values(<2)confirmed the absence of multicollinearity between four independent variables in the LR model.Receiver Operating Characteristics(ROC)curve and Confusion Metrix(CM)methods were used to validate the model.In ROC analysis,areas under the curve of Success Rate Curve and Prediction Rate Curve were 84.5% and 96.6%,respectively,demonstrating the model’s excellent compatibility and predictability.According to the CM,the model demonstrated a 79.6% accuracy,63.6% precision,100% recall,and a F-measure of 77.8%.The model coefficients revealed that the vegetation cover has a more significant contribution to landslide susceptibility than soil characteristics.Finally,the susceptibility map,which was then classified as low,medium,and highly susceptible areas based on the natural breaks(Jenks)method,was generated using geographical information systems(GIS)techniques.All the historic landslide locations fell into the high susceptibility areas.Thus,validation of the model and inspection of the susceptibility map indicated that the in-situ soil and vegetation characteristics used in the model could be employed to demarcate historical landslide patches and identify landslide susceptible locations with high confidence.展开更多
Ecological land is an important guarantee to maintain urban ecological security and sustainable development.Although increasing studies have been brought to ecological land,with few explorations of the relative import...Ecological land is an important guarantee to maintain urban ecological security and sustainable development.Although increasing studies have been brought to ecological land,with few explorations of the relative importance of anthropogenic-natural factors and how they interact to induce the ecological land evolution.This research sought to fill this gap.In this study,18 factors,including the risk of goaf collapse,fault,prime croplands,were selected from six aspects of topography,geology,climate,accessibility,socio-economic and land control policies.logistic regression(LR)and random forest(RF)models were adopted to identify the anthropogenic and biophysical factors on the dynamic change of ecological land of Mentougou in Beijing from 1990 to 2018.The results show that there was a significant increase in ecological land from 1990 to 2018.The increased area of ecological land reached 102.11 km2 with an increased rate of 0.78,the gravity center of ecological land gradually moved to the northwest.The impact of anthropogenic factors on ecological land was greater than that of natural factors,ecological land was mainly driven by proportion of prime cropland,per capita GDP,land urbanization,temperature,per capita rural income,elevation and aspect factors.Additionally,slope and precipitation were also identified as important predictors for ecological land change.The model comparison suggested that RF can better identify the relationship between ecological land and explanatory variables than LR model.Based on our findings,the implementation of government policies along with anthropogenic factors are the most important variables influencing ecological land change,and the rational planning and allocation of ecological land by Mentougou government are still needed.展开更多
Landslide distribution and susceptibility mapping are the fundamental steps for landslide-related hazard and disaster risk management activities, especially in the Himalaya region which has resulted in a great deal of...Landslide distribution and susceptibility mapping are the fundamental steps for landslide-related hazard and disaster risk management activities, especially in the Himalaya region which has resulted in a great deal of death and damage to property. To better understand the landslide condition in the Nepal Himalaya, we carried out an investigation on the landslide distribution and susceptibility using the landslide inventory data and 12 different contributing factors in the Dailekh district, Western Nepal. Based on the evaluation of the frequency distribution of the landslide, the relationship between the landslide and the various contributing factors was determined.Then, the landslide susceptibility was calculated using logistic regression and statistical index methods along with different topographic(slope, aspect, relative relief, plan curvature, altitude, topographic wetness index) and non-topographic factors(distance from river, normalized difference vegetation index(NDVI), distance from road, precipitation, land use and land cover, and geology), and 470(70%) of total 658 landslides. The receiver operating characteristic(ROC) curve analysis using 198(30%) of total landslides showed that the prediction curve rates(area under the curve, AUC) values for two methods(logistic regression and statistical index) were 0.826, and 0.823with success rates of 0.793, and 0.811, respectively. The values of R-Index for the logistic regression and statistical index methods were83.66 and 88.54, respectively, consisting of high susceptible hazard classes. In general, this research concluded that the cohesive and coherent natural interplay of topographic and non-topographic factors strongly affects landslide occurrence, distribution, and susceptibility condition in the Nepal Himalaya region. Furthermore, the reliability of these two methods is verified for landslide susceptibility mapping in Nepal’s central mountain region.展开更多
For high-dimensional models with a focus on classification performance,the?1-penalized logistic regression is becoming important and popular.However,the Lasso estimates could be problematic when penalties of different...For high-dimensional models with a focus on classification performance,the?1-penalized logistic regression is becoming important and popular.However,the Lasso estimates could be problematic when penalties of different coefficients are all the same and not related to the data.We propose two types of weighted Lasso estimates,depending upon covariates determined by the Mc Diarmid inequality.Given sample size n and a dimension of covariates p,the finite sample behavior of our proposed method with a diverging number of predictors is illustrated by non-asymptotic oracle inequalities such as the?1-estimation error and the squared prediction error of the unknown parameters.We compare the performance of our method with that of former weighted estimates on simulated data,then apply it to do real data analysis.展开更多
BACKGROUND Acute kidney injury(AKI)has serious consequences on the prognosis of patients undergoing liver transplantation.Recently,artificial neural network(ANN)was reported to have better predictive ability than the ...BACKGROUND Acute kidney injury(AKI)has serious consequences on the prognosis of patients undergoing liver transplantation.Recently,artificial neural network(ANN)was reported to have better predictive ability than the classical logistic regression(LR)for this postoperative outcome.AIM To identify the risk factors of AKI after deceased-donor liver transplantation(DDLT)and compare the prediction performance of ANN with that of LR for this complication.METHODS Adult patients with no evidence of end-stage kidney dysfunction(KD)who underwent the first DDLT according to model for end-stage liver disease(MELD)score allocation system was evaluated.AKI was defined according to the International Club of Ascites criteria,and potential predictors of postoperative AKI were identified by LR.The prediction performance of both ANN and LR was tested.RESULTS The incidence of AKI was 60.6%(n=88/145)and the following predictors were identified by LR:MELD score>25(odds ratio[OR]=1.999),preoperative kidney dysfunction(OR=1.279),extended criteria donors(OR=1.191),intraoperative arterial hypotension(OR=1.935),intraoperative massive blood transfusion(MBT)(OR=1.830),and postoperative serum lactate(SL)(OR=2.001).The area under the receiver-operating characteristic curve was best for ANN(0.81,95%confidence interval[CI]:0.75-0.83)than for LR(0.71,95%CI:0.67-0.76).The root-mean-square error and mean absolute error in the ANN model were 0.47 and 0.38,respectively.CONCLUSION The severity of liver disease,pre-existing kidney dysfunction,marginal grafts,hemodynamic instability,MBT,and SL are predictors of postoperative AKI,and ANN has better prediction performance than LR in this scenario.展开更多
Deforestation process represents a wide concern mainly in the mountain environments due to its role in global warming, biodiversity loss, land degradation and natural hazards occurrence. Thus, the present study is foc...Deforestation process represents a wide concern mainly in the mountain environments due to its role in global warming, biodiversity loss, land degradation and natural hazards occurrence. Thus, the present study is focused on the largest afforested landform unit of Romania and, consequently, the most affected area by forest losses: Carpathian Mountains. The main goal of the paper is to examine and analyze the various explanatory variables associated with deforestation process and to model the probability of deforestation using GIS spatial analysis and logistic regression. The forest cover for 1990 and 2012, derived from CORINE Land Cover(CLC) database, were used to quantify historical forest cover change included in the modelling. To explain the biophysical and anthropogenic effects, this study considered several explanatory factors related to local topography, forest cover pattern, accessibility, urban growth and population density. Using ROC(Receiver Operating Characteristic) and 500 controlling sampling points, the statistical and spatial validations were assessed in order to evaluate the performance of the resulted data. The analysis showed that the area experienced a continuous forest cover change, leading to the loss of over 250,000 ha of forested area during the period 1990–2012. The most significant influence of the explanatory factors of deforestation were noticed in case of distance to forest edge(β=–4.215), forest fragmentation(β=2.231), slope declivity(β=–1.901), elevation(β=1.734) and distance to roads(β=–1.713). The statistical and spatial validation indicates a good accuracy of the model with reasonably AUC(0.736) and Kappa(0.739) values. The model's results suggest an intensification of the deforestation process in the area, designing numerous new clusters with high probability in the Apuseni Mountains, northern and central part of the Eastern Carpathians, western part of the Southern Carpathians and northern part of the Banat Mountains. The study could represent a useful outcome to identify the forests more vulnerable to logging and to adopt appropriate policies and decisions in forest management and conservation. In addition, the resulted probability map could be used in other studies in order to investigate potential environmental implications(e.g. geomorphological hazards or impact on biodiversity and landscape diversity).展开更多
Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion, environment, etc. which are useful information around users or relevant to recommender application...Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion, environment, etc. which are useful information around users or relevant to recommender application. So, recent aware-context CF takes advantages of such information in order to improve the quality of recommendation. There are three main aware-context approaches: contextual pre-filtering, contextual post-filtering and contextual modeling. Each approach has individual strong points and drawbacks but there is a requirement of steady and fast inference model which supports the aware-context recommendation process. This paper proposes a new approach which discovers multivariate logistic regression model by mining both traditional rating data and contextual data. Logistic model is optimal inference model in response to the binary question “whether or not a user prefers a list of recommendations with regard to contextual condition”. Consequently, such regression model is used as a filter to remove irrelevant items from recommendations. The final list is the best recommendations to be given to users under contextual information. Moreover the searching items space of logistic model is reduced to smaller set of items so-called general user pattern (GUP). GUP supports logistic model to be faster in real-time response.展开更多
Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classifi...Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classification problem,such as problem with non-equilibrium samples.Many scholars have proposed some methods,such as neural network,least square support vector machine,AdaBoost meta-algorithm,etc.These methods essentially belong to machine learning categories.In this work,based on the probability theory and statistical principle,we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification.We have compared our approach with other methods using non-equilibrium samples,the results show that our approach guarantees sample integrity and achieves superior classification.展开更多
Logistic Regression Models have been widely used in many areas of research, namely in health sciences, to study risk factors associated to diseases. Many population based surveys, such as Demographic and Health Survey...Logistic Regression Models have been widely used in many areas of research, namely in health sciences, to study risk factors associated to diseases. Many population based surveys, such as Demographic and Health Survey (DHS), are constructed assuming complex sampling, i.e., probabilistic, stratified and multistage sampling, with unequal weights in the observations;this complex design must be taken into account in order to have reliable results. However, this very relevant issue usually is not well analyzed in the literature. The aim of the study is to specify the logistic regression model with complex sample design, and to demonstrate how to estimate it using the R software survey package. More specifically, we used Mozambique Demographic Health and Survey data 2011 (MDHS 2011) to illustrate how to correct for the effect of sample design in the particular case of estimating the risk factors associated to the probability of using mosquito bed nets. Our results show that in the presence of complex sampling, appropriate methods must be used both in descriptive and inferential statistics.展开更多
Objective This study was undertaken to investigate the influencing factors on serum ALT level and hepatitis C virus(HCV)RNA titer in chronic hepatitis C(CHC)patients.Methods All patients enrolled into this study were ...Objective This study was undertaken to investigate the influencing factors on serum ALT level and hepatitis C virus(HCV)RNA titer in chronic hepatitis C(CHC)patients.Methods All patients enrolled into this study were anti-HCV positive.Retrospective tracing method was applied to detect serum ALT level and HCV RNA titer and to collect general information of the patients such as genders,age groups,interferon medication history,infection pathways,height and weight.Then the multi-factor analysis was adopted with the application of binominal logistic regression mode.Results The abnormal rate of ALT level was positively correlated to HCV RNA and gender while negatively correlated to interferon medication history and age group,with Wald value of the 4 factors as 39.604,11.823,18.991 and 7.389,respectively.The positive rate of HCV RNA was negatively correlated to interferon medication history and gender while positively correlated to ALT level,with corresponding Wald value of the 3 factors as81.394,7.618 and 27.562,respectively.Conclusions The normal ALT level in HCV infected patients was associated with viral load,age,gender and interferon medication history,while the normal rate of HCV RNA titer was closely associated with gender,interferon medication history and ALT level.展开更多
文摘Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
基金Under the auspices of National Natural Science Foundation of China(No.42101414)Natural Science Found for Outstanding Young Scholars in Jilin Province(No.20230508106RC)。
文摘The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate estimation of cropland burned area is both crucial and challenging,especially for the small and fragmented burned scars in China.Here we developed an automated burned area mapping algorithm that was implemented using Sentinel-2 Multi Spectral Instrument(MSI)data and its effectiveness was tested taking Songnen Plain,Northeast China as a case using satellite image of 2020.We employed a logistic regression method for integrating multiple spectral data into a synthetic indicator,and compared the results with manually interpreted burned area reference maps and the Moderate-Resolution Imaging Spectroradiometer(MODIS)MCD64A1 burned area product.The overall accuracy of the single variable logistic regression was 77.38%to 86.90%and 73.47%to 97.14%for the 52TCQ and 51TYM cases,respectively.In comparison,the accuracy of the burned area map was improved to 87.14%and 98.33%for the 52TCQ and 51TYM cases,respectively by multiple variable logistic regression of Sentind-2 images.The balance of omission error and commission error was also improved.The integration of multiple spectral data combined with a logistic regression method proves to be effective for burned area detection,offering a highly automated process with an automatic threshold determination mechanism.This method exhibits excellent extensibility and flexibility taking the image tile as the operating unit.It is suitable for burned area detection at a regional scale and can also be implemented with other satellite data.
文摘In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluster analysis, hyper-parameter test and other models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, sub classification under different chemical compositions, hyper-parameter K value test and rationality analysis. Research can provide theoretical support for the protection and restoration of ancient glass relics.
基金The authors extend their appreciation to the Deputyship for Research&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the Project Number(IF2-PSAU-2022/01/22043)。
文摘Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children.Parents can seek professional help for a better prognosis of the child’s therapy when ASD is diagnosed under five years.This research study aims to develop an automated tool for diagnosing autism in children.The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition,feature selection,and classification phases.The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification.The Imperialistic competitive algorithm(ICA)based on empires conquering colonies performs feature selection in this study.The performance of Logistic Regression(LR),Decision tree,K-Nearest Neighbor(KNN),and Random Forest(RF)classifiers are experimentally studied in this research work.The experimental results prove that the Logistic regression classifier exhibits the highest accuracy for the self-acquired dataset.The ASD detection is evaluated experimentally with the Least Absolute Shrinkage and Selection Operator(LASSO)feature selection method and different classifiers.The Exploratory Data Analysis(EDA)phase has uncovered crucial facts about the data,like the correlation of the features in the dataset with the class variable.
文摘Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for delivering the services to their customers,clients and citizens.But,the interaction is success-ful only based on the trust that each device has on another.Thus trust is very much essential for a social network.As Internet of Things have access over sen-sitive information,it urges to many threats that lead data management to risk.This issue is addressed by trust management that help to take decision about trust-worthiness of requestor and provider before communication and sharing.Several trust-based systems are existing for different domain using Dynamic weight meth-od,Fuzzy classification,Bayes inference and very few Regression analysis for IoT.The proposed algorithm is based on Logistic Regression,which provide strong statistical background to trust prediction.To make our stand strong on regression support to trust,we have compared the performance with equivalent sound Bayes analysis using Beta distribution.The performance is studied in simu-lated IoT setup with Quality of Service(QoS)and Social parameters for the nodes.The proposed model performs better in terms of various metrics.An IoT connects heterogeneous devices such as tags and sensor devices for sharing of information and avail different application services.The most salient features of IoT system is to design it with scalability,extendibility,compatibility and resiliency against attack.The existing worksfinds a way to integrate direct and indirect trust to con-verge quickly and estimate the bias due to attacks in addition to the above features.
文摘This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train the datasets. The models demonstrate a remarkably high classification accuracy of 89.3% in predicting ozone levels on a given day. Evaluation metrics reveal that both the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the higher the model’s accuracy or performance. Our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as “Solar radiation”, “Std. Dev. Wind Direction”, “outdoor temperature”, “dew point temperature”, and “PM10” contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.
文摘In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated and properties of finite-sample are also investigated via simulation. In simulation studies and real data sets, it is observed that the newly proposed technique demonstrated the greatest performance among all estimators compared.
文摘This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.
基金funded by the National Research Council,Sri Lanka[NRC 17-066]。
文摘This study aimed to assess the potential of in-situ measured soil and vegetation characteristics in landslide susceptibility analyses.First,data for eight independent variables,i.e.,soil moisture content,soil organic content,compaction of soil(soil toughness),plant root strength,crop biomass,tree diameter at knee height,Shannon Wiener Index(SWI)for trees and herbs was assembled from field tests at two historic landslide locations:Aranayaka and Kurukudegama,Sri Lanka.An economical,finer resolution database was obtained as the field tests were not cost-prohibitive.The logistic regression(LR)analysis showed that soil moisture content,compaction of soil,SWI for trees and herbs were statistically significant at P<0.05.The variance inflation factors(VIFs)were computed to test for multicollinearity.VIF values(<2)confirmed the absence of multicollinearity between four independent variables in the LR model.Receiver Operating Characteristics(ROC)curve and Confusion Metrix(CM)methods were used to validate the model.In ROC analysis,areas under the curve of Success Rate Curve and Prediction Rate Curve were 84.5% and 96.6%,respectively,demonstrating the model’s excellent compatibility and predictability.According to the CM,the model demonstrated a 79.6% accuracy,63.6% precision,100% recall,and a F-measure of 77.8%.The model coefficients revealed that the vegetation cover has a more significant contribution to landslide susceptibility than soil characteristics.Finally,the susceptibility map,which was then classified as low,medium,and highly susceptible areas based on the natural breaks(Jenks)method,was generated using geographical information systems(GIS)techniques.All the historic landslide locations fell into the high susceptibility areas.Thus,validation of the model and inspection of the susceptibility map indicated that the in-situ soil and vegetation characteristics used in the model could be employed to demarcate historical landslide patches and identify landslide susceptible locations with high confidence.
基金funded by the National Natural Science Foundation of China(Grant No.41877533)。
文摘Ecological land is an important guarantee to maintain urban ecological security and sustainable development.Although increasing studies have been brought to ecological land,with few explorations of the relative importance of anthropogenic-natural factors and how they interact to induce the ecological land evolution.This research sought to fill this gap.In this study,18 factors,including the risk of goaf collapse,fault,prime croplands,were selected from six aspects of topography,geology,climate,accessibility,socio-economic and land control policies.logistic regression(LR)and random forest(RF)models were adopted to identify the anthropogenic and biophysical factors on the dynamic change of ecological land of Mentougou in Beijing from 1990 to 2018.The results show that there was a significant increase in ecological land from 1990 to 2018.The increased area of ecological land reached 102.11 km2 with an increased rate of 0.78,the gravity center of ecological land gradually moved to the northwest.The impact of anthropogenic factors on ecological land was greater than that of natural factors,ecological land was mainly driven by proportion of prime cropland,per capita GDP,land urbanization,temperature,per capita rural income,elevation and aspect factors.Additionally,slope and precipitation were also identified as important predictors for ecological land change.The model comparison suggested that RF can better identify the relationship between ecological land and explanatory variables than LR model.Based on our findings,the implementation of government policies along with anthropogenic factors are the most important variables influencing ecological land change,and the rational planning and allocation of ecological land by Mentougou government are still needed.
基金Under the auspices of the CAS Overseas Institutions Platform Project (No. 131C11KYSB20200033)the National Natural Science Foundation of China (No. 42071349)the Sichuan Science and Technology Program (No. 2020JDJQ0003)。
文摘Landslide distribution and susceptibility mapping are the fundamental steps for landslide-related hazard and disaster risk management activities, especially in the Himalaya region which has resulted in a great deal of death and damage to property. To better understand the landslide condition in the Nepal Himalaya, we carried out an investigation on the landslide distribution and susceptibility using the landslide inventory data and 12 different contributing factors in the Dailekh district, Western Nepal. Based on the evaluation of the frequency distribution of the landslide, the relationship between the landslide and the various contributing factors was determined.Then, the landslide susceptibility was calculated using logistic regression and statistical index methods along with different topographic(slope, aspect, relative relief, plan curvature, altitude, topographic wetness index) and non-topographic factors(distance from river, normalized difference vegetation index(NDVI), distance from road, precipitation, land use and land cover, and geology), and 470(70%) of total 658 landslides. The receiver operating characteristic(ROC) curve analysis using 198(30%) of total landslides showed that the prediction curve rates(area under the curve, AUC) values for two methods(logistic regression and statistical index) were 0.826, and 0.823with success rates of 0.793, and 0.811, respectively. The values of R-Index for the logistic regression and statistical index methods were83.66 and 88.54, respectively, consisting of high susceptible hazard classes. In general, this research concluded that the cohesive and coherent natural interplay of topographic and non-topographic factors strongly affects landslide occurrence, distribution, and susceptibility condition in the Nepal Himalaya region. Furthermore, the reliability of these two methods is verified for landslide susceptibility mapping in Nepal’s central mountain region.
基金Supported by the National Natural Science Foundation of China(61877023)the Fundamental Research Funds for the Central Universities(CCNU19TD009)。
文摘For high-dimensional models with a focus on classification performance,the?1-penalized logistic regression is becoming important and popular.However,the Lasso estimates could be problematic when penalties of different coefficients are all the same and not related to the data.We propose two types of weighted Lasso estimates,depending upon covariates determined by the Mc Diarmid inequality.Given sample size n and a dimension of covariates p,the finite sample behavior of our proposed method with a diverging number of predictors is illustrated by non-asymptotic oracle inequalities such as the?1-estimation error and the squared prediction error of the unknown parameters.We compare the performance of our method with that of former weighted estimates on simulated data,then apply it to do real data analysis.
文摘BACKGROUND Acute kidney injury(AKI)has serious consequences on the prognosis of patients undergoing liver transplantation.Recently,artificial neural network(ANN)was reported to have better predictive ability than the classical logistic regression(LR)for this postoperative outcome.AIM To identify the risk factors of AKI after deceased-donor liver transplantation(DDLT)and compare the prediction performance of ANN with that of LR for this complication.METHODS Adult patients with no evidence of end-stage kidney dysfunction(KD)who underwent the first DDLT according to model for end-stage liver disease(MELD)score allocation system was evaluated.AKI was defined according to the International Club of Ascites criteria,and potential predictors of postoperative AKI were identified by LR.The prediction performance of both ANN and LR was tested.RESULTS The incidence of AKI was 60.6%(n=88/145)and the following predictors were identified by LR:MELD score>25(odds ratio[OR]=1.999),preoperative kidney dysfunction(OR=1.279),extended criteria donors(OR=1.191),intraoperative arterial hypotension(OR=1.935),intraoperative massive blood transfusion(MBT)(OR=1.830),and postoperative serum lactate(SL)(OR=2.001).The area under the receiver-operating characteristic curve was best for ANN(0.81,95%confidence interval[CI]:0.75-0.83)than for LR(0.71,95%CI:0.67-0.76).The root-mean-square error and mean absolute error in the ANN model were 0.47 and 0.38,respectively.CONCLUSION The severity of liver disease,pre-existing kidney dysfunction,marginal grafts,hemodynamic instability,MBT,and SL are predictors of postoperative AKI,and ANN has better prediction performance than LR in this scenario.
基金elaborated in the framework of the research project framed into the research plan of the Institute of Geography, Romanian Academy:"The National Geographic Atlas of Romania"
文摘Deforestation process represents a wide concern mainly in the mountain environments due to its role in global warming, biodiversity loss, land degradation and natural hazards occurrence. Thus, the present study is focused on the largest afforested landform unit of Romania and, consequently, the most affected area by forest losses: Carpathian Mountains. The main goal of the paper is to examine and analyze the various explanatory variables associated with deforestation process and to model the probability of deforestation using GIS spatial analysis and logistic regression. The forest cover for 1990 and 2012, derived from CORINE Land Cover(CLC) database, were used to quantify historical forest cover change included in the modelling. To explain the biophysical and anthropogenic effects, this study considered several explanatory factors related to local topography, forest cover pattern, accessibility, urban growth and population density. Using ROC(Receiver Operating Characteristic) and 500 controlling sampling points, the statistical and spatial validations were assessed in order to evaluate the performance of the resulted data. The analysis showed that the area experienced a continuous forest cover change, leading to the loss of over 250,000 ha of forested area during the period 1990–2012. The most significant influence of the explanatory factors of deforestation were noticed in case of distance to forest edge(β=–4.215), forest fragmentation(β=2.231), slope declivity(β=–1.901), elevation(β=1.734) and distance to roads(β=–1.713). The statistical and spatial validation indicates a good accuracy of the model with reasonably AUC(0.736) and Kappa(0.739) values. The model's results suggest an intensification of the deforestation process in the area, designing numerous new clusters with high probability in the Apuseni Mountains, northern and central part of the Eastern Carpathians, western part of the Southern Carpathians and northern part of the Banat Mountains. The study could represent a useful outcome to identify the forests more vulnerable to logging and to adopt appropriate policies and decisions in forest management and conservation. In addition, the resulted probability map could be used in other studies in order to investigate potential environmental implications(e.g. geomorphological hazards or impact on biodiversity and landscape diversity).
文摘Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion, environment, etc. which are useful information around users or relevant to recommender application. So, recent aware-context CF takes advantages of such information in order to improve the quality of recommendation. There are three main aware-context approaches: contextual pre-filtering, contextual post-filtering and contextual modeling. Each approach has individual strong points and drawbacks but there is a requirement of steady and fast inference model which supports the aware-context recommendation process. This paper proposes a new approach which discovers multivariate logistic regression model by mining both traditional rating data and contextual data. Logistic model is optimal inference model in response to the binary question “whether or not a user prefers a list of recommendations with regard to contextual condition”. Consequently, such regression model is used as a filter to remove irrelevant items from recommendations. The final list is the best recommendations to be given to users under contextual information. Moreover the searching items space of logistic model is reduced to smaller set of items so-called general user pattern (GUP). GUP supports logistic model to be faster in real-time response.
基金The authors would like to thank all anonymous reviewers for their suggestions and feedback.This work was supported by National Natural Science Foundation of China(Grant No.61379103).
文摘Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classification problem,such as problem with non-equilibrium samples.Many scholars have proposed some methods,such as neural network,least square support vector machine,AdaBoost meta-algorithm,etc.These methods essentially belong to machine learning categories.In this work,based on the probability theory and statistical principle,we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification.We have compared our approach with other methods using non-equilibrium samples,the results show that our approach guarantees sample integrity and achieves superior classification.
文摘Logistic Regression Models have been widely used in many areas of research, namely in health sciences, to study risk factors associated to diseases. Many population based surveys, such as Demographic and Health Survey (DHS), are constructed assuming complex sampling, i.e., probabilistic, stratified and multistage sampling, with unequal weights in the observations;this complex design must be taken into account in order to have reliable results. However, this very relevant issue usually is not well analyzed in the literature. The aim of the study is to specify the logistic regression model with complex sample design, and to demonstrate how to estimate it using the R software survey package. More specifically, we used Mozambique Demographic Health and Survey data 2011 (MDHS 2011) to illustrate how to correct for the effect of sample design in the particular case of estimating the risk factors associated to the probability of using mosquito bed nets. Our results show that in the presence of complex sampling, appropriate methods must be used both in descriptive and inferential statistics.
基金supported by a grant from National Health Department of China(2008ZX10005-009)Roche company
文摘Objective This study was undertaken to investigate the influencing factors on serum ALT level and hepatitis C virus(HCV)RNA titer in chronic hepatitis C(CHC)patients.Methods All patients enrolled into this study were anti-HCV positive.Retrospective tracing method was applied to detect serum ALT level and HCV RNA titer and to collect general information of the patients such as genders,age groups,interferon medication history,infection pathways,height and weight.Then the multi-factor analysis was adopted with the application of binominal logistic regression mode.Results The abnormal rate of ALT level was positively correlated to HCV RNA and gender while negatively correlated to interferon medication history and age group,with Wald value of the 4 factors as 39.604,11.823,18.991 and 7.389,respectively.The positive rate of HCV RNA was negatively correlated to interferon medication history and gender while positively correlated to ALT level,with corresponding Wald value of the 3 factors as81.394,7.618 and 27.562,respectively.Conclusions The normal ALT level in HCV infected patients was associated with viral load,age,gender and interferon medication history,while the normal rate of HCV RNA titer was closely associated with gender,interferon medication history and ALT level.