Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/appr...Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.展开更多
In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluste...In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluster analysis, hyper-parameter test and other models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, sub classification under different chemical compositions, hyper-parameter K value test and rationality analysis. Research can provide theoretical support for the protection and restoration of ancient glass relics.展开更多
The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate ...The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate estimation of cropland burned area is both crucial and challenging,especially for the small and fragmented burned scars in China.Here we developed an automated burned area mapping algorithm that was implemented using Sentinel-2 Multi Spectral Instrument(MSI)data and its effectiveness was tested taking Songnen Plain,Northeast China as a case using satellite image of 2020.We employed a logistic regression method for integrating multiple spectral data into a synthetic indicator,and compared the results with manually interpreted burned area reference maps and the Moderate-Resolution Imaging Spectroradiometer(MODIS)MCD64A1 burned area product.The overall accuracy of the single variable logistic regression was 77.38%to 86.90%and 73.47%to 97.14%for the 52TCQ and 51TYM cases,respectively.In comparison,the accuracy of the burned area map was improved to 87.14%and 98.33%for the 52TCQ and 51TYM cases,respectively by multiple variable logistic regression of Sentind-2 images.The balance of omission error and commission error was also improved.The integration of multiple spectral data combined with a logistic regression method proves to be effective for burned area detection,offering a highly automated process with an automatic threshold determination mechanism.This method exhibits excellent extensibility and flexibility taking the image tile as the operating unit.It is suitable for burned area detection at a regional scale and can also be implemented with other satellite data.展开更多
In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), ob...In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.展开更多
Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for deliveri...Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for delivering the services to their customers,clients and citizens.But,the interaction is success-ful only based on the trust that each device has on another.Thus trust is very much essential for a social network.As Internet of Things have access over sen-sitive information,it urges to many threats that lead data management to risk.This issue is addressed by trust management that help to take decision about trust-worthiness of requestor and provider before communication and sharing.Several trust-based systems are existing for different domain using Dynamic weight meth-od,Fuzzy classification,Bayes inference and very few Regression analysis for IoT.The proposed algorithm is based on Logistic Regression,which provide strong statistical background to trust prediction.To make our stand strong on regression support to trust,we have compared the performance with equivalent sound Bayes analysis using Beta distribution.The performance is studied in simu-lated IoT setup with Quality of Service(QoS)and Social parameters for the nodes.The proposed model performs better in terms of various metrics.An IoT connects heterogeneous devices such as tags and sensor devices for sharing of information and avail different application services.The most salient features of IoT system is to design it with scalability,extendibility,compatibility and resiliency against attack.The existing worksfinds a way to integrate direct and indirect trust to con-verge quickly and estimate the bias due to attacks in addition to the above features.展开更多
Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection ...Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children.Parents can seek professional help for a better prognosis of the child’s therapy when ASD is diagnosed under five years.This research study aims to develop an automated tool for diagnosing autism in children.The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition,feature selection,and classification phases.The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification.The Imperialistic competitive algorithm(ICA)based on empires conquering colonies performs feature selection in this study.The performance of Logistic Regression(LR),Decision tree,K-Nearest Neighbor(KNN),and Random Forest(RF)classifiers are experimentally studied in this research work.The experimental results prove that the Logistic regression classifier exhibits the highest accuracy for the self-acquired dataset.The ASD detection is evaluated experimentally with the Least Absolute Shrinkage and Selection Operator(LASSO)feature selection method and different classifiers.The Exploratory Data Analysis(EDA)phase has uncovered crucial facts about the data,like the correlation of the features in the dataset with the class variable.展开更多
In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for pr...In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated and properties of finite-sample are also investigated via simulation. In simulation studies and real data sets, it is observed that the newly proposed technique demonstrated the greatest performance among all estimators compared.展开更多
This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. ...This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train the datasets. The models demonstrate a remarkably high classification accuracy of 89.3% in predicting ozone levels on a given day. Evaluation metrics reveal that both the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the higher the model’s accuracy or performance. Our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as “Solar radiation”, “Std. Dev. Wind Direction”, “outdoor temperature”, “dew point temperature”, and “PM10” contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.展开更多
This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,w...This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.展开更多
Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classifi...Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classification problem,such as problem with non-equilibrium samples.Many scholars have proposed some methods,such as neural network,least square support vector machine,AdaBoost meta-algorithm,etc.These methods essentially belong to machine learning categories.In this work,based on the probability theory and statistical principle,we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification.We have compared our approach with other methods using non-equilibrium samples,the results show that our approach guarantees sample integrity and achieves superior classification.展开更多
文摘Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
文摘In view of the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, elbow rule and other methods were comprehensively used to build logical regression, cluster analysis, hyper-parameter test and other models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, sub classification under different chemical compositions, hyper-parameter K value test and rationality analysis. Research can provide theoretical support for the protection and restoration of ancient glass relics.
基金Under the auspices of National Natural Science Foundation of China(No.42101414)Natural Science Found for Outstanding Young Scholars in Jilin Province(No.20230508106RC)。
文摘The burning of crop residues in fields is a significant global biomass burning activity which is a key element of the terrestrial carbon cycle,and an important source of atmospheric trace gasses and aerosols.Accurate estimation of cropland burned area is both crucial and challenging,especially for the small and fragmented burned scars in China.Here we developed an automated burned area mapping algorithm that was implemented using Sentinel-2 Multi Spectral Instrument(MSI)data and its effectiveness was tested taking Songnen Plain,Northeast China as a case using satellite image of 2020.We employed a logistic regression method for integrating multiple spectral data into a synthetic indicator,and compared the results with manually interpreted burned area reference maps and the Moderate-Resolution Imaging Spectroradiometer(MODIS)MCD64A1 burned area product.The overall accuracy of the single variable logistic regression was 77.38%to 86.90%and 73.47%to 97.14%for the 52TCQ and 51TYM cases,respectively.In comparison,the accuracy of the burned area map was improved to 87.14%and 98.33%for the 52TCQ and 51TYM cases,respectively by multiple variable logistic regression of Sentind-2 images.The balance of omission error and commission error was also improved.The integration of multiple spectral data combined with a logistic regression method proves to be effective for burned area detection,offering a highly automated process with an automatic threshold determination mechanism.This method exhibits excellent extensibility and flexibility taking the image tile as the operating unit.It is suitable for burned area detection at a regional scale and can also be implemented with other satellite data.
文摘In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
文摘Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for delivering the services to their customers,clients and citizens.But,the interaction is success-ful only based on the trust that each device has on another.Thus trust is very much essential for a social network.As Internet of Things have access over sen-sitive information,it urges to many threats that lead data management to risk.This issue is addressed by trust management that help to take decision about trust-worthiness of requestor and provider before communication and sharing.Several trust-based systems are existing for different domain using Dynamic weight meth-od,Fuzzy classification,Bayes inference and very few Regression analysis for IoT.The proposed algorithm is based on Logistic Regression,which provide strong statistical background to trust prediction.To make our stand strong on regression support to trust,we have compared the performance with equivalent sound Bayes analysis using Beta distribution.The performance is studied in simu-lated IoT setup with Quality of Service(QoS)and Social parameters for the nodes.The proposed model performs better in terms of various metrics.An IoT connects heterogeneous devices such as tags and sensor devices for sharing of information and avail different application services.The most salient features of IoT system is to design it with scalability,extendibility,compatibility and resiliency against attack.The existing worksfinds a way to integrate direct and indirect trust to con-verge quickly and estimate the bias due to attacks in addition to the above features.
基金The authors extend their appreciation to the Deputyship for Research&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the Project Number(IF2-PSAU-2022/01/22043)。
文摘Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children.Parents can seek professional help for a better prognosis of the child’s therapy when ASD is diagnosed under five years.This research study aims to develop an automated tool for diagnosing autism in children.The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition,feature selection,and classification phases.The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification.The Imperialistic competitive algorithm(ICA)based on empires conquering colonies performs feature selection in this study.The performance of Logistic Regression(LR),Decision tree,K-Nearest Neighbor(KNN),and Random Forest(RF)classifiers are experimentally studied in this research work.The experimental results prove that the Logistic regression classifier exhibits the highest accuracy for the self-acquired dataset.The ASD detection is evaluated experimentally with the Least Absolute Shrinkage and Selection Operator(LASSO)feature selection method and different classifiers.The Exploratory Data Analysis(EDA)phase has uncovered crucial facts about the data,like the correlation of the features in the dataset with the class variable.
文摘In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. This method depended on a weight function that is continuously adaptable using Mahalanobis distances for predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated and properties of finite-sample are also investigated via simulation. In simulation studies and real data sets, it is observed that the newly proposed technique demonstrated the greatest performance among all estimators compared.
文摘This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train the datasets. The models demonstrate a remarkably high classification accuracy of 89.3% in predicting ozone levels on a given day. Evaluation metrics reveal that both the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the higher the model’s accuracy or performance. Our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as “Solar radiation”, “Std. Dev. Wind Direction”, “outdoor temperature”, “dew point temperature”, and “PM10” contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.
文摘This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.
基金The authors would like to thank all anonymous reviewers for their suggestions and feedback.This work was supported by National Natural Science Foundation of China(Grant No.61379103).
文摘Logistic regression is often used to solve linear binary classification problems such as machine vision,speech recognition,and handwriting recognition.However,it usually fails to solve certain nonlinear multi-classification problem,such as problem with non-equilibrium samples.Many scholars have proposed some methods,such as neural network,least square support vector machine,AdaBoost meta-algorithm,etc.These methods essentially belong to machine learning categories.In this work,based on the probability theory and statistical principle,we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification.We have compared our approach with other methods using non-equilibrium samples,the results show that our approach guarantees sample integrity and achieves superior classification.