Funding: This work was supported by the National Natural Science Foundation of China (No. 82074306), the Shenzhen Health and Family Planning System Research Project (No. SZBC2018007), and the Project of Traditional Chinese Medicine Bureau of Guangdong Province (No. 20201073).
Abstract: In our previous research, a logistic regression prediction model for the hepatotoxicity of Chinese herbal medicines based on the four properties, five flavors, and channel tropism was successfully established. However, could the efficacy of Chinese herbal medicines also be used to predict their hepatotoxicity? To answer this question, a logistic regression prediction model for hepatotoxicity based on efficacy was tentatively set up. The correlation between hepatotoxic and nonhepatotoxic Chinese herbal medicines and efficacy was studied using a chi-square test for two-way unordered categorical data. A logistic regression prediction model was then established, and its prediction accuracy was evaluated. Hepatotoxicity and nonhepatotoxicity of Chinese herbal medicines were found to be weakly related to efficacy, with a coefficient of 0.295. Twenty efficacy variables were analyzed with unconditional logistic regression, and 6 variables (rectifying Qi and relieving pain, clearing heat and disinhibiting dampness, invigorating blood and stopping pain, invigorating blood and relieving swelling, killing worms, and relieving fright) were chosen to establish the logistic regression prediction model, with an optimal cutoff value of 0.250. Dissipating cold and relieving pain (DCRP), clearing heat and disinhibiting dampness, invigorating blood and relieving pain (IBRP), invigorating blood and relieving swelling, killing worms, and relieving fright were the variables affecting hepatotoxicity, and the established logistic regression prediction model had a certain degree of predictive power for the hepatotoxicity of Chinese herbal medicines.
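As a minimal sketch of how a fitted logistic model with the reported cutoff of 0.250 would turn efficacy categories into a hepatotoxicity call: the abstract does not report the fitted coefficients, so the intercept and weights below are hypothetical placeholders, and only four of the efficacy variable names from the abstract are used for illustration.

```python
import math

# Optimal cutoff value reported in the abstract.
CUTOFF = 0.250

# HYPOTHETICAL values: the abstract does not report the fitted coefficients.
HYPOTHETICAL_INTERCEPT = -1.5
HYPOTHETICAL_COEFFICIENTS = {
    "clearing heat and disinhibiting dampness": 0.9,
    "invigorating blood and relieving swelling": 1.1,
    "killing worms": 1.4,
    "relieving fright": 0.7,
}


def predicted_probability(efficacies):
    """Logistic probability of hepatotoxicity for a set of efficacy labels."""
    # Each efficacy is a 0/1 indicator, so only the labels present add weight.
    z = HYPOTHETICAL_INTERCEPT + sum(
        weight
        for name, weight in HYPOTHETICAL_COEFFICIENTS.items()
        if name in efficacies
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_predicted_hepatotoxic(efficacies, cutoff=CUTOFF):
    """Classify as hepatotoxic when the probability reaches the cutoff."""
    return predicted_probability(efficacies) >= cutoff


if __name__ == "__main__":
    herb = {"killing worms"}
    print(predicted_probability(herb), is_predicted_hepatotoxic(herb))
```

A cutoff below 0.5 flags a herb as potentially hepatotoxic at a lower predicted probability, which trades specificity for sensitivity.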
Funding: Supported by the National Natural Science Foundation of China (42130610, 41975076, and 42175067) and the National Key Research and Development Program of China (2019YFA0607104).
Abstract: Machine learning methods are effective tools for improving short-term climate prediction. However, commonly used methods often carry out classification and regression prediction modeling separately and independently. Such a single modeling approach may yield inconsistent classification and regression results and thus may not meet the needs of practical applications well. To address this issue, this study proposes a selective Naive Bayes ensemble model (SENB-EM) by introducing causal effect and a voting strategy into Naive Bayes. The new model can not only screen effective predictors but also perform classification and regression prediction simultaneously. When applied to predicting the area of the summer western North Pacific subtropical high (WNPSH) from 2008 to 2021, the accuracy classification score (a metric of overall classification prediction accuracy) and the time correlation coefficient (TCC) of SENB-EM reach 1.0 and 0.81, respectively. After integrating the results of different models [including the multiple linear regression ensemble model (MLR-EM), SENB-EM, and the Chinese Multimodel Ensemble Prediction System (CMME) used by the National Climate Center (NCC)] for 2017-2021, the TCC of the ensemble of SENB-EM and CMME reaches 0.92, the highest among them. This indicates that the predictions of the summer WNPSH area provided by SENB-EM have a high reference value for real-time prediction. It is worth noting that, in addition to the numerical prediction results, the SENB-EM model can also give numerical prediction intervals and predictions of the anomalous degree of the WNPSH area, thus providing more reference information for meteorological forecasters. Overall, as a new hybrid machine learning model, SENB-EM has good prediction ability; the approach of performing classification and regression prediction simultaneously through integration is informative for short-term climate prediction.
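The two evaluation metrics named in the abstract can be sketched as follows, assuming the TCC is the Pearson correlation between the predicted and observed yearly series and the accuracy classification score is the fraction of years whose predicted class matches the observed class. The function names are illustrative and are not taken from the paper.

```python
import math


def time_correlation_coefficient(predicted, observed):
    """Pearson correlation between two equally long yearly series."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("series must have the same length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum(
        (p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed)
    )
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_p * var_o)


def accuracy_classification_score(predicted_classes, observed_classes):
    """Fraction of samples whose predicted class equals the observed class."""
    if len(predicted_classes) != len(observed_classes) or not observed_classes:
        raise ValueError("class lists must be non-empty and equally long")
    matches = sum(p == o for p, o in zip(predicted_classes, observed_classes))
    return matches / len(observed_classes)
```

Under these definitions, an accuracy classification score of 1.0 means every year was assigned the correct class, while a TCC of 0.81 measures how closely the predicted area values track the observed year-to-year variation.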
Funding: The NNSF (10571073) of China and the 985 Project of Jilin University.
Abstract: In this paper, we consider median unbiased estimation of bivariate predictive regression models with non-normal, heavy-tailed, or heteroscedastic errors. We construct confidence intervals and a median unbiased estimator for the parameter of interest. We show via simulation that the proposed estimator has better predictive potential than the usual least squares estimator. An empirical application to finance is given, and a possible extension of the estimation procedure to cointegration models is also described.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11131002, 11271031, 71532001, 11525101, 71271210 and 714711730), the Business Intelligence Research Center at Peking University, the Center for Statistical Science at Peking University, the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China (Grant No. 16XNLF01), the Ministry of Education Humanities Social Science Key Research Institute in University Foundation (Grant No. 14JJD910002), the Center for Applied Statistics, School of Statistics, Renmin University of China, and the China Postdoctoral Science Foundation (Grant No. 2016M600155).
Abstract: In social network analysis, link prediction is a problem of fundamental importance. How to conduct comprehensive and principled link prediction that takes various kinds of network structure information into consideration is of great interest. To this end, we propose a dynamic logistic regression method. Specifically, we assume that a time series of network structures has been observed. The proposed model then dynamically predicts future links by studying the network structure in the past. To estimate the model, we find that standard maximum likelihood estimation (MLE) is computationally prohibitive. To solve the problem, we introduce a novel conditional maximum likelihood estimation (CMLE) method, which is computationally feasible for large-scale networks. We demonstrate the performance of the proposed method through extensive numerical studies.
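To make the idea of predicting future links from past network structure concrete, here is a rough sketch of a logistic link model on one earlier snapshot. The abstract does not specify the covariates or the estimator, so the three structural features (persistence, reciprocity, common out-neighbors) and the weights are hypothetical illustrations, and the CMLE fitting step is not shown.

```python
import math


def link_features(adjacency, i, j):
    """Covariates for the ordered pair (i, j), read from the previous snapshot."""
    n = len(adjacency)
    persistence = adjacency[i][j]  # did i -> j already exist?
    reciprocity = adjacency[j][i]  # did j -> i exist?
    common = sum(
        1
        for k in range(n)
        if k not in (i, j) and adjacency[i][k] and adjacency[j][k]
    )  # nodes that both i and j pointed to
    return [1.0, float(persistence), float(reciprocity), float(common)]


def link_probability(adjacency, i, j, weights):
    """Logistic probability that the link i -> j appears in the next snapshot."""
    z = sum(w * x for w, x in zip(weights, link_features(adjacency, i, j)))
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL weights: intercept, persistence, reciprocity, common neighbors.
HYPOTHETICAL_WEIGHTS = [-3.0, 2.5, 1.0, 0.8]

# Toy directed network at the previous time point (row i, column j: i -> j).
previous_snapshot = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(link_probability(previous_snapshot, 0, 1, HYPOTHETICAL_WEIGHTS))
    print(link_probability(previous_snapshot, 3, 0, HYPOTHETICAL_WEIGHTS))
```

In this toy example, a pair that was already connected, reciprocated, and shares a neighbor receives a much higher predicted probability than a pair with no past structure in common.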
Funding: This work is supported by the NSF of China (No. 11871447) and the Anhui Initiative in Quantum Information Technologies (AHY150200).
Abstract: The macroeconomic situation is the overall economic performance of a country or region. At present, the vast majority of macroeconomic indicators are obtained through sampling surveys, step-by-step reporting, statistical calculations, and other processes, and are publicly released by the Statistical Bureau. This approach has shortcomings, such as time lag and questionable authenticity. Timely forecasting and early warning of macroeconomic trends are important needs of government affairs, and the timeliness of data has a direct impact on government decision-making. In this paper, high-frequency and relatively accurate big data sources are adopted to construct a multivariate regression prediction model for traditional national economic accounting indicators (such as the value added of industrial enterprises above designated size in Hefei), which differs from traditional time series prediction models such as ARIMA. Based on this macroeconomic prediction model for time series big data, strategies such as multi-dimensional data sources, sequential updating, and model screening on a validation set are used to provide more reliable, timely, and easy-to-understand forecasts of national economic accounting indicators. At the same time, the potential influencing factors of macroeconomic indicators are mined to provide a data and theoretical basis for macroeconomic analysis and decision-making.