In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived ...In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix;Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.展开更多
A method for fast 1-fold cross validation is proposed for the regularized extreme learning machine (RELM). The computational time of fast l-fold cross validation increases as the fold number decreases, which is oppo...A method for fast 1-fold cross validation is proposed for the regularized extreme learning machine (RELM). The computational time of fast l-fold cross validation increases as the fold number decreases, which is opposite to that of naive 1-fold cross validation. As opposed to naive l-fold cross validation, fast l-fold cross validation takes the advantage in terms of computational time, especially for the large fold number such as l 〉 20. To corroborate the efficacy and feasibility of fast l-fold cross validation, experiments on five benchmark regression data sets are evaluated.展开更多
Background: A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Predictio...Background: A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model.Methods: Naive application of Leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.Results: Efficient Leave-one-out cross validation strategies is 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations.Conclusions: Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.展开更多
Statistical machine learning models should be evaluated and validated before putting to work.Conventional k-fold Monte Carlo cross-validation(MCCV)procedure uses a pseudo-random sequence to partition instances into k ...Statistical machine learning models should be evaluated and validated before putting to work.Conventional k-fold Monte Carlo cross-validation(MCCV)procedure uses a pseudo-random sequence to partition instances into k subsets,which usually causes subsampling bias,inflates generalization errors and jeopardizes the reliability and effectiveness of cross-validation.Based on ordered systematic sampling theory in statistics and low-discrepancy sequence theory in number theory,we propose a new k-fold cross-validation procedure by replacing a pseudo-random sequence with a best-discrepancy sequence,which ensures low subsampling bias and leads to more precise expected-prediction-error(EPE)estimates.Experiments with 156 benchmark datasets and three classifiers(logistic regression,decision tree and na?ve bayes)show that in general,our cross-validation procedure can extrude subsampling bias in the MCCV by lowering the EPE around 7.18%and the variances around 26.73%.In comparison,the stratified MCCV can reduce the EPE and variances of the MCCV around 1.58%and 11.85%,respectively.The leave-one-out(LOO)can lower the EPE around 2.50%but its variances are much higher than the any other cross-validation(CV)procedure.The computational time of our cross-validation procedure is just 8.64%of the MCCV,8.67%of the stratified MCCV and 16.72%of the LOO.Experiments also show that our approach is more beneficial for datasets characterized by relatively small size and large aspect ratio.This makes our approach particularly pertinent when solving bioscience classification problems.Our proposed systematic subsampling technique could be generalized to other machine learning algorithms that involve random subsampling mechanism.展开更多
Frequentist model averaging has received much attention from econometricians and statisticians in recent years.A key problem with frequentist model average estimators is the choice of weights.This paper develops a new...Frequentist model averaging has received much attention from econometricians and statisticians in recent years.A key problem with frequentist model average estimators is the choice of weights.This paper develops a new approach of choosing weights based on an approximation of generalized cross validation.The resultant least squares model average estimators are proved to be asymptotically optimal in the sense of achieving the lowest possible squared errors.Especially,the optimality is built under both discrete and continuous weigh sets.Compared with the existing approach based on Mallows criterion,the conditions required for the asymptotic optimality of the proposed method are more reasonable.Simulation studies and real data application show good performance of the proposed estimators.展开更多
Slope stability prediction research is a complex non-linear system problem.In carrying out slope stability prediction work,it often encounters low accuracy of prediction models and blind data preprocessing.Based on 77...Slope stability prediction research is a complex non-linear system problem.In carrying out slope stability prediction work,it often encounters low accuracy of prediction models and blind data preprocessing.Based on 77 field cases,5 quantitative indicators are selected to improve the accuracy of prediction models for slope stability.These indicators include slope angle,slope height,internal friction angle,cohesion and unit weight of rock and soil.Potential data aggregation in the prediction of slope stability is analyzed and visualized based on Six-dimension reduction methods,namely principal components analysis(PCA),Kernel PCA,factor analysis(FA),independent component analysis(ICA),non-negative matrix factorization(NMF)and t-SNE(stochastic neighbor embedding).Combined with classic machine learning methods,7 prediction models for slope stability are established and their reliabilities are examined by random cross validation.Besides,the significance of each indicator in the prediction of slope stability is discussed using the coefficient of variation method.The research results show that dimension reduction is unnecessary for the data processing of prediction models established in this paper of slope stability.Random forest(RF),support vector machine(SVM)and k-nearest neighbour(KNN)achieve the best prediction accuracy,which is higher than 90%.The decision tree(DT)has better accuracy which is 86%.The most important factor influencing slope stability is slope height,while unit weight of rock and soil is the least significant.RF and SVM models have the best accuracy and superiority in slope stability prediction.The results provide a new approach toward slope stability prediction in geotechnical engineering.展开更多
Tropical cyclones (TCs) and storms (TSs) are among the devastating events in the world and southwestern Indian Ocean (SWIO) in particular. The seasonal forecasting TCs and TSs for December to March (DJFM) and November...Tropical cyclones (TCs) and storms (TSs) are among the devastating events in the world and southwestern Indian Ocean (SWIO) in particular. The seasonal forecasting TCs and TSs for December to March (DJFM) and November to May (NM) over SWIO were conducted. Dynamic parameters including vertical wind shear, mean zonal steering wind and vorticity at 850 mb were derived from NOAA (NCEP-NCAR) reanalysis 1 wind fields. Thermodynamic parameters including monthly and daily mean Sea Surface Temperature (SST), Outgoing Longwave Radiation (OLR) and equatorial Standard Oscillation Index (SOI) were used. Three types of Poison regression models (i.e. dynamic, thermodynamic and combined models) were developed and validated using the Leave One Out Cross Validation (LOOCV). Moreover, 2 × 2 square matrix contingency tables for model verification were used. The results revealed that, the observed and cross validated DJFM and NM TCs and TSs strongly correlated with each other (p ≤ 0.02) for all model types, with correlations (r) ranging from 0.62 - 0.86 for TCs and 0.52 - 0.87 for TSs, indicating great association between these variables. Assessment of the model skill for all model types of DJFM and NM TCs and TSs frequency revealed high skill scores ranging from 38% - 70% for TCs and 26% - 72% for TSs frequency, respectively. Moreover, results indicated that the dynamic and combined models had higher skill scores than the thermodynamic models. The DJFM and NM selected predictors explained the TCs and TSs variability by the range of 0.45 - 0.65 and 0.37 - 0.66, respectively. However, verification analysis revealed that all models were adequate for predicting the seasonal TCs and TSs, with high bias values ranging from 0.85 - 0.94. Conclusively, the study calls for more studies in TCs and TSs frequency and strengths for enhancing the performance of the March to May (MAM) and December to October (OND) seasonal rainfalls in the East African (EA) and Tanzania in particular.展开更多
A statistical dynamic model for forecasting Chinese landfall of tropical cyclones (CLTCs) was developed based on the empirical relationship between the observed CLTC variability and the hindcast atmospheric circulat...A statistical dynamic model for forecasting Chinese landfall of tropical cyclones (CLTCs) was developed based on the empirical relationship between the observed CLTC variability and the hindcast atmospheric circulations from the Pusan National University coupled general circulation model (PNU-CGCM).In the last 31 years,CLTCs have shown strong year-to-year variability,with a maximum frequency in 1994 and a minimum frequency in 1987.Such features were well forecasted by the model.A cross-validation test showed that the correlation between the observed index and the forecasted CLTC index was high,with a coefficient of 0.71.The relative error percentage (16.3%) and root-mean-square error (1.07) were low.Therefore the coupled model performs well in terms of forecasting CLTCs;the model has potential for dynamic forecasting of landfall of tropical cyclones.展开更多
Predicting neuron growth is valuable to understand the morphology of neurons, thus it is helpful in the research of neuron classification. This study sought to propose a new method of predicting the growth of human ne...Predicting neuron growth is valuable to understand the morphology of neurons, thus it is helpful in the research of neuron classification. This study sought to propose a new method of predicting the growth of human neurons using 1 907 sets of data in human brain pyramidal neurons obtained from the website of NeuroMorpho.Org. First, we analyzed neurons in a morphology field and used an expectation-maximization algorithm to specify the neurons into six clusters. Second, naive Bayes classifier was used to verify the accuracy of the expectation-maximization algorithm. Experiment results proved that the cluster groups here were efficient and feasible. Finally, a new method to rank the six expectation-maximization algorithm clustered classes was used in predicting the growth of human pyramidal neurons.展开更多
To improve the anti-noise performance of the time-domain Bregman iterative algorithm,an adaptive frequency-domain Bregman sparse-spike deconvolution algorithm is proposed.By solving the Bregman algorithm in the freque...To improve the anti-noise performance of the time-domain Bregman iterative algorithm,an adaptive frequency-domain Bregman sparse-spike deconvolution algorithm is proposed.By solving the Bregman algorithm in the frequency domain,the influence of Gaussian as well as outlier noise on the convergence of the algorithm is effectively avoided.In other words,the proposed algorithm avoids data noise effects by implementing the calculations in the frequency domain.Moreover,the computational efficiency is greatly improved compared with the conventional method.Generalized cross validation is introduced in the solving process to optimize the regularization parameter and thus the algorithm is equipped with strong self-adaptation.Different theoretical models are built and solved using the algorithms in both time and frequency domains.Finally,the proposed and the conventional methods are both used to process actual seismic data.The comparison of the results confirms the superiority of the proposed algorithm due to its noise resistance and self-adaptation capability.展开更多
Pattern classification is an important field in machine learning; least squares support vector machine (LSSVM) is a powerful tool for pattern classification. A new version of LSSVM, SVD-LSSVM, to save time of selectin...Pattern classification is an important field in machine learning; least squares support vector machine (LSSVM) is a powerful tool for pattern classification. A new version of LSSVM, SVD-LSSVM, to save time of selecting hyper parameters for LSSVM is proposed. SVD-LSSVM is trained through singular value decomposition (SVD) of kernel matrix. Cross validation time of selecting hyper parameters can be saved because a new hyper parameter, singular value contribution rate (SVCR), replaces the penalty factor of LSSVM. Several UCI benchmarking data and the Olive classification problem were used to test SVD-LSSVM. The result showed that SVD-LSSVM has good performance in classification and saves time for cross validation.展开更多
For practical engineering structures,it is usually difficult to measure external load distribution in a direct manner,which makes inverse load identification important.Specifically,load identification is a typical inv...For practical engineering structures,it is usually difficult to measure external load distribution in a direct manner,which makes inverse load identification important.Specifically,load identification is a typical inverse problem,for which the models(e.g.,response matrix)are often ill-posed,resulting in degraded accuracy and impaired noise immunity of load identification.This study aims at identifying external loads in a stiffened plate structure,through comparing the effectiveness of different methods for parameter selection in regulation problems,including the Generalized Cross Validation(GCV)method,the Ordinary Cross Validation method and the truncated singular value decomposition method.With demonstrated high accuracy,the GCV method is used to identify concentrated loads in three different directions(e.g.,vertical,lateral and longitudinal)exerted on a stiffened plate.The results show that the GCV method is able to effectively identify multi-source static loads,with relative errors less than 5%.Moreover,under the situation of swept frequency excitation,when the excitation frequency is near the natural frequency of the structure,the GCV method can achieve much higher accuracy compared with direct inversion.At other excitation frequencies,the average recognition error of the GCV method load identification less than 10%.展开更多
The water quality grades of phosphate(PO4-P) and dissolved inorganic nitrogen(DIN) are integrated by spatial partitioning to fit the global and local semi-variograms of these nutrients. Leave-one-out cross validat...The water quality grades of phosphate(PO4-P) and dissolved inorganic nitrogen(DIN) are integrated by spatial partitioning to fit the global and local semi-variograms of these nutrients. Leave-one-out cross validation is used to determine the statistical inference method. To minimize absolute average errors and error mean squares,stratified Kriging(SK) interpolation is applied to DIN and ordinary Kriging(OK) interpolation is applied to PO4-P.Ten percent of the sites is adjusted by considering their impact on the change in deviations in DIN and PO4-P interpolation and the resultant effect on areas with different water quality grades. Thus, seven redundant historical sites are removed. Seven historical sites are distributed in areas with water quality poorer than Grade IV at the north and south branches of the Changjiang(Yangtze River) Estuary and at the coastal region north of the Hangzhou Bay. Numerous sites are installed in these regions. The contents of various elements in the waters are not remarkably changed, and the waters are mixed well. Seven sites that have been optimized and removed are set to water with quality Grades III and IV. Optimization and adjustment of unrestricted areas show that the optimized and adjusted sites are mainly distributed in regions where the water quality grade undergoes transition.Therefore, key sites for adjustment and optimization are located at the boundaries of areas with different water quality grades and seawater.展开更多
A cost estimate is one of the most important steps in road project management. There are ranges of factors that mostly affect the final project cost. Many approaches were used to estimate project cost, which took into...A cost estimate is one of the most important steps in road project management. There are ranges of factors that mostly affect the final project cost. Many approaches were used to estimate project cost, which took into consideration probable project performance and risks. The aim is to improve the ability of construction managers to predict a parametric cost estimate for road projects using SVM (support vector machine). The work is based on collecting historical road executed cases. The 12 factors were identified to be the most important factors affecting the cost-estimating model. A total of 70 case studies from historical data were divided randomly into three sets: training set includes 60 cases, cross validation set includes three cases and testing set includes seven cases. The built model was successfully able to predict project cost to the AP (accuracy performance) of 95%.展开更多
By using Pedersen's verifiable secret sharing scheme and the theory of crossvalidation, we propose an a-nonymous payment protocol which have following features: protecting theconfidentiality of sensitive payment i...By using Pedersen's verifiable secret sharing scheme and the theory of crossvalidation, we propose an a-nonymous payment protocol which have following features: protecting theconfidentiality of sensitive payment information from spying by malicioushosts; using a trustedthird party in a minimal way; verifying the validity of the share by the merchant; allowing agent toverify that the product which it is a-bout to receive is the one it is paying for; keeping thecustomer anonymous.展开更多
Current cancer diagnosis procedure requires expert knowledge and is time-consuming,which raises the need to build an accurate diagnosis support system for lymphoma identification and classification.Many studies have s...Current cancer diagnosis procedure requires expert knowledge and is time-consuming,which raises the need to build an accurate diagnosis support system for lymphoma identification and classification.Many studies have shown promising results using Machine Learning and,recently,Deep Learning to detect malignancy in cancer cells.However,the diversity and complexity of the morphological structure of lymphoma make it a challenging classification problem.In literature,many attempts were made to classify up to four simple types of lymphoma.This paper presents an approach using a reliable model capable of diagnosing seven different categories of rare and aggressive lymphoma.These Lymphoma types are Classical Hodgkin Lymphoma,Nodular Lymphoma Predominant,Burkitt Lymphoma,Follicular Lymphoma,Mantle Lymphoma,Large B-Cell Lymphoma,and T-Cell Lymphoma.Our proposed approach uses Residual Neural Networks,ResNet50,with a Transfer Learning for lymphoma’s detection and classification.The model used results are validated according to the performance evaluation metrics:Accuracy,precision,recall,F-score,and kappa score for the seven multi-classes.Our algorithms are tested,and the results are validated on 323 images of 224×224 pixels resolution.The results are promising and show that our used model can classify and predict the correct lymphoma subtype with an accuracy of 91.6%.展开更多
文摘In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix;Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
基金supported by the National Natural Science Foundation of China(51006052)the NUST Outstanding Scholar Supporting Program
文摘A method for fast 1-fold cross validation is proposed for the regularized extreme learning machine (RELM). The computational time of fast l-fold cross validation increases as the fold number decreases, which is opposite to that of naive 1-fold cross validation. As opposed to naive l-fold cross validation, fast l-fold cross validation takes the advantage in terms of computational time, especially for the large fold number such as l 〉 20. To corroborate the efficacy and feasibility of fast l-fold cross validation, experiments on five benchmark regression data sets are evaluated.
基金supported by the US Department of Agriculture,Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive grant no.2015-67015-22947
文摘Background: A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model.Methods: Naive application of Leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.Results: Efficient Leave-one-out cross validation strategies is 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations.Conclusions: Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
基金supported by the Qilu Youth Scholar Project of Shandong Universitysupported by National Natural Science Foundation of China(Grant No.11531008)+1 种基金the Ministry of Education of China(Grant No.IRT16R43)the Taishan Scholar Project of Shandong Province。
文摘Statistical machine learning models should be evaluated and validated before putting to work.Conventional k-fold Monte Carlo cross-validation(MCCV)procedure uses a pseudo-random sequence to partition instances into k subsets,which usually causes subsampling bias,inflates generalization errors and jeopardizes the reliability and effectiveness of cross-validation.Based on ordered systematic sampling theory in statistics and low-discrepancy sequence theory in number theory,we propose a new k-fold cross-validation procedure by replacing a pseudo-random sequence with a best-discrepancy sequence,which ensures low subsampling bias and leads to more precise expected-prediction-error(EPE)estimates.Experiments with 156 benchmark datasets and three classifiers(logistic regression,decision tree and na?ve bayes)show that in general,our cross-validation procedure can extrude subsampling bias in the MCCV by lowering the EPE around 7.18%and the variances around 26.73%.In comparison,the stratified MCCV can reduce the EPE and variances of the MCCV around 1.58%and 11.85%,respectively.The leave-one-out(LOO)can lower the EPE around 2.50%but its variances are much higher than the any other cross-validation(CV)procedure.The computational time of our cross-validation procedure is just 8.64%of the MCCV,8.67%of the stratified MCCV and 16.72%of the LOO.Experiments also show that our approach is more beneficial for datasets characterized by relatively small size and large aspect ratio.This makes our approach particularly pertinent when solving bioscience classification problems.Our proposed systematic subsampling technique could be generalized to other machine learning algorithms that involve random subsampling mechanism.
基金by National Key R&D Program of China(2020AAA0105200)the Ministry of Science and Technology of China(Grant no.2016YFB0502301)+1 种基金the National Natural Science Foundation of China(Grant nos.11871294,12031016,11971323,71925007,72042019,72091212 and 12001559)a joint grant from the Academy for Multidisciplinary Studies,Capital Normal University.
文摘Frequentist model averaging has received much attention from econometricians and statisticians in recent years.A key problem with frequentist model average estimators is the choice of weights.This paper develops a new approach of choosing weights based on an approximation of generalized cross validation.The resultant least squares model average estimators are proved to be asymptotically optimal in the sense of achieving the lowest possible squared errors.Especially,the optimality is built under both discrete and continuous weigh sets.Compared with the existing approach based on Mallows criterion,the conditions required for the asymptotic optimality of the proposed method are more reasonable.Simulation studies and real data application show good performance of the proposed estimators.
基金by the National Natural Science Foundation of China(No.52174114)the State Key Laboratory of Hydroscience and Engineering of Tsinghua University(No.61010101218).
文摘Slope stability prediction research is a complex non-linear system problem.In carrying out slope stability prediction work,it often encounters low accuracy of prediction models and blind data preprocessing.Based on 77 field cases,5 quantitative indicators are selected to improve the accuracy of prediction models for slope stability.These indicators include slope angle,slope height,internal friction angle,cohesion and unit weight of rock and soil.Potential data aggregation in the prediction of slope stability is analyzed and visualized based on Six-dimension reduction methods,namely principal components analysis(PCA),Kernel PCA,factor analysis(FA),independent component analysis(ICA),non-negative matrix factorization(NMF)and t-SNE(stochastic neighbor embedding).Combined with classic machine learning methods,7 prediction models for slope stability are established and their reliabilities are examined by random cross validation.Besides,the significance of each indicator in the prediction of slope stability is discussed using the coefficient of variation method.The research results show that dimension reduction is unnecessary for the data processing of prediction models established in this paper of slope stability.Random forest(RF),support vector machine(SVM)and k-nearest neighbour(KNN)achieve the best prediction accuracy,which is higher than 90%.The decision tree(DT)has better accuracy which is 86%.The most important factor influencing slope stability is slope height,while unit weight of rock and soil is the least significant.RF and SVM models have the best accuracy and superiority in slope stability prediction.The results provide a new approach toward slope stability prediction in geotechnical engineering.
文摘Tropical cyclones (TCs) and storms (TSs) are among the devastating events in the world and southwestern Indian Ocean (SWIO) in particular. The seasonal forecasting TCs and TSs for December to March (DJFM) and November to May (NM) over SWIO were conducted. Dynamic parameters including vertical wind shear, mean zonal steering wind and vorticity at 850 mb were derived from NOAA (NCEP-NCAR) reanalysis 1 wind fields. Thermodynamic parameters including monthly and daily mean Sea Surface Temperature (SST), Outgoing Longwave Radiation (OLR) and equatorial Standard Oscillation Index (SOI) were used. Three types of Poison regression models (i.e. dynamic, thermodynamic and combined models) were developed and validated using the Leave One Out Cross Validation (LOOCV). Moreover, 2 × 2 square matrix contingency tables for model verification were used. The results revealed that, the observed and cross validated DJFM and NM TCs and TSs strongly correlated with each other (p ≤ 0.02) for all model types, with correlations (r) ranging from 0.62 - 0.86 for TCs and 0.52 - 0.87 for TSs, indicating great association between these variables. Assessment of the model skill for all model types of DJFM and NM TCs and TSs frequency revealed high skill scores ranging from 38% - 70% for TCs and 26% - 72% for TSs frequency, respectively. Moreover, results indicated that the dynamic and combined models had higher skill scores than the thermodynamic models. The DJFM and NM selected predictors explained the TCs and TSs variability by the range of 0.45 - 0.65 and 0.37 - 0.66, respectively. However, verification analysis revealed that all models were adequate for predicting the seasonal TCs and TSs, with high bias values ranging from 0.85 - 0.94. Conclusively, the study calls for more studies in TCs and TSs frequency and strengths for enhancing the performance of the March to May (MAM) and December to October (OND) seasonal rainfalls in the East African (EA) and Tanzania in particular.
基金supported by the Chinese Academy of Sciences key program(Grant No. KZCX2-YW-Q03-3)the Korea Meteorological Administration Research and Development Program(Grant No. CATER 2009-1147)+1 种基金the Korea Rural Development Administration Research and Development Programthe National Basic Research Program of China (Grant No. 2009CB421406)
文摘A statistical dynamic model for forecasting Chinese landfall of tropical cyclones (CLTCs) was developed based on the empirical relationship between the observed CLTC variability and the hindcast atmospheric circulations from the Pusan National University coupled general circulation model (PNU-CGCM).In the last 31 years,CLTCs have shown strong year-to-year variability,with a maximum frequency in 1994 and a minimum frequency in 1987.Such features were well forecasted by the model.A cross-validation test showed that the correlation between the observed index and the forecasted CLTC index was high,with a coefficient of 0.71.The relative error percentage (16.3%) and root-mean-square error (1.07) were low.Therefore the coupled model performs well in terms of forecasting CLTCs;the model has potential for dynamic forecasting of landfall of tropical cyclones.
基金supported by the National Natural Science Foundation of China,No.10872069
文摘Predicting neuron growth is valuable to understand the morphology of neurons, thus it is helpful in the research of neuron classification. This study sought to propose a new method of predicting the growth of human neurons using 1 907 sets of data in human brain pyramidal neurons obtained from the website of NeuroMorpho.Org. First, we analyzed neurons in a morphology field and used an expectation-maximization algorithm to specify the neurons into six clusters. Second, naive Bayes classifier was used to verify the accuracy of the expectation-maximization algorithm. Experiment results proved that the cluster groups here were efficient and feasible. Finally, a new method to rank the six expectation-maximization algorithm clustered classes was used in predicting the growth of human pyramidal neurons.
基金supported by the National Natural Science Foundation of China(No.NSFC 41204101)Open Projects Fund of the State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation(No.PLN201733)+1 种基金Youth Innovation Promotion Association of the Chinese Academy of Sciences(No.2015051)Open Projects Fund of the Natural Gas and Geology Key Laboratory of Sichuan Province(No.2015trqdz03)
文摘To improve the anti-noise performance of the time-domain Bregman iterative algorithm,an adaptive frequency-domain Bregman sparse-spike deconvolution algorithm is proposed.By solving the Bregman algorithm in the frequency domain,the influence of Gaussian as well as outlier noise on the convergence of the algorithm is effectively avoided.In other words,the proposed algorithm avoids data noise effects by implementing the calculations in the frequency domain.Moreover,the computational efficiency is greatly improved compared with the conventional method.Generalized cross validation is introduced in the solving process to optimize the regularization parameter and thus the algorithm is equipped with strong self-adaptation.Different theoretical models are built and solved using the algorithms in both time and frequency domains.Finally,the proposed and the conventional methods are both used to process actual seismic data.The comparison of the results confirms the superiority of the proposed algorithm due to its noise resistance and self-adaptation capability.
基金Project (No. 20276063) supported by the National Natural Science Foundation of China
文摘Pattern classification is an important field in machine learning; least squares support vector machine (LSSVM) is a powerful tool for pattern classification. A new version of LSSVM, SVD-LSSVM, to save time of selecting hyper parameters for LSSVM is proposed. SVD-LSSVM is trained through singular value decomposition (SVD) of kernel matrix. Cross validation time of selecting hyper parameters can be saved because a new hyper parameter, singular value contribution rate (SVCR), replaces the penalty factor of LSSVM. Several UCI benchmarking data and the Olive classification problem were used to test SVD-LSSVM. The result showed that SVD-LSSVM has good performance in classification and saves time for cross validation.
基金funding for this study from National Key R&D Program of China(2018YFA0702800)National Natural Science Foundation of China(12072056)+1 种基金the Fundamental Research Funds for the Central Universities(DUT19LK49)Nantong Science and Technology Plan Project(No.MS22019016).
文摘For practical engineering structures,it is usually difficult to measure external load distribution in a direct manner,which makes inverse load identification important.Specifically,load identification is a typical inverse problem,for which the models(e.g.,response matrix)are often ill-posed,resulting in degraded accuracy and impaired noise immunity of load identification.This study aims at identifying external loads in a stiffened plate structure,through comparing the effectiveness of different methods for parameter selection in regulation problems,including the Generalized Cross Validation(GCV)method,the Ordinary Cross Validation method and the truncated singular value decomposition method.With demonstrated high accuracy,the GCV method is used to identify concentrated loads in three different directions(e.g.,vertical,lateral and longitudinal)exerted on a stiffened plate.The results show that the GCV method is able to effectively identify multi-source static loads,with relative errors less than 5%.Moreover,under the situation of swept frequency excitation,when the excitation frequency is near the natural frequency of the structure,the GCV method can achieve much higher accuracy compared with direct inversion.At other excitation frequencies,the average recognition error of the GCV method load identification less than 10%.
基金The National Natural Science Fundation of China under contract Nos 41376190,41271404,41531179,41421001 and41601425the Open Funds of the Key Laboratory of Integrated Monitoring and Applied Technologies for Marin Harmful Algal Blooms,SOA under contract No.MATHA201120204+1 种基金the Scientific Research Project of Shanghai Marine Bureau under contract No.Hu Hai Ke2016-05the Ocean Public Welfare Scientific Research Project,State Oceanic Administration of the People's Republic of China under contract Nos 201305027 and 201505008
文摘The water quality grades of phosphate(PO4-P) and dissolved inorganic nitrogen(DIN) are integrated by spatial partitioning to fit the global and local semi-variograms of these nutrients. Leave-one-out cross validation is used to determine the statistical inference method. To minimize absolute average errors and error mean squares,stratified Kriging(SK) interpolation is applied to DIN and ordinary Kriging(OK) interpolation is applied to PO4-P.Ten percent of the sites is adjusted by considering their impact on the change in deviations in DIN and PO4-P interpolation and the resultant effect on areas with different water quality grades. Thus, seven redundant historical sites are removed. Seven historical sites are distributed in areas with water quality poorer than Grade IV at the north and south branches of the Changjiang(Yangtze River) Estuary and at the coastal region north of the Hangzhou Bay. Numerous sites are installed in these regions. The contents of various elements in the waters are not remarkably changed, and the waters are mixed well. Seven sites that have been optimized and removed are set to water with quality Grades III and IV. Optimization and adjustment of unrestricted areas show that the optimized and adjusted sites are mainly distributed in regions where the water quality grade undergoes transition.Therefore, key sites for adjustment and optimization are located at the boundaries of areas with different water quality grades and seawater.
文摘A cost estimate is one of the most important steps in road project management. There are ranges of factors that mostly affect the final project cost. Many approaches were used to estimate project cost, which took into consideration probable project performance and risks. The aim is to improve the ability of construction managers to predict a parametric cost estimate for road projects using SVM (support vector machine). The work is based on collecting historical road executed cases. The 12 factors were identified to be the most important factors affecting the cost-estimating model. A total of 70 case studies from historical data were divided randomly into three sets: training set includes 60 cases, cross validation set includes three cases and testing set includes seven cases. The built model was successfully able to predict project cost to the AP (accuracy performance) of 95%.
文摘By using Pedersen's verifiable secret sharing scheme and the theory of crossvalidation, we propose an a-nonymous payment protocol which have following features: protecting theconfidentiality of sensitive payment information from spying by malicioushosts; using a trustedthird party in a minimal way; verifying the validity of the share by the merchant; allowing agent toverify that the product which it is a-bout to receive is the one it is paying for; keeping thecustomer anonymous.
文摘Current cancer diagnosis procedure requires expert knowledge and is time-consuming,which raises the need to build an accurate diagnosis support system for lymphoma identification and classification.Many studies have shown promising results using Machine Learning and,recently,Deep Learning to detect malignancy in cancer cells.However,the diversity and complexity of the morphological structure of lymphoma make it a challenging classification problem.In literature,many attempts were made to classify up to four simple types of lymphoma.This paper presents an approach using a reliable model capable of diagnosing seven different categories of rare and aggressive lymphoma.These Lymphoma types are Classical Hodgkin Lymphoma,Nodular Lymphoma Predominant,Burkitt Lymphoma,Follicular Lymphoma,Mantle Lymphoma,Large B-Cell Lymphoma,and T-Cell Lymphoma.Our proposed approach uses Residual Neural Networks,ResNet50,with a Transfer Learning for lymphoma’s detection and classification.The model used results are validated according to the performance evaluation metrics:Accuracy,precision,recall,F-score,and kappa score for the seven multi-classes.Our algorithms are tested,and the results are validated on 323 images of 224×224 pixels resolution.The results are promising and show that our used model can classify and predict the correct lymphoma subtype with an accuracy of 91.6%.