Many business applications rely on their historical data to predict their business future. The marketing of products is one of the core processes for a business. Customer needs provide useful information that helps to market the appropriate products at the appropriate time. Moreover, services are now commonly treated as products, and the development of education and health services depends on historical data. Furthermore, reducing problems and crime on online social media networks requires a significant source of information. Data analysts need an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires great processing time. Data mining involves many useful techniques for predicting statistical data in a variety of business applications, and classification is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a detailed reading of 20 papers from the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities. The Naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 decision tree algorithm is the most accurate for classifying students' records to predict degree completion time.
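As a concrete illustration of one of the surveyed algorithms, OneR builds a single-attribute rule: for each attribute it predicts the majority class per attribute value, then keeps the attribute whose rule makes the fewest training errors. A minimal sketch (the toy weather-style rows and labels are invented for illustration, not taken from any surveyed paper):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Train a OneR classifier: pick the single attribute whose
    one-level majority rule makes the fewest errors on the data."""
    n_attrs = len(rows[0])
    best = None  # (errors, attribute index, rule dict)
    for a in range(n_attrs):
        # For each value of attribute a, predict the majority class.
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(y != rule[row[a]] for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    _, attr, rule = best
    return lambda row: rule.get(row[attr])  # None for unseen values

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
classify = one_r(rows, labels)
```

Here the first attribute separates the classes perfectly, so OneR keeps it and `classify(("sunny", "cool"))` returns `"no"`.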
Various kinds of data are used in new product design, and more accurate data make the design results more reliable. Even though part of the product data can be obtained directly from existing similar products, a great deal of data remains unavailable. This makes data prediction valuable work. A method is proposed that can predict data for a product under development based on existing similar products. Fuzzy theory is used to deal with the uncertainties in the data prediction process. The proposed method can be used in life cycle design, life cycle assessment (LCA), etc. A case study on a current refrigerator is used as a demonstration example.
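The abstract does not give the fuzzy formulation used; one common way to carry uncertainty through such predictions is triangular fuzzy numbers, where an uncertain value is a (low, mode, high) triple and arithmetic propagates the bounds. A sketch under that assumption (the similarity weights and mass values are invented for illustration):

```python
class TriFuzzy:
    """Triangular fuzzy number (low, mode, high): a simple way to
    carry uncertainty through arithmetic in data prediction."""
    def __init__(self, low, mode, high):
        assert low <= mode <= high
        self.low, self.mode, self.high = low, mode, high

    def __add__(self, other):
        return TriFuzzy(self.low + other.low,
                        self.mode + other.mode,
                        self.high + other.high)

    def scale(self, k):  # k >= 0
        return TriFuzzy(k * self.low, k * self.mode, k * self.high)

    def defuzzify(self):
        """Centroid of the triangle: the crisp estimate."""
        return (self.low + self.mode + self.high) / 3.0

# Predict a new product's mass from two similar existing products,
# weighted by similarity (weights 0.6/0.4 are illustrative only).
m1 = TriFuzzy(9.0, 10.0, 12.0)
m2 = TriFuzzy(11.0, 12.0, 13.0)
estimate = m1.scale(0.6) + m2.scale(0.4)
```

Defuzzifying `estimate` yields a crisp predicted value while the (low, high) spread records how uncertain the prediction still is.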
The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer, completing the construction of the stacked learning model and improving the model's prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and to correct the over- and underfitting problems of the single regression models, further improving the prediction accuracy. The mean square error (MSE) of the predicted versus true values and the fit to the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression models (MSE = 0.03917); therefore, the former has higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to prediction of the slope safety factor, showcasing its advantages over traditional methods. Additionally, the proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. This model not only refines the prediction accuracy of the slope safety factor but also offers a fresh approach to handling intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research holds importance for the modernization and digitalization of slope safety assessments.
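A two-level stacked regressor of the kind described (base models whose predictions feed a least-squares meta-model) can be sketched in miniature. This toy uses two simple base models and fits the meta-weights on in-sample base predictions; the paper's model instead uses five strong base learners, XGBoost as the meta-model, out-of-fold predictions, and sparrow-search hyperparameter tuning, none of which are reproduced here:

```python
def lin_fit(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def solve2(m, v):
    """Solve a 2x2 linear system m @ w = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((v[0] * m[1][1] - v[1] * m[0][1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det)

def stack_fit(xs, ys):
    # Level 1: two base models (a constant and a linear fit).
    mean_y = sum(ys) / len(ys)
    a, b = lin_fit(xs, ys)
    base = [lambda x: mean_y, lambda x: a * x + b]
    # Level 2: least-squares meta-model over base predictions.
    # (Real stacking would use out-of-fold base predictions here.)
    p = [[f(x) for f in base] for x in xs]
    m = [[sum(r[i] * r[j] for r in p) for j in (0, 1)] for i in (0, 1)]
    v = [sum(r[i] * y for r, y in zip(p, ys)) for i in (0, 1)]
    w = solve2(m, v)
    return lambda x: w[0] * base[0](x) + w[1] * base[1](x)

model = stack_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this perfectly linear toy data the meta-model learns to weight the linear base model fully, so `model(5.0)` is 11.0.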
This research explores the potential for the evaluation and prediction of earth pressure balance shield performance based on a gray system model. The research focuses on a shield tunnel excavated for Metro Line 2 in Dalian, China. Due to the large error between the initial geological exploration data and the real strata, construction of the project is extremely difficult. In view of the current situation of the project, a quantitative method for evaluating tunneling efficiency was proposed using cutterhead rotation (R), advance speed (S), total thrust (F) and torque (T). A total of 80 datasets with three input parameters and one output variable (F or T) were collected from this project, and a prediction framework based on a gray system model was established. Based on the prediction model, five prediction schemes were set up. Through error analysis, the optimal prediction scheme was obtained from the five schemes. The parametric investigation performed indicates that the relationships between F and the three input variables in the gray system model harmonize with the theoretical explanation. The case shows that shield tunneling performance and efficiency are improved by the tunneling parameter prediction model based on the gray system model.
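The abstract does not specify the gray model variant used; the classic univariate GM(1,1) that underlies gray system prediction accumulates the series, fits the gray differential equation x0(k) = -a·z1(k) + b by least squares, and differences the fitted accumulation back to forecast. A sketch under that assumption (the paper's multi-input model is not reproduced):

```python
import math

def gm11_forecast(x0, steps=1):
    """Classic GM(1,1) gray model on a short positive series x0:
    accumulate, fit x0[k] = -a*z1[k] + b by least squares, then
    forecast by differencing the fitted accumulated series."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]                # accumulation
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # mean sequence
    y = x0[1:]
    m = n - 1
    szz = sum(z * z for z in z1)
    sz = sum(z1)
    szy = sum(z * v for z, v in zip(z1, y))
    sy = sum(y)
    det = szz * m - sz * sz                 # 2x2 normal equations
    a = (sz * sy - m * szy) / det           # development coefficient
    b = (szz * sy - sz * szy) / det         # gray input

    def x1_hat(k):                          # fitted accumulated series
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    return [x1_hat(n + j) - x1_hat(n + j - 1) for j in range(steps)]

# A series growing ~20% per step; the next true value would be 20.736.
pred = gm11_forecast([10.0, 12.0, 14.4, 17.28], steps=1)[0]
```

GM(1,1) is designed exactly for this situation in the paper: very few samples and an approximately exponential trend.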
On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods with random interruption failures in the observation, based on the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), shortened in this paper to GEKF and GUKF, respectively. Then the nonlinear filtering model is established using a radial basis function neural network (RBFNN): the network weights serve as the state equation, and the output of the RBFNN provides the observation equation. Finally, we treat the filtering problem under missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero, without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict a ground radioactivity time series with missing data. Experimental results demonstrate that the prediction results of the GUKF-based RBFNN accord well with the real ground radioactivity time series, while the prediction results of the GEKF-based RBFNN are divergent.
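The core idea, a Kalman filter whose measurement update only fires when the Bernoulli arrival variable is 1, can be shown on a scalar random-walk model. This sketch simply skips the update on missing data; it does not reproduce GEKF/GUKF or the RBFNN state model:

```python
import random

def intermittent_kf(zs, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state with intermittent
    observations: each entry of zs is a measurement, or None when the
    Bernoulli arrival variable was 0 (measurement lost). On a loss,
    only the prediction step runs."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        p = p + q                      # predict (random-walk dynamics)
        if z is not None:              # update only if measurement arrived
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Constant truth of 1.0; each measurement is lost with probability 0.3.
random.seed(0)
zs = [1.0 if random.random() < 0.7 else None for _ in range(200)]
est = intermittent_kf(zs)
```

Despite roughly a third of the observations being dropped, the estimate converges to the true level, because the filter's covariance simply keeps growing through the gaps and the next arriving measurement is weighted accordingly.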
Data is always a crucial issue of concern, especially its prediction and computation in the digital revolution. This paper provides an efficient learning mechanism for accurate predictability and for reducing redundant data communication. It also discusses Bayesian analysis, which finds the conditional probability of at least two parametric predictions for the data. The paper presents a method for improving the performance of Bayesian classification using a combination of the Kalman filter and K-means. The method is applied to a small dataset to establish that the proposed algorithm can reduce the time needed to compute clusters from the data. The proposed Bayesian learning probabilistic model is used to check statistical noise and other inaccuracies using unknown variables. This scenario is implemented using an efficient machine learning algorithm to perpetuate the Bayesian probabilistic approach. The paper also demonstrates the generative function for the Kalman-filter-based prediction model and its observations. The algorithm is implemented on the open-source Python platform, and the different modules are efficiently integrated into one piece of code via Common Platform Enumeration (CPE) for Python.
Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, so recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to the difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime compared with state-of-the-art models and vanilla TCN.
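The difference-and-compensation idea, forecasting in the differenced domain and then restoring the removed level by cumulative summation, can be sketched independently of the network itself. Here a toy mean-drift predictor stands in for TSCND; only the differencing/compensation plumbing is the point:

```python
def difference(seq):
    """First-order differencing: removes the level, reducing
    distribution shift between and within sequences."""
    return [b - a for a, b in zip(seq, seq[1:])]

def forecast_with_dcm(history, steps, predict_diff):
    """Predict in the differenced domain, then compensate by
    cumulative summation from the last observed level (the
    information that differencing removed)."""
    d = difference(history)
    d_future = predict_diff(d, steps)
    out, level = [], history[-1]
    for dd in d_future:
        level += dd              # compensation: undo the differencing
        out.append(level)
    return out

# Toy stand-in "model": repeat the mean difference (a drift forecast).
mean_diff = lambda d, s: [sum(d) / len(d)] * s
fc = forecast_with_dcm([1, 3, 5, 7], 3, mean_diff)  # → [9.0, 11.0, 13.0]
```

Because the predictor only ever sees differences, a constant offset added to the whole history changes the forecasts by exactly that offset, which is the desired robustness to level shift.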
Conventional soil maps generally contain one or more soil types within a single soil polygon, but the geographic locations of those types within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction for making use of the under- or over-sampled legacy pedon data for the disaggregation. The method consisted of three steps. First, environmental similarities between the pedon sites and each location were computed based on soil formative environmental factors. Second, according to the soil types of the pedon sites, the similarities were aggregated to derive a similarity distribution for each soil type. Third, a hardening process was performed on the maps to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area situated in Manitoba, Canada. Based on 186 independent pedon sites, the evaluation of the disaggregated map of soil subgroups showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented a better spatial pattern of soil subgroups in both detail and accuracy compared to a dominant-soil-subgroup map, which is commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and in soil subgroups that are very similar in taxonomy, indicating that new environmental covariates need to be developed. We concluded that the combination of legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
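The three steps can be sketched directly: similarity of a location to each pedon site, aggregation per soil type, and hardening by taking the most similar candidate allowed in the polygon. The Gaussian similarity function, bandwidth, and soil names below are illustrative stand-ins for the paper's environmental-factor similarity:

```python
import math

def gaussian_sim(a, b, bw=1.0):
    """Environmental similarity between two covariate vectors
    (a simple Gaussian kernel; the paper's measure may differ)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * bw ** 2))

def disaggregate(location_env, pedons, allowed_types):
    """Steps 1-3 of the abstract, schematically: similarity of the
    location to every pedon site, per-type aggregation (max here),
    then 'hardening' to the best candidate type in the polygon."""
    scores = {}
    for env, soil_type in pedons:
        if soil_type in allowed_types:
            s = gaussian_sim(location_env, env)
            scores[soil_type] = max(s, scores.get(soil_type, 0.0))
    return max(scores, key=scores.get)

pedons = [((0.1, 0.2), "Chernozem"), ((0.9, 0.8), "Gleysol"),
          ((0.2, 0.1), "Chernozem")]
label = disaggregate((0.15, 0.15), pedons, {"Chernozem", "Gleysol"})
```

Restricting `allowed_types` to the soil types recorded for the polygon is what makes this a disaggregation of the legacy map rather than an unconstrained classification.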
The loess plateau covering the North Shaanxi slope and Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures were not well developed. As a result, numerous wide, gentle noses and small traps with magnitudes less than 50 m developed on the large westward-dipping monocline. Reservoirs, including Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are characterized as thin, with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting reservoir thickness, physical properties, and the hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
A model that rapidly predicts the density components of raw coal is described. It is based on a three-grade fast float/sink test. Recent comprehensive monthly floating and sinking data are used for comparison. The predicted data are used to draw washability curves and to provide a rapid evaluation of the effect of heavy-medium separation. Thirty-one production shifts' worth of fast float/sink data and the corresponding quick-ash data are used to verify the model. The results show a small error, with an arithmetic average of 0.53 and an absolute average error of 1.50, indicating that the model has high precision. The theoretical yield from the washability curves is 76.47% for the monthly comprehensive data and 81.31% using the model data, for a desired cleaned coal ash of 9%. The relative error between these two is 6.33%, which is small and indicates that the predicted data can be used to rapidly evaluate the separation effect of gravity separation equipment.
Time series forecasting and analysis are widely used in many fields and application scenarios. Time series historical data reflects change patterns and trends, which can serve application and decision needs in each scenario to a certain extent. In this paper, we select the time series prediction problem in the atmospheric environment scenario to begin the applied research. For data support, we obtained data on nearly 3500 vehicles in some cities in China from the Runwoda Research Institute, focusing on the major pollutant emission data of non-road mobile machinery and high-emission vehicles in Beijing and Bozhou, Anhui Province, to build the dataset and conduct time series prediction experiments. This paper proposes a P-gLSTNet model and uses the Autoregressive Integrated Moving Average (ARIMA) model, long short-term memory (LSTM), and Prophet to predict and compare the emissions in a future period. The experiments are validated on four public datasets and one self-collected dataset, and the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) are selected as the evaluation metrics. The experimental results show that the proposed P-gLSTNet fusion model predicts with less error, outperforms the backbone methods, and is more suitable for the prediction of time series data in this scenario.
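The three evaluation metrics used here are standard and easy to state precisely; a minimal reference implementation:

```python
import math

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean square error (penalizes large misses more than MAE)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    """Mean absolute percentage error, in percent.
    Undefined when a true value is zero; assumed nonzero here."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

y_true, y_pred = [100.0, 200.0], [110.0, 190.0]
```

For `y_true`/`y_pred` above, MAE and RMSE are both 10, while MAPE is 7.5%, showing how MAPE weights the same absolute miss more heavily on the smaller true value.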
Accuracy of fluid property data plays an absolutely pivotal role in reservoir computational processes. Reliable data can be obtained through various experimental methods, but these methods are very expensive and time consuming. Alternative methods are numerical models, which use measured experimental data to develop a representative model for predicting the desired parameters. In this study, several Artificial Intelligence (AI) models were developed to predict saturation pressure, oil formation volume factor, and solution gas-oil ratio. 582 reported data sets covering a wide range of fluid properties were used as the data bank. Accuracy and reliability of the models were examined by statistical parameters such as the correlation coefficient (R²), average absolute relative deviation (AARD), and root mean square error (RMSE). The results illustrated good accordance between predicted data and target values. The models were also compared with previous works and developed empirical correlations, which indicated that they are more reliable than all compared models and correlations. Finally, a relevancy factor was calculated for each input parameter to illustrate the impact of different parameters on the predicted values. The relevancy factor showed that in these models, the solution gas-oil ratio has the greatest impact on both saturation pressure and oil formation volume factor; conversely, saturation pressure has the greatest effect on the solution gas-oil ratio.
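A relevancy factor of the kind described is commonly computed as the Pearson correlation between one input parameter and the model output, with values near ±1 indicating strong impact (the paper's exact definition may differ):

```python
import math

def relevancy_factor(x, y):
    """Pearson-type relevancy factor r between an input parameter x
    and the output y: sign gives direction, magnitude gives impact."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Ranking inputs by |r| against each predicted property is how one reads off statements like "solution gas-oil ratio has the greatest impact on saturation pressure."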
Various Wireless Sensor Network (WSN) applications require the common task of collecting data from the sensor nodes using the sink. Since the procedure of collecting data is iterative, an effective technique is necessary to obtain the data efficiently while reducing nodal energy consumption. Hence, a technique for data reduction in WSN is presented in this paper by proposing a prediction algorithm, called the Hierarchical Fractional Bidirectional Least-Mean Square (HFBLMS) algorithm. The novel algorithm is designed by modifying the Hierarchical Least-Mean Square (HLMS) algorithm to include BLMS for bidirectional data prediction and Fractional Calculus (FC) in the weight update process. Data reduction is achieved by transmitting only those data required, based on the data predicted at the sensor node and the sink. Moreover, the proposed HFBLMS algorithm reduces energy consumption in the network through the effective prediction attained by BLMS. Two metrics, energy consumption and prediction error, are used to evaluate the performance of the HFBLMS prediction algorithm: it attains energy values of 0.3587 and 0.1953 at the maximum number of rounds and prediction errors of just 0.0213 and 0.0095, using air quality and localization datasets, respectively.
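The underlying dual-prediction idea, where node and sink run identical adaptive predictors and the node transmits a reading only when the shared prediction misses it, can be sketched with a plain normalized LMS filter; HFBLMS's hierarchical, bidirectional, and fractional-calculus refinements are not reproduced here:

```python
def lms_data_reduction(samples, mu=0.4, tol=0.5):
    """Dual-prediction sketch: node and sink run the same LMS
    predictor; the node transmits a reading only when the prediction
    misses it by more than `tol` (otherwise the sink reuses its own
    prediction). Returns (number transmitted, sink reconstruction)."""
    w, prev = 0.0, samples[0]
    recon = [samples[0]]              # first sample always sent
    sent = 1
    for x in samples[1:]:
        pred = w * prev
        if abs(x - pred) > tol:       # prediction failed: transmit
            recon.append(x)
            sent += 1
            err = x - pred
        else:                         # sink keeps its prediction
            recon.append(pred)
            err = 0.0                 # both sides update identically
        w += mu * err * prev / (prev * prev + 1e-9)  # normalized LMS
        prev = recon[-1]
    return sent, recon

sent, recon = lms_data_reduction([1.0] * 20)
```

On a steady signal only a handful of the 20 readings need to be transmitted, while every reconstructed value stays within the tolerance of the truth; that traded accuracy is exactly where the energy saving comes from.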
Natural systems are typically nonlinear and complex, and it is of great interest to be able to reconstruct a system in order to understand its mechanism; such reconstruction can not only recover nonlinear behaviors but also predict future dynamics. Due to the advances of modern technology, big data has become increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they do not assume any model equations but recover the dynamics purely from the measured time series data. In this review, the development of state space reconstruction techniques is introduced, and recent advances in system prediction and causality inference using state space reconstruction are presented. Particular focus is given to cutting-edge methods for dealing with short-term time series data. Finally, the advantages as well as the remaining problems in this field are discussed.
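The basic tool of state space reconstruction is time-delay embedding (after Takens' theorem): a scalar series is mapped to vectors of lagged copies of itself, and under suitable conditions the resulting point cloud is diffeomorphic to the system's attractor. A minimal sketch:

```python
def delay_embed(series, dim, tau):
    """Time-delay embedding: map a scalar series to state-space
    points [x(t), x(t-tau), ..., x(t-(dim-1)*tau)]."""
    start = (dim - 1) * tau
    return [[series[t - k * tau] for k in range(dim)]
            for t in range(start, len(series))]

pts = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=1)
# pts[0] == [2, 1, 0]; one embedded point per usable time index
```

Prediction and causality-inference methods built on state space reconstruction (e.g. nearest-neighbor forecasting, cross mapping) all operate on such embedded point clouds rather than on the raw series.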
Recently, governments and public authorities in most countries had to face the outbreak of COVID-19 by adopting a set of policies. Consequently, some countries have succeeded in minimizing the number of confirmed cases, while the outbreak in other countries has led to the breakdown of their healthcare systems. In this work, we introduce an efficient framework called COMAP (COrona MAP), aiming to study and predict the behavior of COVID-19 based on deep learning techniques. COMAP consists of two stages: clustering and prediction. The first stage proposes a new algorithm called Co-means, which groups countries having similar COVID-19 behavior into clusters. The second stage predicts the outbreak's growth by introducing two adapted versions of LSTM and Prophet applied at country and continent scales. Simulations conducted on data collected by the WHO demonstrated the efficiency of COMAP in terms of returning accurate clustering and predictions.
Macroseismic intensity data plays an important role in the process of seismic hazard analysis as well as in the development of reliable earthquake loss models. This paper presents a physics-based model to predict macroseismic intensity attenuation based on 560 intensity data points obtained in Iran in the period 1975-2013. The geometric spreading and energy absorption of seismic waves have been considered in the proposed model. The proposed, easy-to-implement relation describes intensity simply as a function of moment magnitude, source-to-site distance, and focal depth. The prediction capability of the proposed model is assessed by means of residual analysis. Prediction results have been compared with those of other intensity prediction models for Italy, Turkey, Iran, and central Asia. The results indicate a higher attenuation rate for the study area at distances less than 70 km.
Numerical weather prediction (NWP) data possess internal inaccuracies, such as low NWP wind speed corresponding to high actual wind power generation. This study is intended to reduce the negative effects of such inaccuracies by proposing a pure data-selection framework (PDF) to choose useful data prior to modeling, thus improving the accuracy of day-ahead wind power forecasting. Briefly, we convert an entire NWP training dataset into many small subsets and then select the best subset combination via a validation set to build a forecasting model. Although a small subset can increase selection flexibility, it can also produce billions of subset combinations, resulting in computational issues. To address this problem, we incorporated metamodeling and optimization steps into PDF. We then proposed a metamodeling algorithm based on design and analysis of computer experiments and a heuristic-exhaustive search optimization algorithm, respectively. Experimental results demonstrate that (1) it is necessary to select data before constructing a forecasting model; (2) using a smaller subset will likely increase selection flexibility, leading to a more accurate forecasting model; (3) PDF can generate a better training dataset than similarity-based data selection methods (e.g., K-means and support vector classification); and (4) choosing data before building a forecasting model produces a more accurate forecasting model compared with using a machine learning method to construct a model directly.
The 21st-century era and the new technologies surrounding us day in and day out have opened a new door to a "Pandora's box" that we know as AI (artificial intelligence) and its two essential integrated components, ML (machine learning) and DL (deep learning). Progress in AI, ML, and DL has reached nearly every industry that deals with clouds of structured data in the form of BD (big data). An NPP (nuclear power plant) comprises multiple complicated, dynamic systems of components with nonlinear behaviors. To control plant operation under both normal and abnormal conditions, the different systems in NPPs (e.g., the reactor core components and the primary and secondary coolant systems) are monitored continuously, which produces very large amounts of data. The nuclear power industry has not been left behind in this era: it is moving from GEN-III (Generation III) to the more modular form of GEN-IV (Generation IV) known as SMRs (small modular reactors). These reactors carry extensive instrumentation that reads data to support their safety during operation, together with built-in PRA (probabilistic risk assessment), and they call for the augmentation of AI to enhance the performance of the human operators engaged in their day-to-day operation, making them safer and more resilient against natural or man-made disasters by obtaining information through ML from DL that collects massive streams of data arriving from all directions. The integration of AI with HI (human intelligence) is inseparable from the operation of these smart SMRs, with their state-of-the-art control rooms and humans in them as actors. This TM (technical memorandum) describes the necessity of AI working with GEN-IV nuclear power plants that will be in operation in the near term, sooner rather than later, especially as we face today's cyber-attacks with their smart malware agents at work.
Local extreme rain usually results in disasters such as flash floods and landslides. To date, it remains one of the most difficult tasks for operational weather forecast centers to predict such events accurately. In this paper, we simulate an extreme precipitation event with ensemble Kalman filter (EnKF) assimilation of Doppler radial-velocity observations, and analyze the uncertainties of the assimilation. The results demonstrate that, without assimilating radar data, neither a single deterministic forecast initialization nor an ensemble forecast with added perturbations or multiple physical parameterizations can predict the location of strong precipitation. However, the forecast was significantly improved by the assimilation of radar data, especially the location of the precipitation. The direct cause of the improvement is the buildup of a deep mesoscale convection system through EnKF assimilation of radar data. Under a large-scale background favorable for mesoscale convection, efficient perturbations of the upstream mid-to-low-level meridional wind and moisture are key factors for the assimilation and forecast. Uncertainty still exists in the forecast of this case due to its limited predictability. Both differences in the large-scale initial fields and differences in the analysis obtained from EnKF assimilation due to the small amplitude of initial perturbations could critically influence the event's prediction. The forecast could be improved through more cycles of EnKF assimilation. Sensitivity tests also indicate that more accurate forecasts are expected through improving numerical models and observations.
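The EnKF analysis step at the heart of such assimilation can be shown for a directly observed scalar state: each ensemble member is nudged toward a perturbed copy of the observation by the ensemble-estimated Kalman gain. This is a stochastic (perturbed-observation) EnKF sketch, vastly simpler than radar-wind assimilation in a full weather model:

```python
import random
import statistics

def enkf_update(ensemble, z, obs_var):
    """One EnKF analysis step for a scalar state observed directly:
    the gain is estimated from the ensemble spread, and each member
    assimilates an independently perturbed copy of observation z."""
    var_x = statistics.variance(ensemble)      # forecast spread
    gain = var_x / (var_x + obs_var)           # ensemble Kalman gain
    sd = obs_var ** 0.5
    return [x + gain * ((z + random.gauss(0.0, sd)) - x)
            for x in ensemble]

random.seed(1)
prior = [random.gauss(5.0, 2.0) for _ in range(500)]   # forecast ensemble
post = enkf_update(prior, z=1.0, obs_var=0.5)
```

The posterior ensemble mean moves toward the observation and the spread shrinks, mirroring how assimilating radar winds pulled the simulated mesoscale convection toward the observed event.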
Abstract: Many business applications rely on historical data to predict their future. Marketing products is one of the core processes of a business, and customer needs are a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be regarded as products, and the development of education and health services depends on historical data. Furthermore, reducing problems and crimes on online social media networks requires a significant source of information. Data analysts need an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires a great deal of processing time. Data mining provides many useful techniques for prediction in a variety of business applications, and classification is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after a careful reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications, including business in general, online social media networks, agriculture, health, and education. Results show that FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social network (OSN) activities. The Naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 decision tree algorithm is the most accurate for classifying students' records to predict degree completion time.
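Of the algorithms this survey compares, OneR is the simplest to state: pick the single feature whose value-to-majority-class rule makes the fewest training errors. A self-contained Python sketch follows; the toy "health" records are invented for illustration:

```python
from collections import Counter, defaultdict

def one_r(X, y):
    """Train a OneR classifier: choose the one feature whose
    value -> majority-class rule has the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        buckets = defaultdict(list)
        for row, label in zip(X, y):
            buckets[row[f]].append(label)
        # majority class for each value of feature f
        rule = {v: Counter(ls).most_common(1)[0][0] for v, ls in buckets.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    _, feature, rule = best
    return lambda row: rule.get(row[feature])

# toy records: (smoker, exercises) -> risk label (hypothetical data)
X = [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "no"), ("yes", "no")]
y = ["risk", "risk", "ok", "ok", "risk"]
predict = one_r(X, y)
# here the "smoker" feature alone classifies the training set perfectly
```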
Funding: This project is supported by the Ministry of Education, Culture, Sports, Science and Technology (MONBUSHO), Japan.
Abstract: Various kinds of data are used in new product design, and more accurate data make the design results more reliable. Although some product data can be obtained directly from existing similar products, a great deal of data remains unavailable, which makes data prediction valuable. A method is proposed that can predict data for a product under development based on existing similar products. Fuzzy theory is used to deal with the uncertainties in the data prediction process. The proposed method can be used in life cycle design, life cycle assessment (LCA), etc. A case study on a current refrigerator is used as a demonstration example.
基金supported by the Basic Research Special Plan of Yunnan Provincial Department of Science and Technology-General Project(Grant No.202101AT070094)。
Abstract: The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from 210 sets of slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression layer, completing the stacked learning model and improving prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the six regression models and to correct the over- and underfitting problems of the single regression models, further improving prediction accuracy. The mean squared error (MSE) between predicted and true values and the fit to the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of any single regression model (MSE = 0.03917), so the former has higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predicting the slope safety factor, showcasing its advantages over traditional methods. The proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy and offers a fresh approach to handling intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research is important for the modernization and digitalization of slope safety assessments.
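The two-level stacking scheme above (out-of-fold base-model predictions feeding a meta-model) can be sketched in plain NumPy. The base learners below (ordinary least squares and k-NN) and the synthetic data are stand-ins for the paper's Random Forest/Extra Trees/AdaBoost/Bagging/SVR plus XGBoost stack:

```python
import numpy as np

def fit_linear(X, y):
    # ordinary least squares with an intercept column
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.c_[Xn, np.ones(len(Xn))] @ w

def fit_knn(X, y, k=3):
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def fit_stacking(X, y, base_fits, meta_fit, folds=5):
    """Level-1 models produce out-of-fold predictions Z; the meta
    model is trained on Z, as in the stacking scheme above."""
    n = len(X)
    Z = np.zeros((n, len(base_fits)))
    for val in np.array_split(np.arange(n), folds):
        trn = np.setdiff1d(np.arange(n), val)
        for j, fit in enumerate(base_fits):
            Z[val, j] = fit(X[trn], y[trn])(X[val])
    bases = [fit(X, y) for fit in base_fits]     # refit on all data
    meta = meta_fit(Z, y)
    return lambda Xn: meta(np.column_stack([b(Xn) for b in bases]))

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.05 * rng.normal(size=80)
model = fit_stacking(X, y, [fit_linear, fit_knn], fit_linear)
```

The out-of-fold construction is the essential detail: training the meta-model on in-sample base predictions would let it overfit to base-model memorization.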
基金support by the National Natural Science Foundation of China(Grant Nos.52108377,52090084,and 51938008).
Abstract: This research explores the potential for evaluating and predicting earth pressure balance shield performance based on a gray system model. The research focuses on a shield tunnel excavated for Metro Line 2 in Dalian, China. Because of the large discrepancy between the initial geological exploration data and the real strata, construction was extremely difficult. In view of this situation, a quantitative method for evaluating tunneling efficiency was proposed using cutterhead rotation (R), advance speed (S), total thrust (F), and torque (T). A total of 80 datasets with three input parameters and one output variable (F or T) were collected from the project, and a prediction framework based on the gray system model was established. Five prediction schemes were set up on this basis, and error analysis identified the optimal scheme among them. The parametric investigation indicates that the relationships between F and the three input variables in the gray system model agree with the theoretical explanation. The case shows that shield tunneling performance and efficiency are improved by the tunneling parameter prediction model based on the gray system model.
Funding: Supported by the State Key Program of the National Natural Science Foundation of China (Grant No. 60835004), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK2009727), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province of China (Grant No. 10KJB510004), and the National Natural Science Foundation of China (Grant No. 61075028).
Abstract: On the assumption that random interruptions in the observation process are modeled by a sequence of independent Bernoulli random variables, we first generalize two kinds of nonlinear filtering methods with random interruption failures in the observations, based on the extended Kalman filter (EKF) and the unscented Kalman filter (UKF); these are abbreviated GEKF and GUKF in this paper. The nonlinear filtering model is then established by using radial basis function neural network (RBFNN) prototypes: the network weights form the state equation, and the output of the RBFNN provides the observation equation. Finally, we treat the filtering problem with missing observed data as a special case of nonlinear filtering with random intermittent failures by setting each missing datum to zero, without needing to pre-estimate the missing data, and use the GEKF-based RBFNN and the GUKF-based RBFNN to predict a ground radioactivity time series with missing data. Experimental results demonstrate that the predictions of the GUKF-based RBFNN accord well with the real ground radioactivity time series, while the predictions of the GEKF-based RBFNN diverge.
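The Bernoulli-interruption idea amounts to gating the measurement update of the filter: at lost-observation steps only the time update runs. A minimal scalar sketch follows (the paper applies this to EKF/UKF with an RBFNN model; the linear scalar state, noise variances, and loss rate here are illustrative assumptions):

```python
import numpy as np

def intermittent_kf(zs, gammas, a=1.0, q=0.001, r=0.1):
    """Scalar Kalman filter where gamma[k] = 1 if observation z[k]
    arrived (Bernoulli) and 0 if it was lost; lost steps perform
    the time update only."""
    x, p = 0.0, 1.0
    out = []
    for z, g in zip(zs, gammas):
        x, p = a * x, a * a * p + q          # time update
        if g:                                 # measurement update only if data arrived
            k = p / (p + r)
            x, p = x + k * (z - x), (1 - k) * p
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(2)
zs = 1.0 + rng.normal(0, 0.3, size=200)       # noisy observations of a constant
gammas = rng.random(200) < 0.7                # roughly 30% of observations lost
est = intermittent_kf(zs, gammas)
```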
Abstract: Data is always a crucial concern, especially during its prediction and computation in the digital revolution. This paper provides an efficient learning mechanism for accurate predictability and for reducing redundant data communication. It also discusses Bayesian analysis, which finds the conditional probability of at least two parameter-based predictions for the data. The paper presents a method for improving the performance of Bayesian classification using a combination of the Kalman filter and K-means. The method is applied to a small dataset to establish that the proposed algorithm can reduce the time needed to compute clusters from the data. The proposed Bayesian learning probabilistic model is used to check for statistical noise and other inaccuracies using unknown variables, and the scenario is implemented with an efficient machine learning algorithm to realize the Bayesian probabilistic approach. The paper also demonstrates the generative function of the Kalman-filter-based prediction model and its observations. The algorithm is implemented on the open-source Python platform, with the different modules integrated via Common Platform Enumeration (CPE) for Python.
基金supported by the National Key Research and Development Program of China(No.2018YFB2101300)the National Natural Science Foundation of China(Grant No.61871186)the Dean’s Fund of Engineering Research Center of Software/Hardware Co-Design Technology and Application,Ministry of Education(East China Normal University).
Abstract: Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causality forces the receptive fields of outputs to concentrate on the earlier part of the input sequence, so recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to those operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and the vanilla TCN.
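The first problem the paper targets follows directly from how receptive fields grow in a dilated causal stack. A small helper makes the arithmetic concrete; the kernel size and the doubling dilation schedule are the usual TCN defaults, not values taken from the paper:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions:
    each layer looks back a further (kernel_size - 1) * dilation steps."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# a typical TCN doubles the dilation each layer: 1, 2, 4, 8, ...
assert receptive_field(2, [1, 2, 4, 8]) == 16
# every output at position t depends only on inputs at t and earlier,
# so inputs near t contribute through far fewer paths than early inputs
```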
基金supported by the National Natural Science Foundation of China (41130530,91325301,41431177,41571212,41401237)the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science,Chinese Academy of Sciences (ISSASIP1622)+1 种基金the Government Interest Related Program between Canadian Space Agency and Agriculture and Agri-Food,Canada (13MOA01002)the Natural Science Research Program of Jiangsu Province (14KJA170001)
Abstract: Conventional soil maps generally contain one or more soil types within a single soil polygon, but the geographic locations of those types within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction in making use of under- or over-sampled legacy pedon data for the disaggregation. The method consists of three steps. First, environmental similarities between the pedon sites and each location are computed based on soil-formative environmental factors. Second, according to the soil types of the pedon sites, the similarities are aggregated to derive a similarity distribution for each soil type. Third, a hardening process is performed on the maps to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area in Manitoba, Canada. Based on 186 independent pedon sites, evaluation of the disaggregated map of soil subgroups showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented the spatial pattern of soil subgroups in better detail and with better accuracy than a dominant-soil-subgroup map, which is commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and among soil subgroups that are very similar in taxonomy, indicating that new environmental covariates need to be developed. We conclude that combining legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
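The three steps (per-site similarity, per-type aggregation, hardening) can be sketched as follows. The Gaussian kernel on environmental covariates and the max-aggregation per soil type are simplifying assumptions, not the paper's exact similarity measure, and the two-covariate toy data are invented:

```python
import numpy as np

def disaggregate(cells, pedons, labels, h=1.0):
    """Step 1: similarity of each cell to every pedon site (Gaussian
    kernel on environmental covariates). Step 2: aggregate per soil
    type by taking the maximum. Step 3: 'harden' each cell to the
    most similar soil type."""
    d2 = ((cells[:, None, :] - pedons[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2 / (2 * h * h))
    types = sorted(set(labels))
    per_type = np.column_stack(
        [sim[:, [i for i, l in enumerate(labels) if l == t]].max(axis=1)
         for t in types])
    return [types[i] for i in per_type.argmax(axis=1)]

# hypothetical pedon sites in a 2-D covariate space (e.g. wetness, slope)
pedons = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = ["Chernozem", "Gleysol"]
cells = np.array([[0.5, 0.2], [4.8, 5.1]])   # grid cells to classify
```

Keeping the pre-hardening `per_type` matrix around is also useful: it is exactly the continuous "similarity distribution for each soil type" that the second step produces.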
Abstract: The loess plateau covering the North Shaanxi slope and Tianhuan depression consists of a regional monocline, high in the east and low in the west, with dips of less than 1°. Structural movement in this region was weak, so faults and local structures are not well developed. As a result, numerous wide, gentle noses and small traps with magnitudes of less than 50 m developed on the large westward-dipping monocline. Reservoirs, including Mesozoic oil reservoirs and Paleozoic gas reservoirs in the Ordos Basin, are dominantly lithologic, with a small number of structural reservoirs. Single reservoirs are thin, with large lateral variations, strong anisotropy, low porosity, low permeability, and low richness. A series of approaches for predicting the reservoir thickness, physical properties, and hydrocarbon potential of subtle lithologic reservoirs was established based on the interpretation of erosion surfaces.
Funding: Supported by the National Natural Science Foundation of China (No. 51174202) and the Doctoral Fund of the Ministry of Education of China (No. 20100095110013).
Abstract: A model that rapidly predicts the density components of raw coal is described. It is based on a three-grade fast float/sink test. Recent comprehensive monthly float/sink data are used for comparison. The predicted data are used to draw washability curves and to provide a rapid evaluation of the separation effect of heavy-medium separation. Thirty-one production shifts' worth of fast float/sink data and the corresponding quick-ash data are used to verify the model. The results show a small error, with an arithmetic average of 0.53 and an absolute average error of 1.50, indicating that the model has high precision. The theoretical yield from the washability curves is 76.47% for the monthly comprehensive data and 81.31% using the model data, for a desired cleaned-coal ash of 9%. The relative error between the two is 6.33%, which is small and indicates that the predicted data can be used to rapidly evaluate the separation effect of gravity separation equipment.
Funding: Supported by the Beijing Chaoyang District Collaborative Innovation Project (No. CYXT2013), the Beijing Municipal Science and Technology Key R&D Program, Capital Blue Sky Action Cultivation Project (Z19110900910000), "Research and Demonstration of High Emission Vehicle Monitoring Equipment System Based on Sensor Integration Technology" (Z19110000911003), and the Academic Research Projects of Beijing Union University (No. ZK80202103).
Abstract: Time series forecasting and analysis are widely used in many fields and application scenarios, and historical time series data reflect change patterns and trends that can, to a certain extent, serve application and decision making in each scenario. In this paper, we select the time series prediction problem in the atmospheric environment scenario as our application study. For data support, we obtained data on nearly 3500 vehicles in several Chinese cities from the Runwoda Research Institute, focusing on the major pollutant emission data of non-road mobile machinery and high-emission vehicles in Beijing and Bozhou, Anhui Province, to build datasets and conduct time series prediction experiments. This paper proposes a P-gLSTNet model and uses the Autoregressive Integrated Moving Average (ARIMA) model, long short-term memory (LSTM), and Prophet to predict and compare emissions over a future period. The experiments are validated on four public datasets and one self-collected dataset, with mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) as the evaluation metrics. The experimental results show that the proposed P-gLSTNet fusion model has the smallest prediction error, outperforms the backbone methods, and is better suited to predicting time series data in this scenario.
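The three evaluation metrics named above are standard and easy to state exactly; here is a minimal Python sketch with made-up numbers (note MAPE is undefined when a true value is zero):

```python
import math

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mape(y, p):
    """Mean absolute percentage error (requires nonzero true values)."""
    return sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y) * 100

y_true = [100.0, 200.0, 400.0]   # illustrative emission readings
y_pred = [110.0, 190.0, 420.0]
# mae -> 13.333..., rmse -> 14.142..., mape -> 6.666...%
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is why papers comparing models across datasets of different magnitudes typically report all three.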
Abstract: The accuracy of fluid property data plays an absolutely pivotal role in reservoir computations. Reliable data can be obtained through various experimental methods, but these are expensive and time consuming; numerical models are the alternative. Such methods use measured experimental data to develop a representative model for predicting the desired parameters. In this study, several artificial intelligence (AI) models were developed to predict saturation pressure, oil formation volume factor, and solution gas-oil ratio. A bank of 582 reported data sets covering a wide range of fluid properties was used. The accuracy and reliability of the models were examined with statistical parameters such as the correlation coefficient (R2), average absolute relative deviation (AARD), and root mean squared error (RMSE). The results show good agreement between predicted values and targets. The models were also compared with previous works and empirical correlations, and proved more reliable than all compared models and correlations. Finally, a relevancy factor was calculated for each input parameter to illustrate its impact on the predicted values. The relevancy factor showed that solution gas-oil ratio has the greatest impact on both saturation pressure and oil formation volume factor, while saturation pressure has the greatest effect on solution gas-oil ratio.
Abstract: Various wireless sensor network (WSN) applications involve the common task of collecting data from the sensor nodes at the sink. Since data collection is iterative, an effective technique is needed to obtain the data efficiently while reducing nodal energy consumption. Hence, a technique for data reduction in WSNs is presented in this paper, based on a proposed prediction algorithm called Hierarchical Fractional Bidirectional Least-Mean-Square (HFBLMS). The algorithm is designed by modifying the Hierarchical Least-Mean-Square (HLMS) algorithm to include BLMS for bidirectional data prediction and fractional calculus (FC) in the weight update process. Redundancy is exploited by transmitting only the data required, based on the values predicted at both the sensor node and the sink. Moreover, the proposed HFBLMS algorithm reduces energy consumption in the network through the effective prediction attained by BLMS. Two metrics, energy consumption and prediction error, are used to evaluate performance: the algorithm attains energy values of 0.3587 and 0.1953 at the maximum number of rounds and prediction errors of just 0.0213 and 0.0095 on air quality and localization datasets, respectively.
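The underlying data-reduction idea, running identical predictors at the node and the sink and transmitting only when the prediction misses by more than a tolerance, can be sketched with a plain LMS filter; the hierarchical, bidirectional, and fractional-calculus refinements of HFBLMS are omitted, and the signal and parameters are invented:

```python
def lms_reduction(samples, order=3, mu=0.01, tol=0.5):
    """Identical LMS predictors run at the sensor and the sink; the
    sensor transmits a reading only when the shared prediction misses
    by more than tol, so both sides update from the same stream and
    the sink's reconstruction error stays bounded by tol."""
    w = [0.0] * order          # shared filter weights
    window = [0.0] * order     # last `order` reconstructed values
    sent, recon = 0, []
    for x in samples:
        pred = sum(wi * xi for wi, xi in zip(w, window))
        if abs(x - pred) > tol:          # transmit the real sample
            val, sent = x, sent + 1
        else:                            # sink reuses its own prediction
            val = pred
        err = val - pred                 # both sides see the same error
        w = [wi + mu * err * xi for wi, xi in zip(w, window)]
        window = [val] + window[:-1]
        recon.append(val)
    return recon, sent

samples = [float(i % 10) for i in range(300)]   # simple periodic signal
recon, sent = lms_reduction(samples)
# `sent` counts radio transmissions; every skipped one saves energy
```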
基金supported by the National Key Research and Development Program of China (Grant No. 2017YFA0505500)Japan Society for the Promotion of Science KAKENHI Program (Grant No. JP15H05707)National Natural Science Foundation of China (Grant Nos. 11771010,31771476,91530320, 91529303,91439103 and 81471047)
Abstract: Natural systems are typically nonlinear and complex, and there is great interest in reconstructing a system in order to understand its mechanism, both to recover nonlinear behaviors and to predict future dynamics. Owing to the advances of modern technology, big data have become increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they assume no model equations but recover the dynamics purely from the measured time series data. This review introduces the development of state space reconstruction techniques and presents recent advances in systems prediction and causality inference using state space reconstruction, focusing in particular on cutting-edge methods for short-term time series data. Finally, the advantages as well as the remaining problems in this field are discussed.
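State space reconstruction typically starts from Takens-style delay-coordinate embedding, which maps a scalar series into vectors of lagged values; a minimal sketch, where the embedding dimension `dim` and lag `tau` are free parameters chosen by the analyst:

```python
def delay_embed(series, dim, tau):
    """Takens delay-coordinate embedding: map a scalar series into
    dim-dimensional points (x[t], x[t - tau], ..., x[t - (dim-1)*tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]

series = [0, 1, 2, 3, 4, 5, 6]
points = delay_embed(series, dim=3, tau=2)
# points == [(4, 2, 0), (5, 3, 1), (6, 4, 2)]
```

Under suitable conditions the embedded points trace out a shadow of the full system's attractor, which is what allows prediction and causality inference from a single measured variable.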
Abstract: Recently, governments and public authorities in most countries have had to face the outbreak of COVID-19 by adopting sets of policies. Consequently, some countries have succeeded in minimizing the number of confirmed cases, while the outbreak in other countries has led to the breakdown of their healthcare systems. In this work, we introduce an efficient framework called COMAP (COrona MAP), which aims to study and predict the behavior of COVID-19 using deep learning techniques. COMAP consists of two stages: clustering and prediction. The first stage proposes a new algorithm called Co-means, which groups countries exhibiting similar COVID-19 behavior into clusters. The second stage predicts the outbreak's growth via two adapted versions of LSTM and Prophet, applied at the country and continent scales. Simulations conducted on data collected by the WHO demonstrated the efficiency of COMAP in returning accurate clusterings and predictions.
Abstract: Macroseismic intensity data play an important role in seismic hazard analysis as well as in developing reliable earthquake loss models. This paper presents a physically based model for predicting macroseismic intensity attenuation, derived from 560 intensity data recorded in Iran in the period 1975-2013. The geometric spreading and energy absorption of seismic waves are considered in the proposed model. The resulting easy-to-implement relation describes intensity simply as a function of moment magnitude, source-to-site distance, and focal depth. The prediction capability of the model is assessed through residual analysis, and the predictions are compared with those of other intensity prediction models for Italy, Turkey, Iran, and Central Asia. The results indicate a higher attenuation rate for the study area at distances of less than 70 km.
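Models of this family can be fit by ordinary least squares once a functional form is fixed. The sketch below assumes a common geometric-spreading-plus-absorption form, I = c1 + c2·M + c3·ln(D) + c4·D with slant distance D = sqrt(R² + h²), and uses synthetic data with made-up coefficients; the paper's exact form and coefficients are not reproduced here:

```python
import numpy as np

def design(M, R, h):
    """Regression matrix for I = c1 + c2*M + c3*ln(D) + c4*D,
    with slant source-to-site distance D = sqrt(R^2 + h^2):
    ln(D) models geometric spreading, D models anelastic absorption."""
    D = np.sqrt(R ** 2 + h ** 2)
    return np.column_stack([np.ones_like(M), M, np.log(D), D])

rng = np.random.default_rng(3)
M = rng.uniform(4, 7, 200)          # moment magnitudes (synthetic)
R = rng.uniform(5, 150, 200)        # epicentral distances, km
h = rng.uniform(5, 20, 200)         # focal depths, km
true_c = np.array([1.5, 1.2, -1.8, -0.002])   # illustrative coefficients
I = design(M, R, h) @ true_c + rng.normal(0, 0.1, 200)

# least-squares fit recovers the coefficients from the noisy intensities
c, *_ = np.linalg.lstsq(design(M, R, h), I, rcond=None)
```

Residual analysis, as in the paper, would then examine `design(M, R, h) @ c - I` against magnitude and distance to check for systematic bias.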
基金supported by the National Natural Science Foundation of China(72101066,72131005,72121001,72171062,91846301,and 71772053)Heilongjiang Natural Science Excellent Youth Fund(YQ2022G004)Key Research and Development Projects of Heilongjiang Province(JD22A003).
Abstract: Numerical weather prediction (NWP) data possess internal inaccuracies, such as low NWP wind speed corresponding to high actual wind power generation. This study aims to reduce the negative effects of such inaccuracies by proposing a pure data-selection framework (PDF) that chooses useful data prior to modeling, thereby improving the accuracy of day-ahead wind power forecasting. Briefly, we convert an entire NWP training dataset into many small subsets and then select the best subset combination via a validation set to build the forecasting model. Although small subsets increase selection flexibility, they can also produce billions of subset combinations, creating computational issues. To address this problem, we incorporate metamodeling and optimization steps into PDF, proposing a metamodeling algorithm based on design and analysis of computer experiments and a heuristic-exhaustive search optimization algorithm, respectively. Experimental results demonstrate that (1) it is necessary to select data before constructing a forecasting model; (2) using smaller subsets increases selection flexibility, leading to a more accurate forecasting model; (3) PDF generates a better training dataset than similarity-based data selection methods (e.g., K-means and support vector classification); and (4) selecting data before building the forecasting model produces a more accurate model than applying a machine learning method to the raw data directly.
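The core loop, scoring training-subset combinations on a validation set, can be illustrated with a greedy stand-in for the paper's heuristic-exhaustive search; the mean-predictor "model", the toy subsets, and the stopping rule are purely illustrative:

```python
def greedy_select(subsets, fit, error, val):
    """Greedy stand-in for subset-combination search: keep adding the
    training subset that most reduces validation error; stop when no
    remaining subset improves it."""
    chosen, pool = [], list(range(len(subsets)))
    best_err = float("inf")
    while pool:
        scored = []
        for i in pool:
            data = [x for j in chosen + [i] for x in subsets[j]]
            scored.append((error(fit(data), val), i))
        err, i = min(scored)
        if err >= best_err:       # no candidate improves validation error
            break
        best_err = err
        chosen.append(i)
        pool.remove(i)
    return chosen, best_err

# toy setup: the "model" is just the mean of the selected training data,
# scored against the validation mean
fit = lambda data: sum(data) / len(data)
error = lambda m, val: abs(m - sum(val) / len(val))
subsets = [[10, 12], [50, 60], [11, 9]]   # subset 1 is "bad" NWP data
val = [10, 11]
chosen, err = greedy_select(subsets, fit, error, val)
# the search keeps subsets 0 and 2 and discards the misleading subset 1
```

Greedy search evaluates O(k²) combinations for k subsets instead of 2^k, which is the same computational concern that motivates the paper's metamodeling step.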
Abstract: The 21st century and the new technologies surrounding us day in and day out have opened a "Pandora's box" that we know as AI (artificial intelligence) and its two essential integrated components, ML (machine learning) and DL (deep learning). Progress in AI, ML, and DL has reached virtually every industry that deals with clouds of structured data in the form of BD (big data). A NPP (nuclear power plant) comprises multiple complicated dynamic systems of components with nonlinear behaviors. To control plant operation under both normal and abnormal conditions, the different systems in NPPs (e.g., the reactor core components and the primary and secondary coolant systems) are monitored continuously, which produces very large amounts of data. The nuclear power industry has not been left behind in this era: it is moving from GEN-III (Generation III) to the more modular GEN-IV (Generation IV) designs known as SMRs (small modular reactors), instrumented with numerous electronic devices that read data to support reactor safety during operation, together with built-in PRA (probabilistic risk assessment). This calls for augmenting these reactors with AI to enhance the performance of the human operators engaged in their day-to-day operation, making them safe and resilient against natural or man-made disasters, by obtaining information through ML from DL applied to the massive streams of data arriving from all directions. The integration of AI with HI (human intelligence) is inseparable from the operation of these smart SMRs, with state-of-the-art smart control rooms and humans in them as actors. This TM (technical memorandum) describes why AI must play a role in GEN-IV nuclear reactor power plants entering operation in the near term, sooner rather than later, especially as we face today's cyber attacks with their smart malware agents at work.