The increasing penetration rate of electric kickboard vehicles has been popularized and promoted primarily because of its clean and efficient features.Electric kickboards are gradually growing in popularity in tourist...The increasing penetration rate of electric kickboard vehicles has been popularized and promoted primarily because of its clean and efficient features.Electric kickboards are gradually growing in popularity in tourist and education-centric localities.In the upcoming arrival of electric kickboard vehicles,deploying a customer rental service is essential.Due to its freefloating nature,the shared electric kickboard is a common and practical means of transportation.Relocation plans for shared electric kickboards are required to increase the quality of service,and forecasting demand for their use in a specific region is crucial.Predicting demand accurately with small data is troublesome.Extensive data is necessary for training machine learning algorithms for effective prediction.Data generation is a method for expanding the amount of data that will be further accessible for training.In this work,we proposed a model that takes time-series customers’electric kickboard demand data as input,pre-processes it,and generates synthetic data according to the original data distribution using generative adversarial networks(GAN).The electric kickboard mobility demand prediction error was reduced when we combined synthetic data with the original data.We proposed Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results.We modified The Wasserstein GAN-gradient penalty(GP)with the RMSprop optimizer and then employed Spectral Normalization(SN)to improve training stability and faster convergence.Finally,we applied a regression-based blending ensemble technique that can help us to improve performance of demand prediction.We used various evaluation criteria and visual representations to compare our proposed model’s performance.Synthetic data generated by our suggested GAN model is also evaluated.The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problem,and it also converges faster than previous GAN models for synthetic data creation.The presented model’s performance is compared to existing ensemble and baseline models.The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error rates in the mean absolute percentage error(MAPE)of 4.476 and increase prediction accuracy.展开更多
Accurate prediction of formation pore pressure is essential to predict fluid flow and manage hydrocarbon production in petroleum engineering.Recent deep learning technique has been receiving more interest due to the g...Accurate prediction of formation pore pressure is essential to predict fluid flow and manage hydrocarbon production in petroleum engineering.Recent deep learning technique has been receiving more interest due to the great potential to deal with pore pressure prediction.However,most of the traditional deep learning models are less efficient to address generalization problems.To fill this technical gap,in this work,we developed a new adaptive physics-informed deep learning model with high generalization capability to predict pore pressure values directly from seismic data.Specifically,the new model,named CGP-NN,consists of a novel parametric features extraction approach(1DCPP),a stacked multilayer gated recurrent model(multilayer GRU),and an adaptive physics-informed loss function.Through machine training,the developed model can automatically select the optimal physical model to constrain the results for each pore pressure prediction.The CGP-NN model has the best generalization when the physicsrelated metricλ=0.5.A hybrid approach combining Eaton and Bowers methods is also proposed to build machine-learnable labels for solving the problem of few labels.To validate the developed model and methodology,a case study on a complex reservoir in Tarim Basin was further performed to demonstrate the high accuracy on the pore pressure prediction of new wells along with the strong generalization ability.The adaptive physics-informed deep learning approach presented here has potential application in the prediction of pore pressures coupled with multiple genesis mechanisms using seismic data.展开更多
A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow acc...A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow accurately.However,accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors.This paper proposes a technique for predicting short-term traffic flow data using an architecture that utilizes convolutional bidirectional long short-term memory(Conv-BiLSTM)with attention mechanisms.Prior studies neglected to include data pertaining to factors such as holidays,weather conditions,and vehicle types,which are interconnected and significantly impact the accuracy of forecast outcomes.In addition,this research incorporates recurring monthly periodic pattern data that significantly enhances the accuracy of forecast outcomes.The experimental findings demonstrate a performance improvement of 21.68%when incorporating the vehicle type feature.展开更多
Maritime transportation,a cornerstone of global trade,faces increasing safety challenges due to growing sea traffic volumes.This study proposes a novel approach to vessel trajectory prediction utilizing Automatic Iden...Maritime transportation,a cornerstone of global trade,faces increasing safety challenges due to growing sea traffic volumes.This study proposes a novel approach to vessel trajectory prediction utilizing Automatic Identification System(AIS)data and advanced deep learning models,including Long Short-Term Memory(LSTM),Gated Recurrent Unit(GRU),Bidirectional LSTM(DBLSTM),Simple Recurrent Neural Network(SimpleRNN),and Kalman Filtering.The research implemented rigorous AIS data preprocessing,encompassing record deduplication,noise elimination,stationary simplification,and removal of insignificant trajectories.Models were trained using key navigational parameters:latitude,longitude,speed,and heading.Spatiotemporal aware processing through trajectory segmentation and topological data analysis(TDA)was employed to capture dynamic patterns.Validation using a three-month AIS dataset demonstrated significant improvements in prediction accuracy.The GRU model exhibited superior performance,achieving training losses of 0.0020(Mean Squared Error,MSE)and 0.0334(Mean Absolute Error,MAE),with validation losses of 0.0708(MSE)and 0.1720(MAE).The LSTM model showed comparable efficacy,with training losses of 0.0011(MSE)and 0.0258(MAE),and validation losses of 0.2290(MSE)and 0.2652(MAE).Both models demonstrated reductions in training and validation losses,measured by MAE,MSE,Average Displacement Error(ADE),and Final Displacement Error(FDE).This research underscores the potential of advanced deep learning models in enhancing maritime safety through more accurate trajectory predictions,contributing significantly to the development of robust,intelligent navigation systems for the maritime industry.展开更多
In response to the lack of reliable physical parameters in the process simulation of the butadiene extraction,a large amount of phase equilibrium data were collected in the context of the actual process of butadiene p...In response to the lack of reliable physical parameters in the process simulation of the butadiene extraction,a large amount of phase equilibrium data were collected in the context of the actual process of butadiene production by acetonitrile.The accuracy of five prediction methods,UNIFAC(UNIQUAC Functional-group Activity Coefficients),UNIFAC-LL,UNIFAC-LBY,UNIFAC-DMD and COSMO-RS,applied to the butadiene extraction process was verified using partial phase equilibrium data.The results showed that the UNIFAC-DMD method had the highest accuracy in predicting phase equilibrium data for the missing system.COSMO-RS-predicted multiple systems showed good accuracy,and a large number of missing phase equilibrium data were estimated using the UNIFAC-DMD method and COSMO-RS method.The predicted phase equilibrium data were checked for consistency.The NRTL-RK(non-Random Two Liquid-Redlich-Kwong Equation of State)and UNIQUAC thermodynamic models were used to correlate the phase equilibrium data.Industrial device simulations were used to verify the accuracy of the thermodynamic model applied to the butadiene extraction process.The simulation results showed that the average deviations of the simulated results using the correlated thermodynamic model from the actual values were less than 2%compared to that using the commercial simulation software,Aspen Plus and its database.The average deviation was much smaller than that of the simulations using the Aspen Plus database(>10%),indicating that the obtained phase equilibrium data are highly accurate and reliable.The best phase equilibrium data and thermodynamic model parameters for butadiene extraction are provided.This improves the accuracy and reliability of the design,optimization and control of the process,and provides a basis and guarantee for developing a more environmentally friendly and economical butadiene extraction process.展开更多
The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO...The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.展开更多
Ocean temperature is an important physical variable in marine ecosystems,and ocean temperature prediction is an important research objective in ocean-related fields.Currently,one of the commonly used methods for ocean...Ocean temperature is an important physical variable in marine ecosystems,and ocean temperature prediction is an important research objective in ocean-related fields.Currently,one of the commonly used methods for ocean temperature prediction is based on data-driven,but research on this method is mostly limited to the sea surface,with few studies on the prediction of internal ocean temperature.Existing graph neural network-based methods usually use predefined graphs or learned static graphs,which cannot capture the dynamic associations among data.In this study,we propose a novel dynamic spatiotemporal graph neural network(DSTGN)to predict threedimensional ocean temperature(3D-OT),which combines static graph learning and dynamic graph learning to automatically mine two unknown dependencies between sequences based on the original 3D-OT data without prior knowledge.Temporal and spatial dependencies in the time series were then captured using temporal and graph convolutions.We also integrated dynamic graph learning,static graph learning,graph convolution,and temporal convolution into an end-to-end framework for 3D-OT prediction using time-series grid data.In this study,we conducted prediction experiments using high-resolution 3D-OT from the Copernicus global ocean physical reanalysis,with data covering the vertical variation of temperature from the sea surface to 1000 m below the sea surface.We compared five mainstream models that are commonly used for ocean temperature prediction,and the results showed that the method achieved the best prediction results at all prediction scales.展开更多
Long runout landslides involve a massive amount of energy and can be extremely hazardous owing to their long movement distance,high mobility and strong destructive power.Numerical methods have been widely used to pred...Long runout landslides involve a massive amount of energy and can be extremely hazardous owing to their long movement distance,high mobility and strong destructive power.Numerical methods have been widely used to predict the landslide runout but a fundamental problem remained is how to determine the reliable numerical parameters.This study proposes a framework to predict the runout of potential landslides through multi-source data collaboration and numerical analysis of historical landslide events.Specifically,for the historical landslide cases,the landslide-induced seismic signal,geophysical surveys,and possible in-situ drone/phone videos(multi-source data collaboration)can validate the numerical results in terms of landslide dynamics and deposit features and help calibrate the numerical(rheological)parameters.Subsequently,the calibrated numerical parameters can be used to numerically predict the runout of potential landslides in the region with a similar geological setting to the recorded events.Application of the runout prediction approach to the 2020 Jiashanying landslide in Guizhou,China gives reasonable results in comparison to the field observations.The numerical parameters are determined from the multi-source data collaboration analysis of a historical case in the region(2019 Shuicheng landslide).The proposed framework for landslide runout prediction can be of great utility for landslide risk assessment and disaster reduction in mountainous regions worldwide.展开更多
Lung cancer remains a significant global health challenge and identifying lung cancer at an early stage is essential for enhancing patient outcomes. The study focuses on developing and optimizing gene expression-based...Lung cancer remains a significant global health challenge and identifying lung cancer at an early stage is essential for enhancing patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. By applying Log2 normalization to gene expression data and conducting Wilcoxon rank sum tests, the researchers employed various classifiers and Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, due to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior performance in terms of Specificity, AUC (Area Under the Curve), and Precision. These models were evaluated based on their sensitivity, AUC, and specificity, aiming to achieve high sensitivity and AUC while maintaining reasonable specificity.展开更多
In order to reduce the risk of non-performing loans, losses, and improve the loan approval efficiency, it is necessary to establish an intelligent loan risk and approval prediction system. A hybrid deep learning model...In order to reduce the risk of non-performing loans, losses, and improve the loan approval efficiency, it is necessary to establish an intelligent loan risk and approval prediction system. A hybrid deep learning model with 1DCNN-attention network and the enhanced preprocessing techniques is proposed for loan approval prediction. Our proposed model consists of the enhanced data preprocessing and stacking of multiple hybrid modules. Initially, the enhanced data preprocessing techniques using a combination of methods such as standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV) and principal component analysis (PCA), which not only eliminates the effects of data jitter and non-equilibrium, but also removes redundant features while improving the representation of features. Subsequently, a hybrid module that combines a 1DCNN with an attention mechanism is proposed to extract local and global spatio-temporal features. Finally, the comprehensive experiments conducted validate that the proposed model surpasses state-of-the-art baseline models across various performance metrics, including accuracy, precision, recall, F1 score, and AUC. Our proposed model helps to automate the loan approval process and provides scientific guidance to financial institutions for loan risk control.展开更多
The connectivity of sandbodies is a key constraint to the exploration effectiveness of Bohai A Oilfield.Conventional connectivity studies often use methods such as seismic attribute fusion,while the development of con...The connectivity of sandbodies is a key constraint to the exploration effectiveness of Bohai A Oilfield.Conventional connectivity studies often use methods such as seismic attribute fusion,while the development of contiguous composite sandbodies in this area makes it challenging to characterize connectivity changes with conventional seismic attributes.Aiming at the above problem in the Bohai A Oilfield,this study proposes a big data analysis method based on the Deep Forest algorithm to predict the sandbody connectivity.Firstly,by compiling the abundant exploration and development sandbodies data in the study area,typical sandbodies with reliable connectivity were selected.Then,sensitive seismic attribute were extracted to obtain training samples.Finally,based on the Deep Forest algorithm,mapping model between attribute combinations and sandbody connectivity was established through machine learning.This method achieves the first quantitative determination of the connectivity for continuous composite sandbodies in the Bohai Oilfield.Compared with conventional connectivity discrimination methods such as high-resolution processing and seismic attribute analysis,this method can combine the sandbody characteristics of the study area in the process of machine learning,and jointly judge connectivity by combining multiple seismic attributes.The study results show that this method has high accuracy and timeliness in predicting connectivity for continuous composite sandbodies.Applied to the Bohai A Oilfield,it successfully identified multiple sandbody connectivity relationships and provided strong support for the subsequent exploration potential assessment and well placement optimization.This method also provides a new idea and method for studying sandbody connectivity under similar complex geological conditions.展开更多
The traditional least squares support vector regression(LS-SVR)model,using cross validation to determine the regularization parameter and kernel parameter,is time-consuming.We propose a Bayesian evidence framework t...The traditional least squares support vector regression(LS-SVR)model,using cross validation to determine the regularization parameter and kernel parameter,is time-consuming.We propose a Bayesian evidence framework to infer the LS-SVR model parameters.Three levels Bayesian inferences are used to determine the model parameters,regularization hyper-parameters and tune the nuclear parameters by model comparison.On this basis,we established Bayesian LS-SVR time-series gas forecasting models and provide steps for the algorithm.The gas outburst data of a Hebi 10th mine working face is used to validate the model.The optimal embedding dimension and delay time of the time series were obtained by the smallest differential entropy method.Finally,within a MATLAB7.1 environment,we used actual coal gas data to compare the traditional LS-SVR and the Bayesian LS-SVR with LS-SVMlab1.5 Toolbox simulation.The results show that the Bayesian framework of an LS-SVR significantly improves the speed and accuracy of the forecast.展开更多
With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality pred...With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality prediction models have many disadvantages,such as high complexity and low accuracy.To overcome the above problems,we propose an optimized data equalization method to pre-process dataset and design a simple but effective product quality prediction model:radial basis function model optimized by the firefly algorithm with Levy flight mechanism(RBFFALM).First,the new data equalization method is introduced to pre-process the dataset,which reduces the dimension of the data,removes redundant features,and improves the data distribution.Then the RBFFALFM is used to predict product quality.Comprehensive expe riments conducted on real-world product quality datasets validate that the new model RBFFALFM combining with the new data pre-processing method outperforms other previous me thods on predicting product quality.展开更多
Floods are one of the most serious natural disasters that can cause huge societal and economic losses.Extensive research has been conducted on topics like flood monitoring,prediction,and loss estimation.In these resea...Floods are one of the most serious natural disasters that can cause huge societal and economic losses.Extensive research has been conducted on topics like flood monitoring,prediction,and loss estimation.In these research fields,flood velocity plays a crucial role and is an important factor that influences the reliability of the outcomes.Traditional methods rely on physical models for flood simulation and prediction and could generate accurate results but often take a long time.Deep learning technology has recently shown significant potential in the same field,especially in terms of efficiency,helping to overcome the time-consuming associated with traditional methods.This study explores the potential of deep learning models in predicting flood velocity.More specifically,we use a Multi-Layer Perceptron(MLP)model,a specific type of Artificial Neural Networks(ANNs),to predict the velocity in the test area of the Lundesokna River in Norway with diverse terrain conditions.Geographic data and flood velocity simulated based on the physical hydraulic model are used in the study for the pre-training,optimization,and testing of the MLP model.Our experiment indicates that the MLP model has the potential to predict flood velocity in diverse terrain conditions of the river with acceptable accuracy against simulated velocity results but with a significant decrease in training time and testing time.Meanwhile,we discuss the limitations for the improvement in future work.展开更多
A four-dimensional variational (4D-Var) data assimilation method is implemented in an improved intermediate coupled model (ICM) of the tropical Pacific. A twin experiment is designed to evaluate the impact of the ...A four-dimensional variational (4D-Var) data assimilation method is implemented in an improved intermediate coupled model (ICM) of the tropical Pacific. A twin experiment is designed to evaluate the impact of the 4D-Var data assimilation algorithm on ENSO analysis and prediction based on the ICM. The model error is assumed to arise only from the parameter uncertainty. The "observation" of the SST anomaly, which is sampled from a "truth" model simulation that takes default parameter values and has Gaussian noise added, is directly assimilated into the assimilation model with its parameters set erroneously. Results show that 4D-Var effectively reduces the error of ENSO analysis and therefore improves the prediction skill of ENSO events compared with the non-assimilation case. These results provide a promising way for the ICM to achieve better real-time ENSO prediction.展开更多
By employing the unique phenological feature of winter wheat extracted from peak before winter (PBW) and the advantages of moderate resolution imaging spectroradiometer (MODIS) data with high temporal resolution a...By employing the unique phenological feature of winter wheat extracted from peak before winter (PBW) and the advantages of moderate resolution imaging spectroradiometer (MODIS) data with high temporal resolution and intermediate spatial resolution, a remote sensing-based model for mapping winter wheat on the North China Plain was built through integration with Landsat images and land-use data. First, a phenological window, PBW was drawn from time-series MODIS data. Next, feature extraction was performed for the PBW to reduce feature dimension and enhance its information. Finally, a regression model was built to model the relationship of the phenological feature and the sample data. The amount of information of the PBW was evaluated and compared with that of the main peak (MP). The relative precision of the mapping reached up to 92% in comparison to the Landsat sample data, and ranged between 87 and 96% in comparison to the statistical data. These results were sufficient to satisfy the accuracy requirements for winter wheat mapping at a large scale. Moreover, the proposed method has the ability to obtain the distribution information for winter wheat in an earlier period than previous studies. This study could throw light on the monitoring of winter wheat in China by using unique phenological feature of winter wheat.展开更多
In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical...In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical seismic structure is closely related to oil and gas-bearing reservoir, so it is very useful for a geologist or a geophysicist to precisely interpret the oil-bearing layers from the seismic data. This technology can be applied to any exploration or production stage. The new method has been tested on a series of exploratory or development wells and proved to be reliable in China. Hydrocarbon-detection with this new method for 39 exploration wells on 25 structures indi- cates a success ratio of over 80 percent. The new method of hydrocarbon prediction can be applied for: (1) depositional environment of reservoirs with marine fades, delta, or non-marine fades (including fluvial facies, lacustrine fades); (2) sedimentary rocks of reservoirs that are non-marine clastic rocks and carbonate rock; and (3) burial depths range from 300 m to 7000 m, and the minimum thickness of these reservoirs is over 8 m (main frequency is about 50 Hz).展开更多
Clustering is used to gain an intuition of the struc tures in the data.Most of the current clustering algorithms pro duce a clustering structure even on data that do not possess such structure.In these cases,the algor...Clustering is used to gain an intuition of the struc tures in the data.Most of the current clustering algorithms pro duce a clustering structure even on data that do not possess such structure.In these cases,the algorithms force a structure in the data instead of discovering one.To avoid false structures in the relations of data,a novel clusterability assessment method called density-based clusterability measure is proposed in this paper.I measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce a meaningfu insight to the relationships in the data.This is especially useful in time-series data since visualizing the structure in time-series data is hard.The performance of the clusterability measure is evalu ated against several synthetic data sets and time-series data sets which illustrate that the density-based clusterability measure can successfully indicate clustering structure of time-series data.展开更多
Distributed denial-of-service(DDoS)is a rapidly growing problem with the fast development of the Internet.There are multitude DDoS detection approaches,however,three major problems about DDoS attack detection appear i...Distributed denial-of-service(DDoS)is a rapidly growing problem with the fast development of the Internet.There are multitude DDoS detection approaches,however,three major problems about DDoS attack detection appear in the big data environment.Firstly,to shorten the respond time of the DDoS attack detector;secondly,to reduce the required compute resources;lastly,to achieve a high detection rate with low false alarm rate.In the paper,we propose an abnormal network flow feature sequence prediction approach which could fit to be used as a DDoS attack detector in the big data environment and solve aforementioned problems.We define a network flow abnormal index as PDRA with the percentage of old IP addresses,the increment of the new IP addresses,the ratio of new IP addresses to the old IP addresses and average accessing rate of each new IP address.We design an IP address database using sequential storage model which has a constant time complexity.The autoregressive integrated moving average(ARIMA)trending prediction module will be started if and only if the number of continuous PDRA sequence value,which all exceed an PDRA abnormal threshold(PAT),reaches a certain preset threshold.And then calculate the probability that is the percentage of forecasting PDRA sequence value which exceed the PAT.Finally we identify the DDoS attack based on the abnormal probability of the forecasting PDRA sequence.Both theorem and experiment show that the method we proposed can effectively reduce the compute resources consumption,identify DDoS attack at its initial stage with higher detection rate and lower false alarm rate.展开更多
基金This work was supported by Korea Institute for Advancement of Technology(KIAT)grant funded by the Korea Government(MOTIE)(P0016977,The Establishment Project of Industry-University Fusion District).
文摘The increasing penetration rate of electric kickboard vehicles has been popularized and promoted primarily because of its clean and efficient features.Electric kickboards are gradually growing in popularity in tourist and education-centric localities.In the upcoming arrival of electric kickboard vehicles,deploying a customer rental service is essential.Due to its freefloating nature,the shared electric kickboard is a common and practical means of transportation.Relocation plans for shared electric kickboards are required to increase the quality of service,and forecasting demand for their use in a specific region is crucial.Predicting demand accurately with small data is troublesome.Extensive data is necessary for training machine learning algorithms for effective prediction.Data generation is a method for expanding the amount of data that will be further accessible for training.In this work,we proposed a model that takes time-series customers’electric kickboard demand data as input,pre-processes it,and generates synthetic data according to the original data distribution using generative adversarial networks(GAN).The electric kickboard mobility demand prediction error was reduced when we combined synthetic data with the original data.We proposed Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results.We modified The Wasserstein GAN-gradient penalty(GP)with the RMSprop optimizer and then employed Spectral Normalization(SN)to improve training stability and faster convergence.Finally,we applied a regression-based blending ensemble technique that can help us to improve performance of demand prediction.We used various evaluation criteria and visual representations to compare our proposed model’s performance.Synthetic data generated by our suggested GAN model is also evaluated.The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problem,and it also converges faster than previous GAN models for synthetic data creation.The presented model’s performance is compared to existing ensemble and baseline models.The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error rates in the mean absolute percentage error(MAPE)of 4.476 and increase prediction accuracy.
基金funded by the National Natural Science Foundation of China(General Program:No.52074314,No.U19B6003-05)National Key Research and Development Program of China(2019YFA0708303-05)。
文摘Accurate prediction of formation pore pressure is essential to predict fluid flow and manage hydrocarbon production in petroleum engineering.Recent deep learning technique has been receiving more interest due to the great potential to deal with pore pressure prediction.However,most of the traditional deep learning models are less efficient to address generalization problems.To fill this technical gap,in this work,we developed a new adaptive physics-informed deep learning model with high generalization capability to predict pore pressure values directly from seismic data.Specifically,the new model,named CGP-NN,consists of a novel parametric features extraction approach(1DCPP),a stacked multilayer gated recurrent model(multilayer GRU),and an adaptive physics-informed loss function.Through machine training,the developed model can automatically select the optimal physical model to constrain the results for each pore pressure prediction.The CGP-NN model has the best generalization when the physicsrelated metricλ=0.5.A hybrid approach combining Eaton and Bowers methods is also proposed to build machine-learnable labels for solving the problem of few labels.To validate the developed model and methodology,a case study on a complex reservoir in Tarim Basin was further performed to demonstrate the high accuracy on the pore pressure prediction of new wells along with the strong generalization ability.The adaptive physics-informed deep learning approach presented here has potential application in the prediction of pore pressures coupled with multiple genesis mechanisms using seismic data.
文摘A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow accurately.However,accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors.This paper proposes a technique for predicting short-term traffic flow data using an architecture that utilizes convolutional bidirectional long short-term memory(Conv-BiLSTM)with attention mechanisms.Prior studies neglected to include data pertaining to factors such as holidays,weather conditions,and vehicle types,which are interconnected and significantly impact the accuracy of forecast outcomes.In addition,this research incorporates recurring monthly periodic pattern data that significantly enhances the accuracy of forecast outcomes.The experimental findings demonstrate a performance improvement of 21.68%when incorporating the vehicle type feature.
基金the“Regional Innovation Strategy(RIS)”through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(MOE)(2021RIS-004)Institute of Information and Communications Technology Planning and Evaluation(IITP)grant funded by the Korean government(MSIT)(No.RS-2022-00155857,Artificial Intelligence Convergence Innovation Human Resources Development(Chungnam National University)).
文摘Maritime transportation,a cornerstone of global trade,faces increasing safety challenges due to growing sea traffic volumes.This study proposes a novel approach to vessel trajectory prediction utilizing Automatic Identification System(AIS)data and advanced deep learning models,including Long Short-Term Memory(LSTM),Gated Recurrent Unit(GRU),Bidirectional LSTM(DBLSTM),Simple Recurrent Neural Network(SimpleRNN),and Kalman Filtering.The research implemented rigorous AIS data preprocessing,encompassing record deduplication,noise elimination,stationary simplification,and removal of insignificant trajectories.Models were trained using key navigational parameters:latitude,longitude,speed,and heading.Spatiotemporal aware processing through trajectory segmentation and topological data analysis(TDA)was employed to capture dynamic patterns.Validation using a three-month AIS dataset demonstrated significant improvements in prediction accuracy.The GRU model exhibited superior performance,achieving training losses of 0.0020(Mean Squared Error,MSE)and 0.0334(Mean Absolute Error,MAE),with validation losses of 0.0708(MSE)and 0.1720(MAE).The LSTM model showed comparable efficacy,with training losses of 0.0011(MSE)and 0.0258(MAE),and validation losses of 0.2290(MSE)and 0.2652(MAE).Both models demonstrated reductions in training and validation losses,measured by MAE,MSE,Average Displacement Error(ADE),and Final Displacement Error(FDE).This research underscores the potential of advanced deep learning models in enhancing maritime safety through more accurate trajectory predictions,contributing significantly to the development of robust,intelligent navigation systems for the maritime industry.
基金supported by the National Natural Science Foundation of China(22178190)。
文摘In response to the lack of reliable physical parameters in the process simulation of the butadiene extraction,a large amount of phase equilibrium data were collected in the context of the actual process of butadiene production by acetonitrile.The accuracy of five prediction methods,UNIFAC(UNIQUAC Functional-group Activity Coefficients),UNIFAC-LL,UNIFAC-LBY,UNIFAC-DMD and COSMO-RS,applied to the butadiene extraction process was verified using partial phase equilibrium data.The results showed that the UNIFAC-DMD method had the highest accuracy in predicting phase equilibrium data for the missing system.COSMO-RS-predicted multiple systems showed good accuracy,and a large number of missing phase equilibrium data were estimated using the UNIFAC-DMD method and COSMO-RS method.The predicted phase equilibrium data were checked for consistency.The NRTL-RK(non-Random Two Liquid-Redlich-Kwong Equation of State)and UNIQUAC thermodynamic models were used to correlate the phase equilibrium data.Industrial device simulations were used to verify the accuracy of the thermodynamic model applied to the butadiene extraction process.The simulation results showed that the average deviations of the simulated results using the correlated thermodynamic model from the actual values were less than 2%compared to that using the commercial simulation software,Aspen Plus and its database.The average deviation was much smaller than that of the simulations using the Aspen Plus database(>10%),indicating that the obtained phase equilibrium data are highly accurate and reliable.The best phase equilibrium data and thermodynamic model parameters for butadiene extraction are provided.This improves the accuracy and reliability of the design,optimization and control of the process,and provides a basis and guarantee for developing a more environmentally friendly and economical butadiene extraction process.
基金financial support extended for this academic work by the Beijing Natural Science Foundation(Grant 2232066)the Open Project Foundation of State Key Laboratory of Solid Lubrication(Grant LSL-2212).
文摘The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.
基金The National Key R&D Program of China under contract No.2021YFC3101603.
文摘Ocean temperature is an important physical variable in marine ecosystems,and ocean temperature prediction is an important research objective in ocean-related fields.Currently,one of the commonly used methods for ocean temperature prediction is based on data-driven,but research on this method is mostly limited to the sea surface,with few studies on the prediction of internal ocean temperature.Existing graph neural network-based methods usually use predefined graphs or learned static graphs,which cannot capture the dynamic associations among data.In this study,we propose a novel dynamic spatiotemporal graph neural network(DSTGN)to predict threedimensional ocean temperature(3D-OT),which combines static graph learning and dynamic graph learning to automatically mine two unknown dependencies between sequences based on the original 3D-OT data without prior knowledge.Temporal and spatial dependencies in the time series were then captured using temporal and graph convolutions.We also integrated dynamic graph learning,static graph learning,graph convolution,and temporal convolution into an end-to-end framework for 3D-OT prediction using time-series grid data.In this study,we conducted prediction experiments using high-resolution 3D-OT from the Copernicus global ocean physical reanalysis,with data covering the vertical variation of temperature from the sea surface to 1000 m below the sea surface.We compared five mainstream models that are commonly used for ocean temperature prediction,and the results showed that the method achieved the best prediction results at all prediction scales.
基金supported by the National Natural Science Foundation of China(41977215)。
文摘Long runout landslides involve a massive amount of energy and can be extremely hazardous owing to their long movement distance,high mobility and strong destructive power.Numerical methods have been widely used to predict the landslide runout but a fundamental problem remained is how to determine the reliable numerical parameters.This study proposes a framework to predict the runout of potential landslides through multi-source data collaboration and numerical analysis of historical landslide events.Specifically,for the historical landslide cases,the landslide-induced seismic signal,geophysical surveys,and possible in-situ drone/phone videos(multi-source data collaboration)can validate the numerical results in terms of landslide dynamics and deposit features and help calibrate the numerical(rheological)parameters.Subsequently,the calibrated numerical parameters can be used to numerically predict the runout of potential landslides in the region with a similar geological setting to the recorded events.Application of the runout prediction approach to the 2020 Jiashanying landslide in Guizhou,China gives reasonable results in comparison to the field observations.The numerical parameters are determined from the multi-source data collaboration analysis of a historical case in the region(2019 Shuicheng landslide).The proposed framework for landslide runout prediction can be of great utility for landslide risk assessment and disaster reduction in mountainous regions worldwide.
文摘Lung cancer remains a significant global health challenge and identifying lung cancer at an early stage is essential for enhancing patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. By applying Log2 normalization to gene expression data and conducting Wilcoxon rank sum tests, the researchers employed various classifiers and Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, due to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior performance in terms of Specificity, AUC (Area Under the Curve), and Precision. These models were evaluated based on their sensitivity, AUC, and specificity, aiming to achieve high sensitivity and AUC while maintaining reasonable specificity.
文摘In order to reduce the risk of non-performing loans, losses, and improve the loan approval efficiency, it is necessary to establish an intelligent loan risk and approval prediction system. A hybrid deep learning model with 1DCNN-attention network and the enhanced preprocessing techniques is proposed for loan approval prediction. Our proposed model consists of the enhanced data preprocessing and stacking of multiple hybrid modules. Initially, the enhanced data preprocessing techniques using a combination of methods such as standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV) and principal component analysis (PCA), which not only eliminates the effects of data jitter and non-equilibrium, but also removes redundant features while improving the representation of features. Subsequently, a hybrid module that combines a 1DCNN with an attention mechanism is proposed to extract local and global spatio-temporal features. Finally, the comprehensive experiments conducted validate that the proposed model surpasses state-of-the-art baseline models across various performance metrics, including accuracy, precision, recall, F1 score, and AUC. Our proposed model helps to automate the loan approval process and provides scientific guidance to financial institutions for loan risk control.
文摘The connectivity of sandbodies is a key constraint to the exploration effectiveness of Bohai A Oilfield.Conventional connectivity studies often use methods such as seismic attribute fusion,while the development of contiguous composite sandbodies in this area makes it challenging to characterize connectivity changes with conventional seismic attributes.Aiming at the above problem in the Bohai A Oilfield,this study proposes a big data analysis method based on the Deep Forest algorithm to predict the sandbody connectivity.Firstly,by compiling the abundant exploration and development sandbodies data in the study area,typical sandbodies with reliable connectivity were selected.Then,sensitive seismic attribute were extracted to obtain training samples.Finally,based on the Deep Forest algorithm,mapping model between attribute combinations and sandbody connectivity was established through machine learning.This method achieves the first quantitative determination of the connectivity for continuous composite sandbodies in the Bohai Oilfield.Compared with conventional connectivity discrimination methods such as high-resolution processing and seismic attribute analysis,this method can combine the sandbody characteristics of the study area in the process of machine learning,and jointly judge connectivity by combining multiple seismic attributes.The study results show that this method has high accuracy and timeliness in predicting connectivity for continuous composite sandbodies.Applied to the Bohai A Oilfield,it successfully identified multiple sandbody connectivity relationships and provided strong support for the subsequent exploration potential assessment and well placement optimization.This method also provides a new idea and method for studying sandbody connectivity under similar complex geological conditions.
基金Financial support for this work,provided by the National Natural Science Foundation of China(No.60974126)the Natural Science Foundation of Jiangsu Province(No.BK2009094)
文摘The traditional least squares support vector regression(LS-SVR)model,using cross validation to determine the regularization parameter and kernel parameter,is time-consuming.We propose a Bayesian evidence framework to infer the LS-SVR model parameters.Three levels Bayesian inferences are used to determine the model parameters,regularization hyper-parameters and tune the nuclear parameters by model comparison.On this basis,we established Bayesian LS-SVR time-series gas forecasting models and provide steps for the algorithm.The gas outburst data of a Hebi 10th mine working face is used to validate the model.The optimal embedding dimension and delay time of the time series were obtained by the smallest differential entropy method.Finally,within a MATLAB7.1 environment,we used actual coal gas data to compare the traditional LS-SVR and the Bayesian LS-SVR with LS-SVMlab1.5 Toolbox simulation.The results show that the Bayesian framework of an LS-SVR significantly improves the speed and accuracy of the forecast.
基金supported by the National Science and Technology Innovation 2030 Next-Generation Artifical Intelligence Major Project(2018AAA0101801)the National Natural Science Foundation of China(72271188)。
文摘With the development of information technology,a large number of product quality data in the entire manufacturing process is accumulated,but it is not explored and used effectively.The traditional product quality prediction models have many disadvantages,such as high complexity and low accuracy.To overcome the above problems,we propose an optimized data equalization method to pre-process dataset and design a simple but effective product quality prediction model:radial basis function model optimized by the firefly algorithm with Levy flight mechanism(RBFFALM).First,the new data equalization method is introduced to pre-process the dataset,which reduces the dimension of the data,removes redundant features,and improves the data distribution.Then the RBFFALFM is used to predict product quality.Comprehensive expe riments conducted on real-world product quality datasets validate that the new model RBFFALFM combining with the new data pre-processing method outperforms other previous me thods on predicting product quality.
文摘Floods are one of the most serious natural disasters that can cause huge societal and economic losses.Extensive research has been conducted on topics like flood monitoring,prediction,and loss estimation.In these research fields,flood velocity plays a crucial role and is an important factor that influences the reliability of the outcomes.Traditional methods rely on physical models for flood simulation and prediction and could generate accurate results but often take a long time.Deep learning technology has recently shown significant potential in the same field,especially in terms of efficiency,helping to overcome the time-consuming associated with traditional methods.This study explores the potential of deep learning models in predicting flood velocity.More specifically,we use a Multi-Layer Perceptron(MLP)model,a specific type of Artificial Neural Networks(ANNs),to predict the velocity in the test area of the Lundesokna River in Norway with diverse terrain conditions.Geographic data and flood velocity simulated based on the physical hydraulic model are used in the study for the pre-training,optimization,and testing of the MLP model.Our experiment indicates that the MLP model has the potential to predict flood velocity in diverse terrain conditions of the river with acceptable accuracy against simulated velocity results but with a significant decrease in training time and testing time.Meanwhile,we discuss the limitations for the improvement in future work.
基金supported by the National Natural Science Foundation of China(Grant Nos.41490644,41475101 and 41421005)the CAS Strategic Priority Project(the Western Pacific Ocean System+2 种基金Project Nos.XDA11010105,XDA11020306 and XDA11010301)the NSFC-Shandong Joint Fund for Marine Science Research Centers(Grant No.U1406401)the NSFC Innovative Group Grant(Project No.41421005)
文摘A four-dimensional variational (4D-Var) data assimilation method is implemented in an improved intermediate coupled model (ICM) of the tropical Pacific. A twin experiment is designed to evaluate the impact of the 4D-Var data assimilation algorithm on ENSO analysis and prediction based on the ICM. The model error is assumed to arise only from the parameter uncertainty. The "observation" of the SST anomaly, which is sampled from a "truth" model simulation that takes default parameter values and has Gaussian noise added, is directly assimilated into the assimilation model with its parameters set erroneously. Results show that 4D-Var effectively reduces the error of ENSO analysis and therefore improves the prediction skill of ENSO events compared with the non-assimilation case. These results provide a promising way for the ICM to achieve better real-time ENSO prediction.
基金supported by the open research fund of the Key Laboratory of Agri-informatics,Ministry of Agriculture and the fund of Outstanding Agricultural Researcher,Ministry of Agriculture,China
文摘By employing the unique phenological feature of winter wheat extracted from peak before winter (PBW) and the advantages of moderate resolution imaging spectroradiometer (MODIS) data with high temporal resolution and intermediate spatial resolution, a remote sensing-based model for mapping winter wheat on the North China Plain was built through integration with Landsat images and land-use data. First, a phenological window, PBW was drawn from time-series MODIS data. Next, feature extraction was performed for the PBW to reduce feature dimension and enhance its information. Finally, a regression model was built to model the relationship of the phenological feature and the sample data. The amount of information of the PBW was evaluated and compared with that of the main peak (MP). The relative precision of the mapping reached up to 92% in comparison to the Landsat sample data, and ranged between 87 and 96% in comparison to the statistical data. These results were sufficient to satisfy the accuracy requirements for winter wheat mapping at a large scale. Moreover, the proposed method has the ability to obtain the distribution information for winter wheat in an earlier period than previous studies. This study could throw light on the monitoring of winter wheat in China by using unique phenological feature of winter wheat.
基金Mainly presented at the 6-th international meeting of acoustics in Aug. 2003, and The 1999 SPE Asia Pacific Oil and GasConference and Exhibition held in Jakarta, Indonesia, 20-22 April 1999, SPE 54274.
文摘In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical seismic structure is closely related to oil and gas-bearing reservoir, so it is very useful for a geologist or a geophysicist to precisely interpret the oil-bearing layers from the seismic data. This technology can be applied to any exploration or production stage. The new method has been tested on a series of exploratory or development wells and proved to be reliable in China. Hydrocarbon-detection with this new method for 39 exploration wells on 25 structures indi- cates a success ratio of over 80 percent. The new method of hydrocarbon prediction can be applied for: (1) depositional environment of reservoirs with marine fades, delta, or non-marine fades (including fluvial facies, lacustrine fades); (2) sedimentary rocks of reservoirs that are non-marine clastic rocks and carbonate rock; and (3) burial depths range from 300 m to 7000 m, and the minimum thickness of these reservoirs is over 8 m (main frequency is about 50 Hz).
文摘Clustering is used to gain an intuition of the struc tures in the data.Most of the current clustering algorithms pro duce a clustering structure even on data that do not possess such structure.In these cases,the algorithms force a structure in the data instead of discovering one.To avoid false structures in the relations of data,a novel clusterability assessment method called density-based clusterability measure is proposed in this paper.I measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce a meaningfu insight to the relationships in the data.This is especially useful in time-series data since visualizing the structure in time-series data is hard.The performance of the clusterability measure is evalu ated against several synthetic data sets and time-series data sets which illustrate that the density-based clusterability measure can successfully indicate clustering structure of time-series data.
基金This work was supported by the National Natural Science Foundation of China[No.61762033,61363071,61702539]The National Natural Science Foundation of Hainan[No.617048,2018CXTD333]+1 种基金Hainan University Doctor Start Fund Project[No.kyqd1328]Hainan University Youth Fund Project[No.qnjj1444].
文摘Distributed denial-of-service(DDoS)is a rapidly growing problem with the fast development of the Internet.There are multitude DDoS detection approaches,however,three major problems about DDoS attack detection appear in the big data environment.Firstly,to shorten the respond time of the DDoS attack detector;secondly,to reduce the required compute resources;lastly,to achieve a high detection rate with low false alarm rate.In the paper,we propose an abnormal network flow feature sequence prediction approach which could fit to be used as a DDoS attack detector in the big data environment and solve aforementioned problems.We define a network flow abnormal index as PDRA with the percentage of old IP addresses,the increment of the new IP addresses,the ratio of new IP addresses to the old IP addresses and average accessing rate of each new IP address.We design an IP address database using sequential storage model which has a constant time complexity.The autoregressive integrated moving average(ARIMA)trending prediction module will be started if and only if the number of continuous PDRA sequence value,which all exceed an PDRA abnormal threshold(PAT),reaches a certain preset threshold.And then calculate the probability that is the percentage of forecasting PDRA sequence value which exceed the PAT.Finally we identify the DDoS attack based on the abnormal probability of the forecasting PDRA sequence.Both theorem and experiment show that the method we proposed can effectively reduce the compute resources consumption,identify DDoS attack at its initial stage with higher detection rate and lower false alarm rate.