Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in the training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is used to determine the antecedent part of if-then fuzzy rules by dynamically dividing attribute intervals with LVQ. Fuzzy rules are then generated and adjusted to calculate rule weights, the number of new samples to be synthesized for each rule is computed, and minority-class samples are synthesized based on the newly generated fuzzy rules, yielding a fuzzy rule oversampling method based on LVQ. To evaluate the effectiveness of this method, comparative experiments are conducted on 12 publicly available imbalanced datasets against five other sampling techniques, each combined with a support vector machine. The experimental results demonstrate that the proposed method significantly improves the classifier across seven performance indicators, including gains of 2.15% to 12.34% in accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method improves the classification performance of imbalanced data more efficiently. Funding: National Science Foundation of China (62006068); Hebei Natural Science Foundation (A2021402008); Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province (ZD2020185, QN2020188); 333 Talent Supported Project of Hebei Province (C20221026).
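As a rough illustration of the LVQ step that drives the dynamic interval division, the sketch below implements the classic LVQ1 prototype update; the function name, prototype counts, learning-rate schedule, and toy data are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal LVQ1 sketch: learned prototype positions along each attribute
# can serve as candidate interval boundaries for fuzzy-rule antecedents.
import numpy as np

def lvq1_fit(X, y, prototypes_per_class=2, lr=0.1, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Initialize prototypes from random samples of each class.
    protos, labels = [], []
    for c in classes:
        idx = rng.choice(np.where(y == c)[0], prototypes_per_class, replace=False)
        protos.append(X[idx]); labels.append(np.full(prototypes_per_class, c))
    W, Wy = np.vstack(protos).astype(float), np.concatenate(labels)
    for epoch in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))  # winner prototype
            step = lr * (1 - epoch / epochs)                 # decaying rate
            # LVQ1 rule: attract the winner if labels match, repel otherwise.
            W[j] += step * (X[i] - W[j]) if Wy[j] == y[i] else -step * (X[i] - W[j])
    return W, Wy

# Example: sorted prototype positions suggest cut points along attribute 0.
X = np.random.default_rng(0).normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
W, Wy = lvq1_fit(X, y)
print(np.sort(W[:, 0]))
```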
The manuscript presents an augmented Lagrangian fast projected gradient method (ALFPGM) with an improved working set selection scheme, pWSS, as a decomposition-based algorithm for training support vector classification machines (SVMs). The manuscript describes the ALFPGM algorithm, provides numerical results for training SVMs on large data sets, and compares the training times of ALFPGM and the Sequential Minimal Optimization (SMO) algorithm from the Scikit-learn library. The numerical results demonstrate that ALFPGM with the improved working set selection scheme can train SVMs with tens of thousands of training examples in a fraction of the training time of some widely adopted SVM tools.
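For context, a minimal timing harness for the SMO baseline is sketched below using scikit-learn's SVC, whose underlying libsvm solver implements SMO; the synthetic dataset and hyperparameters are assumptions, not the manuscript's benchmark setup.

```python
# Time an SMO-based SVM fit on a data set in the "tens of thousands" range.
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
clf = SVC(kernel="rbf", C=1.0)       # libsvm solves the dual via SMO
t0 = time.perf_counter()
clf.fit(X, y)
print(f"SMO training time: {time.perf_counter() - t0:.1f} s, "
      f"{clf.support_vectors_.shape[0]} support vectors")
```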
Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites such as WindSat provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and each subset is then thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging, and Barnes filtering); it produces a 10% thinning rate (90% data reduction), low mean absolute errors (MAE), and large correlations with the original data. A second experiment, using a larger dataset, shows TSVR retrievals with MAE < 1 m s⁻¹ and correlations ≥ 0.98, with TSVR an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment and are an order of magnitude faster than the non-pipeline TSVR; pipeline TSVR is therefore two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques. Funding: NOAA Grant NA17RJ1227; NSF Grant EIA-0205628; RSF Grant 14-41-00039.
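A rough sketch of the two-stage thinning idea follows: partition observations into Voronoi-style cells, fit an SVR per cell, and keep only the support vectors as the thinned data. The k-means centroids (whose cells are Voronoi cells), the synthetic wind field, and all parameters are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative TSVR-style thinning: epsilon controls how many observations
# survive as support vectors (larger epsilon -> heavier thinning).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
lonlat = rng.uniform(0, 10, size=(5_000, 2))                  # observation locations
wind = np.sin(lonlat[:, 0]) + 0.1 * rng.normal(size=5_000)    # scalar wind field

# Stage 1: Voronoi-like partition of the swath via k-means centroids.
cells = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(lonlat)

# Stage 2: thin each cell with SVR; the support vectors are the retained obs.
kept = []
for c in range(20):
    idx = np.where(cells == c)[0]
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(lonlat[idx], wind[idx])
    kept.append(idx[svr.support_])                            # surviving indices
kept = np.concatenate(kept)
print(f"thinned {len(lonlat)} obs down to {len(kept)} "
      f"({100 * len(kept) / len(lonlat):.0f}%)")
```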
Temperature prediction plays an important role in ring die granulator control, which can influence the quantity and quality of production. Temperature prediction modeling is a complicated problem owing to its MIMO, nonlinear, and large time-delay characteristics. The support vector machine (SVM) has been applied successfully to small data sets, but its accuracy is not high, and as the number of samples and the feature dimension increase, the training time of the model increases dramatically. In this paper, a linear SVM combined with cyclic coordinate descent (CCD) was applied to big-data regression. The method was strictly proved mathematically and validated by simulation, and experiments on real data confirmed the linear SVM model's effectiveness. Compared with other big-data methods in simulation, this algorithm has an apparent advantage not only in fast modeling but also in high fitting accuracy. Funding: Nantong Research Program of Application Foundation, China (BK2012030); Key Project of Science and Technology Commission of Shanghai Municipality, China (10JC1405000).
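As an illustration of fast linear SVM regression at scale, the sketch below uses scikit-learn's LinearSVR, whose liblinear backend solves the dual by coordinate descent, a scheme related to (but not identical to) the paper's CCD algorithm; the synthetic data and settings are assumptions.

```python
# Linear SVM regression on a large synthetic sample: training stays fast
# because the solver never forms a kernel matrix.
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=200_000, n_features=100, noise=5.0,
                       random_state=0)
model = make_pipeline(
    StandardScaler(),
    LinearSVR(C=1.0, epsilon=0.1, dual=True, max_iter=5_000),  # dual CD solver
)
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```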
Rainstorms are believed to contribute to flood disasters in upstream catchments, with further consequences in downstream areas due to rising river water levels. Forecasting flood water levels has been a challenging and complex task owing to its nonlinearities and dependencies. This study proposes a support vector machine regression model, a powerful machine-learning-based technique, to forecast flood water levels in a downstream area for different lead times. As a case study, the Kelantan River in Malaysia was selected to validate the proposed model. Four water level stations in the upstream river basin were identified as input variables, and a river water level in the downstream area was selected as the output of the flood forecasting model. A comparison with several benchmarking models, including radial basis function (RBF) and nonlinear autoregressive with exogenous input (NARX) neural networks, was performed. The results demonstrated that in terms of RMSE the NARX model was better than the proposed model; however, support vector regression (SVR) demonstrated a more consistent performance, indicated by the highest coefficient of determination for the twelve-hour-ahead forecasting horizon. The findings of this study indicate that SVR is more capable of addressing long-term flood forecasting problems. Funding: Japan-ASEAN Integration Fund (JAIF), reference UTM.K43/11.21/1/12(264), Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia.
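A minimal sketch of the forecasting setup follows, assuming lagged readings from four upstream stations as inputs and the downstream level twelve hours ahead as the target; the lag depth, synthetic data, and hyperparameters are illustrative, not the study's configuration.

```python
# Multi-step-ahead SVR water-level forecasting on a toy river system.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def make_supervised(upstream, downstream, lags=4, lead=12):
    """Stack `lags` hourly readings from each upstream station as features
    and the downstream level `lead` hours later as the target."""
    X, y = [], []
    for t in range(lags, len(downstream) - lead):
        X.append(upstream[t - lags:t].ravel())   # (lags * n_stations,) features
        y.append(downstream[t + lead])
    return np.array(X), np.array(y)

rng = np.random.default_rng(1)
upstream = rng.normal(size=(2_000, 4))            # 4 upstream stations
downstream = upstream.sum(axis=1) + rng.normal(scale=0.3, size=2_000)

X, y = make_supervised(upstream, downstream)
split = int(0.8 * len(X))
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:split], y[:split])
print("test R^2:", model.score(X[split:], y[split:]))
```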
With the development of automation and informatization in the steelmaking industry, the human brain gradually fails to cope with the increasing amount of data generated during the steelmaking process. Machine learning technology provides a new method, beyond production experience and metallurgical principles, for dealing with large amounts of data, and its application in the steelmaking process has become a research hotspot in recent years. This paper provides an overview of the applications of machine learning in steelmaking process modeling, covering hot metal pretreatment, primary steelmaking, secondary refining, and some other aspects. The three most frequently used machine learning algorithms in steelmaking process modeling are the artificial neural network, the support vector machine, and case-based reasoning, accounting for 56%, 14%, and 10% of applications, respectively. Data collected in steelmaking plants are frequently faulty, so data processing, especially data cleaning, is crucially important to the performance of machine learning models. The detection of variable importance can be used to optimize process parameters and guide production. Machine learning is used in hot metal pretreatment modeling mainly for endpoint S content prediction, while predictions of the endpoints of element compositions and of process parameters are widely investigated in primary steelmaking. In secondary refining, machine learning is used mainly for modeling ladle furnaces and the Ruhrstahl–Heraeus, vacuum degassing, argon oxygen decarburization, and vacuum oxygen decarburization processes. Further development of machine learning in steelmaking process modeling can be realized through additional efforts in constructing data platforms, transferring research achievements to the practical steelmaking process, and improving the universality of machine learning models. Funding: National Natural Science Foundation of China (U1960202).
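The review's point about variable-importance detection can be made concrete with a small permutation-importance sketch; the random-forest surrogate and synthetic process data are assumptions for illustration only, not any model from the surveyed literature.

```python
# Permutation importance: shuffle one input at a time and measure how much
# the endpoint-prediction score degrades; large drops flag key parameters.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2_000, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)   # stand-in process data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for rank in np.argsort(imp.importances_mean)[::-1]:
    print(f"feature {rank}: importance {imp.importances_mean[rank]:.3f}")
```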
Heart failure is now widespread throughout the world; heart disease affects approximately 48% of the population and is expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental concept is to compare the correctness of various machine learning (ML) algorithms and to use boosting algorithms to improve the models' prediction accuracy. Supervised algorithms, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR), are considered to achieve the best results. Boosting algorithms, namely Extreme Gradient Boosting (XGBoost) and CatBoost, are also used to improve the prediction, alongside Artificial Neural Networks (ANN). This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and Scikit-learn are used for ML; TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy of 95% among the classifiers, KNN obtained the second-highest accuracy of 93.33%, XGBoost reached 91.67%, SVM, CatBoost, and ANN each reached 90%, and LR reached 88.33%. Funding: Taif University Researchers Supporting Project Number (TURSP-2020/73), Taif University, Taif, Saudi Arabia.
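A sketch of the model-comparison loop is shown below, using cross-validated accuracy in scikit-learn; the synthetic data stands in for the heart-failure records, and the default hyperparameters are assumptions rather than the paper's tuned settings.

```python
# Compare several supervised classifiers with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    acc = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5)
    print(f"{name}: mean accuracy {acc.mean():.3f} (+/- {acc.std():.3f})")
```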
Conventional machine learning (CML) methods have been successfully applied to gas reservoir prediction. Their prediction accuracy largely depends on the quality of the sample data; therefore, feature optimization of the input samples is particularly important. Commonly used feature optimization methods increase the interpretability of gas reservoirs; however, their steps are cumbersome, and the selected features cannot sufficiently guide CML models to mine the intrinsic features of sample data efficiently. In contrast to CML methods, deep learning (DL) methods can directly extract the important features of targets from raw data. Therefore, this study proposes a feature optimization and gas-bearing prediction method based on a hybrid fusion model that combines a convolutional neural network (CNN) and an adaptive particle swarm optimization-least squares support vector machine (APSO-LSSVM). This model adopts an end-to-end algorithm structure to extract features directly from sensitive multicomponent seismic attributes, considerably simplifying the feature optimization. The CNN is used for feature optimization to highlight sensitive gas reservoir information, and the APSO-LSSVM fully learns the relationships among the extracted features to obtain the prediction results. The hybrid fusion model improves gas-bearing prediction accuracy through the two processes of feature optimization and intelligent prediction, exploiting the respective advantages of DL and CML methods, and its predictions are better than those of a single CNN model or APSO-LSSVM model. In the feature optimization of multicomponent seismic attribute data, the CNN demonstrated better gas reservoir feature extraction capabilities than commonly used attribute optimization methods; in prediction, the APSO-LSSVM model learned the gas reservoir characteristics better than the LSSVM model and achieved higher accuracy. The constructed CNN-APSO-LSSVM model had lower errors and a better fit on the test dataset than the other individual models. This method proves the effectiveness of DL technology for the feature extraction of gas reservoirs and provides a feasible way to combine DL and CML technologies to predict gas reservoirs. Funding: Natural Science Foundation of Shandong Province (ZR2021MD061, ZR2023QD025); China Postdoctoral Science Foundation (2022M721972); National Natural Science Foundation of China (41174098); Young Talents Foundation of Inner Mongolia University (10000-23112101/055); Qingdao Postdoctoral Science Foundation (QDBSH20230102094).
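A hedged sketch of the hybrid idea follows: a small 1-D CNN extracts features from multicomponent attribute traces, and an LSSVM (least squares SVM) maps the features to a gas-bearing label. The APSO step for tuning (gamma, sigma) is omitted, the CNN is shown untrained, and all shapes and names are illustrative assumptions, not the authors' architecture.

```python
import numpy as np
import tensorflow as tf

# 1-D CNN feature extractor over multicomponent attribute traces
# (50 time samples x 3 components per trace; shapes are assumptions).
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(50, 3)),
    tf.keras.layers.Conv1D(8, 5, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16),
])

def lssvm_fit(F, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]
    with an RBF kernel K; gamma and sigma would be tuned by APSO in the paper."""
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

rng = np.random.default_rng(0)
traces = rng.normal(size=(64, 50, 3)).astype("float32")
labels = rng.integers(0, 2, 64).astype(float)   # 1 = gas-bearing (toy labels)
F = cnn(traces).numpy()                         # CNN untrained here for brevity
b, alpha = lssvm_fit(F, labels)
print("bias:", round(float(b), 3), "| dual weights:", alpha.shape)
```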
In the global scenario, one of the important goals for sustainable development in the industrial field is to innovate new technology and invest in building infrastructure. All developed and developing countries focus on building resilient infrastructure and promoting sustainable development by fostering innovation, and cloud computing has become an important information and communication technology model influencing the sustainable development of industries in developing countries. As part of the innovations happening in the industrial sector, a new concept termed 'smart manufacturing' has emerged, which employs the benefits of emerging technologies such as the internet of things and cloud computing. Cloud services deliver on-demand access to computing, storage, and infrastructural platforms for industrial users through the Internet. In the recent era of information technology, the number of business and individual users of cloud services has increased, and larger volumes of data are being processed and stored; as a consequence, data breaches in cloud services are also increasing day by day, and various security vulnerabilities in the cloud architecture have made the cloud environment non-resilient. To restore the normal behavior of the cloud, detect deviations, and achieve higher resilience, anomaly detection becomes essential. Deep-learning-based anomaly detection mechanisms use various monitoring metrics to characterize the normal behavior of cloud services and identify abnormal events. This paper focuses on designing an intelligent deep learning based approach for detecting cloud anomalies in real time to make the cloud more resilient. The deep learning models are trained using features extracted from the system-level and network-level performance metrics observed in the Transmission Control Protocol (TCP) traces of the simulation. The experimental results of the proposed approach demonstrate superior performance in terms of a higher detection rate and a lower false alarm rate when compared to the Support Vector Machine (SVM).
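A minimal sketch of reconstruction-based anomaly detection on monitoring metrics is shown below, with a One-Class SVM as the comparator; the threshold, metric dimensions, autoencoder architecture, and synthetic data are illustrative assumptions, not the paper's design.

```python
# Autoencoder anomaly detection: train on normal metrics only, then flag
# inputs whose reconstruction error exceeds a percentile threshold.
import numpy as np
import tensorflow as tf
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(2_000, 10)).astype("float32")   # system/network metrics
anomalies = rng.normal(4, 1, size=(50, 10)).astype("float32")

ae = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(4, activation="relu"),    # bottleneck
    tf.keras.layers.Dense(10),
])
ae.compile(optimizer="adam", loss="mse")
ae.fit(normal, normal, epochs=20, batch_size=64, verbose=0)

def score(x):                     # reconstruction error as anomaly score
    return np.mean((x - ae.predict(x, verbose=0)) ** 2, axis=1)

threshold = np.percentile(score(normal), 99)
print("AE detection rate:", np.mean(score(anomalies) > threshold))

svm = OneClassSVM(nu=0.01).fit(normal)              # baseline comparator
print("SVM detection rate:", np.mean(svm.predict(anomalies) == -1))
```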