Funding: This work was financially supported by the National Natural Science Foundation of China (Grant No. 42002134), the China Postdoctoral Science Foundation (Grant No. 2021T140735), and the Science Foundation of China University of Petroleum, Beijing (Grant Nos. 2462020XKJS02 and 2462020YXZZ004).
Abstract: The relationship between well logs and lithofacies is typically complex, which leads to low accuracy in lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies from logs labelled with rock cores, but their accuracy is limited. To improve it further, this work proposes a practical and novel ensemble learning strategy and a set of ensemble principles, which allow geologists unfamiliar with ML to build a good ML lithofacies identification model and help geologists familiar with ML to further improve identification accuracy. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, with the aim of reducing variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with randomly selected features. The novelty of this work lies in the ensemble principles, which make each sub-classifier just overfit through algorithm parameter settings and sub-dataset sampling. These principles can help reduce bias errors in prediction. Two issues are discussed: (1) whether only relatively simple single-classifier methods can serve as sub-classifiers, and how to select suitable ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a suitable combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of ML algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost and LightGBM). The experiments use a published lithofacies dataset from the Daniudi gas field (DGF) in the Ordos Basin, China.
Based on a series of comparisons between the ML algorithms and the corresponding ensemble models built with the ensemble strategy and principles, three conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble learning classifiers can be used as sub-classifiers in homogeneous ensemble learning, and the ensemble improves on the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategies are effective in promoting ML in lithofacies identification; (3) in practice, a heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, although it is more complex.
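The sampling side of the ensemble strategy described above (each sub-classifier sees a random subset of labelled samples and a random subset of features, and the ensemble votes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nearest-centroid sub-classifier, the synthetic three-feature data, and all function names are hypothetical, and the "just overfitting" parameter-tuning principle is not modelled here.

```python
import random
from collections import Counter


def make_toy_logs(n_per_class=20, seed=1):
    """Synthetic stand-in for labelled well-log samples (all values are made up)."""
    rng = random.Random(seed)
    centres = {"sand": (40.0, 2.30, 80.0), "mud": (110.0, 2.60, 60.0)}
    spreads = (5.0, 0.03, 3.0)
    rows, labels = [], []
    for lab, centre in centres.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(c, s) for c, s in zip(centre, spreads)])
            labels.append(lab)
    return rows, labels


def fit_centroid(rows, labels, feat_idx):
    """Fit a nearest-centroid sub-classifier on the selected feature columns."""
    sums, counts = {}, Counter(labels)
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(feat_idx))
        for k, j in enumerate(feat_idx):
            acc[k] += row[j]
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return feat_idx, centroids


def predict_centroid(model, row):
    """Return the label whose centroid is closest on the model's features."""
    feat_idx, centroids = model

    def dist2(centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(feat_idx))

    return min(sorted(centroids), key=lambda lab: dist2(centroids[lab]))


def fit_ensemble(rows, labels, n_sub=15, n_feat=2, seed=0):
    """Train n_sub sub-classifiers, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    models = []
    for _ in range(n_sub):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), n_feat)        # random feature subset
        models.append(
            fit_centroid([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return models


def predict_ensemble(models, row):
    """Combine the sub-classifiers by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


rows, labels = make_toy_logs()
models = fit_ensemble(rows, labels)
print(predict_ensemble(models, [40.0, 2.30, 80.0]))
```

Swapping `fit_centroid` for a different learner would give a homogeneous ensemble of that learner; mixing several learners in the loop would give a heterogeneous one, which is the distinction the abstract draws.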
Funding: This work was financially supported by the National Key R&D Program of China (No. 2018YFB0803602) and by the project "Exploration and practice on the education mode for engineering students based on technology, literature and art interdisciplinary integration with the Internet+ background" (No. 022150118004/001).
Abstract: In the information era, the core business and confidential information of enterprises and organizations is stored in information systems. However, malicious insiders may be hidden within an organization; these users intentionally or unintentionally misuse the organization's privileges to obtain sensitive information. Existing approaches to insider threat detection mostly focus on monitoring, detecting, and preventing malicious behavior by users within an organization's system, while ignoring the effect that imbalanced ground-truth insider threat data has on security. To detect insider threats more effectively, a data processing tool was developed to turn detected user activity into information-use events, and a Data Adjustment (DA) strategy was formulated to adjust the weights of the minority and majority samples. An efficient ensemble strategy was then applied, combining the extreme gradient boosting (XGBoost) model with the DA strategy to detect anomalous behavior. The approach was evaluated on the CERT insider threat dataset, a real-world dataset with artificially injected insider threat events. The results demonstrate that the proposed approach can detect insider threats effectively, with an accuracy of 99.51% and an average recall of 98.16%. Compared with other classifiers, detection performance is improved by 8.76%.
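The abstract does not give the formula behind the DA strategy, so the following is only a sketch of one common way to reweight minority and majority samples: inverse class-frequency weights, which could then be passed as per-sample weights to a gradient boosting trainer. The function names and the toy labels are hypothetical, and reading "average recall" as the unweighted mean of per-class recall is an assumption.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n / (k * n_c): rare classes get larger weights.

    Assumption: this inverse-frequency rule is a generic heuristic, not the
    paper's DA strategy, whose exact form is not stated in the abstract.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    class_weight = {c: n / (k * cnt) for c, cnt in counts.items()}
    return [class_weight[y] for y in labels]


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy labels: 8 normal events (0) and 2 injected threat events (1).
y_true = [0] * 8 + [1] * 2
weights = balanced_sample_weights(y_true)
print(weights[0], weights[-1])

# A classifier that always predicts "normal" is 80% accurate on these labels,
# yet its macro recall exposes that it misses every threat event.
always_normal = [0] * len(y_true)
print(macro_recall(y_true, always_normal))
```

With these weights each class contributes the same total weight to the training loss, which is why accuracy alone is a weak yardstick on imbalanced threat data and recall is reported alongside it.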
Funding: Project supported by the Foundation of the National Key Laboratory of Science and Technology on Aerodynamic Design and Research of China (No. 614220119040101) and the National Natural Science Foundation of China (No. 91852115).
Abstract: To increase the accuracy of turbulence field reconstruction, this paper combines experimental observation and numerical simulation to establish a data assimilation framework, and applies it to low-speed, high-angle-of-attack flow over the S809 airfoil. The method is based on the ensemble transform Kalman filter (ETKF) algorithm. It improves the disturbance strategy for the ensemble members and enriches the initial members by screening for the constants to which the flow field is highly sensitive, increasing the number of disturbed constants, and designing a fine disturbance interval. The results show that the pressure distribution on the airfoil surface after assimilation is closer to the experimental values than that of the standard Spalart-Allmaras (S-A) model. The separated vortex estimated by filtering is fuller and the eddy viscosity field carries more information, which is physically consistent with the observations. The data assimilation method based on the improved ensemble strategy can therefore describe complex turbulence phenomena more accurately and effectively.
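For orientation, the analysis step of a textbook ETKF can be written as below. This is the generic formulation, not necessarily the variant used in the paper; the notation is an assumption introduced here: $m$ ensemble members $x_i^f$, observation operator $H$, observation vector $y^o$ with error covariance $R$.

```latex
\begin{aligned}
\bar{x}^f &= \frac{1}{m}\sum_{i=1}^{m} x_i^f, &
X^f &= \bigl[\,x_1^f-\bar{x}^f,\ \dots,\ x_m^f-\bar{x}^f\,\bigr],\\
\bar{y}^f &= \frac{1}{m}\sum_{i=1}^{m} H(x_i^f), &
Y^f &= \bigl[\,H(x_1^f)-\bar{y}^f,\ \dots,\ H(x_m^f)-\bar{y}^f\,\bigr],\\
\tilde{P}^a &= \bigl[(m-1)I + (Y^f)^{\mathsf T} R^{-1} Y^f\bigr]^{-1}, &
\bar{w}^a &= \tilde{P}^a (Y^f)^{\mathsf T} R^{-1}\bigl(y^o-\bar{y}^f\bigr),\\
W^a &= \bigl[(m-1)\tilde{P}^a\bigr]^{1/2}, &
x_i^a &= \bar{x}^f + X^f\bigl(\bar{w}^a + W^a_{(i)}\bigr),
\end{aligned}
```

Here $W^a_{(i)}$ is the $i$-th column of the symmetric square root $W^a$. Every analysis member is a linear combination of the forecast perturbations $X^f$, so the update cannot leave the subspace spanned by the initial ensemble. That is the usual reason the spread and richness of the initial members, which the abstract sets out to improve, matter for the result.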
Abstract: Flowering forecasts provide recommendations for orchard cleaning, pest control, field management and fertilization, which can help increase tree vigor and resistance. Flowering forecasting is an important part both of the agro-meteorological index system and of the meteorological service system. In this paper, by analyzing local meteorological data and phenological data for "Red Fuji" apples in Ji County, Linfen City, Shanxi Province, and with the help of machine learning and neural networks, we propose a method that combines time series forecasting with classification forecasting to build a dynamic forecasting model of local flowering in Ji County. We then evaluate the effectiveness of the model in terms of the number of error days and the number of days in advance. The implementation shows that the proposed multivariable LSTM network predicts the meteorological factors well, with a model loss of less than 0.2. In the binary flowering classification task, applying a combining strategy from ensemble learning improves the result: the AUC rises from 0.81 and 0.80 for the single RF and AdaBoost models, respectively, to 0.82. The proposed model has high applicability and accuracy for flowering forecasting. The model also solves the problem of rounding decimals that arises when flowering dates are predicted by regression.
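The abstract does not say which combining strategy was used for RF and AdaBoost, so the sketch below assumes the simplest one, averaging the predicted probabilities of two models ("soft voting"), and computes AUC by its rank definition. The scores are contrived toy values chosen to make the effect visible, not results from the paper, and the function names are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(*prob_lists, weights=None):
    """Weighted average of per-sample probabilities from several models."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, probs)) / total
        for probs in zip(*prob_lists)
    ]


# Toy flowering labels (1 = flowering) and two models' predicted probabilities.
y_true = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.4, 0.9, 0.1, 0.5]

combined = soft_vote(model_a, model_b)
print(auc(y_true, model_a), auc(y_true, model_b), auc(y_true, combined))
```

In this toy case each model misranks a different pair of samples, so averaging cancels the errors. Real models make correlated errors, which fits the much smaller AUC gain reported in the abstract.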