This paper explores the use of soft decision trees [1] in basic reinforcement learning applications to examine the efficacy of passive-expert-like networks for optimal Q-value learning with artificial neural networks (ANNs). The soft decision tree networks were built using the PyTorch machine learning framework and OpenAI's Gym environment framework. The study assessed the performance of soft decision tree networks on the CartPole task provided in the OpenAI Gym package. The baseline against which the soft decision tree networks were compared was a simple deep neural network using several linear layers, with ReLU and Softmax activation functions for the input and output layers, respectively. All networks were trained using the backpropagation algorithm provided generically by PyTorch's Autograd module.
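As a rough illustration of what a "soft" decision node does, the sketch below uses one common formulation: a sigmoid gate gives the probability of taking the right branch, and the output is the probability-weighted mix of the two leaf vectors. The abstract does not give the paper's architecture, so the function names, the single-node depth, and the leaf values are assumptions, and plain Python is used here instead of PyTorch.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def soft_node_output(x, weights, bias, left_leaf, right_leaf):
    """Blend two leaf vectors using a learned, differentiable gate.

    A hard decision tree would pick exactly one leaf; a soft node
    weights both leaves by the probability of routing to each side.
    All parameters here are hypothetical illustration values.
    """
    p_right = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return [(1.0 - p_right) * l + p_right * r
            for l, r in zip(left_leaf, right_leaf)]

# Hypothetical example: two Q-values (one per action) stored in each leaf.
q_values = soft_node_output(
    x=[0.2, -0.1], weights=[1.5, -2.0], bias=0.0,
    left_leaf=[1.0, 0.0], right_leaf=[0.0, 1.0],
)
```

Because the gate is smooth, the routing weights can be trained by backpropagation, which is consistent with the training setup the abstract describes.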
To address the many types of composite power quality disturbances, the strong correlation among features, and the high recognition error rate, a method for identifying composite power quality disturbances based on the multiresolution S-transform and a decision tree is proposed. First, following the IEEE standard, signal models of seven single power quality disturbances and 17 combined power quality disturbances are given, and disturbance waveform samples are generated in batches. Then, to improve recognition accuracy, an adjustment factor is introduced to obtain controllable time-frequency resolution through multiresolution S-transform time-frequency analysis. On this basis, five time-frequency features are extracted that quantitatively reflect the characteristics of the analyzed disturbance signal; this is fewer features than traditional S-transform-based methods use. Three classifiers (K-nearest neighbor, support vector machine, and decision tree) are then used to identify the composite disturbances. Simulation results show that the decision tree achieves higher classification accuracy than K-nearest neighbor and support vector machine. Finally, the proposed method is compared with other commonly used recognition algorithms. Experimental results show that the proposed method is effective in terms of detection accuracy, especially for combined PQ disturbances.
[Objective] The aim was to explore the feasibility of classifying crops from a single multi-spectral image using the decision tree method. [Method] A typical agricultural plantation area in the Hulunbeier region was taken as the study area. Using field-measured spectral data, the optimum time for distinguishing the main crops (barley, wheat, and rapeseed) was determined from their spectral characteristics, and crop classification was studied using the decision tree method together with spectral matching methods such as SAM. [Result] A decision tree was constructed from a Landsat TM image acquired in the first half of August, after geometric and atmospheric correction. Plantation information for wheat, barley, rapeseed, and planted grassland was extracted successfully. The overall classification accuracy reached 86.90%, and the Kappa coefficient was 0.8311. [Conclusion] Using a typical spectral image as the data source and applying the decision tree method to obtain crop type information has good application prospects.
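The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix. A minimal sketch of that calculation follows, assuming rows are reference classes and columns are predicted classes; the example matrix is invented for illustration and is not the study's data.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i assigned to class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class matrix (not taken from the paper).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa is lower than the raw accuracy because it discounts the agreement that would occur by chance given the class totals.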
With western Jilin Province as the study region, the spectral characteristics and texture features of remote sensing images were taken as the classification basis to construct a decision tree model and extract settlement information for the region, and the manually extracted settlement information for western Jilin Province was evaluated with a confusion matrix. The results showed that the decision tree model was convenient for extracting settlement information by integrating spectral and texture features, and that its accuracy was higher than that of the traditional maximum likelihood method. In addition, the calculation methods for extracting settlement information in this way were summarized.
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in several counties of the Mudanjiang region, a forecasting model for agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection. Addressing this gap can provide valuable insights for enhancing cybersecurity strategies. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are: investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; evaluating the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers; and offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training utilizes various supervised classifiers, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The study's outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection. Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware. Additionally, insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
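The four evaluation metrics named above can be computed directly from predicted and true labels. The sketch below assumes a binary task in which label 1 means "malware"; the function name and the sample labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = malware, 0 = benign.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                        [1, 1, 0, 0, 0, 1, 1, 0])
```

Reporting precision and recall alongside accuracy matters for malware data, where the classes are often imbalanced and a high accuracy can hide many missed detections.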
Antarctic sea ice is an important part of the Earth's atmospheric system, and satellite remote sensing is an important technology for observing it. Whether data from the Chinese Haiyang-2B (HY-2B) satellite altimeter can be used to estimate sea ice freeboard and provide alternative Antarctic sea ice thickness information with high precision and a long time series, as other radar altimetry satellites can, needs further investigation. This paper proposes an algorithm to discriminate leads and then retrieve sea ice freeboard and thickness from HY-2B radar altimeter data. We first collected the Moderate-resolution Imaging Spectroradiometer ice surface temperature (IST) product from the National Aeronautics and Space Administration to extract leads from Antarctic waters and verified their accuracy with Sentinel-1 Synthetic Aperture Radar images. Second, a surface classification decision tree was generated for HY-2B satellite altimeter measurements of Antarctic waters to extract leads and calculate local sea surface heights. We then estimated the Antarctic sea ice freeboard and thickness from the local sea surface heights and the static equilibrium equation. Finally, the retrieved HY-2B Antarctic sea ice thickness was compared with the CryoSat-2 sea ice thickness and the ship-based sea ice thickness observations of the Antarctic Sea Ice Processes and Climate (ASPeCt) program. The results indicate that the classification decision tree constructed for HY-2B satellite altimeter measurements was reasonable, and the root mean square error of the retrieved sea ice thickness relative to the ship measurements was 0.62 m. The proposed sea ice thickness algorithm for the HY-2B radar satellite fills a gap in this application domain for the HY-series satellites and can complement existing Antarctic sea ice thickness products; the algorithm could provide long-time-series, large-scale sea ice thickness data that contribute to research on global climate change.
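The freeboard-to-thickness step relies on a static (hydrostatic) equilibrium balance: the weight of the ice and its snow cover equals the weight of the displaced seawater. The sketch below shows the standard form of that balance for ice freeboard. The abstract does not state the exact equation or the densities and snow depth the authors used, so the default values below are assumptions chosen only for illustration.

```python
def ice_thickness_from_freeboard(ice_freeboard_m, snow_depth_m,
                                 rho_water=1024.0, rho_ice=915.0,
                                 rho_snow=300.0):
    """Estimate sea ice thickness (m) from ice freeboard under hydrostatic balance.

    Balance: rho_ice * T + rho_snow * h_s = rho_water * (T - F_i)
    which rearranges to
             T = (rho_water * F_i + rho_snow * h_s) / (rho_water - rho_ice)

    Densities are in kg/m^3 and are assumed values, not the paper's.
    """
    return ((rho_water * ice_freeboard_m + rho_snow * snow_depth_m)
            / (rho_water - rho_ice))

# Hypothetical input: 0.10 m ice freeboard under 0.15 m of snow.
thickness = ice_thickness_from_freeboard(0.10, 0.15)
```

Because the denominator is the small difference between water and ice density, modest errors in freeboard or in the assumed densities are amplified several times in the thickness estimate, which is one reason validation against ship observations is needed.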
Early stroke prediction is vital to prevent damage. A stroke happens when the blood flow to the brain is disrupted by a clot or bleeding, resulting in brain death or injury. Early diagnosis and treatment, however, reduce long-term needs and lower health costs. This research aims to develop a machine-learning method for forecasting early warning signs of stroke. The methodology employed feature selection techniques and multiple algorithms. Using the XGBoost algorithm, the proposed model achieved an accuracy rate of 96.45%. This research shows that machine learning can effectively predict early warning signs of stroke, which can help reduce long-term treatment and rehabilitation needs and lower health costs.
This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.
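The cosine criterion mentioned above can be illustrated by ranking constraint rows by their cosine similarity to the objective vector. This is only a sketch of the similarity computation: the abstract does not describe how the paper selects constraints from the ranking or how the weighted-average criterion is combined with it, and the function names and sample data are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_constraints(objective, constraint_rows):
    """Return constraint indices sorted from most to least aligned with the objective."""
    scores = [(cosine_similarity(objective, row), i)
              for i, row in enumerate(constraint_rows)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores]

# Hypothetical problem: maximize x + y subject to three constraint rows.
order = rank_constraints([1.0, 1.0],
                         [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
```

The intuition is geometric: a constraint whose normal points in nearly the same direction as the objective is more likely to be binding at the optimum, so it is a natural candidate for the reduced problem.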
Funding: Supported by the National Natural Science Foundation of China (No. 52203376) and the National Key Research and Development Program of China (No. 2023YFB3813200).
Abstract: Traditional 3Ni weathering steel cannot fully meet the requirements of offshore engineering development, so designing novel 3Ni steels with added microalloying elements such as Mn or Nb for strength enhancement has become a trend. The stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel was investigated in this study using the corrosion big data method. Information on the corrosion process was recorded using the galvanic corrosion current monitoring method. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of the structure factor was investigated. Field exposure tests were conducted to verify the results calculated with the GBDT method. The results indicated that the GBDT method can be used effectively to study the influence of structural factors on the corrosion process of 3Ni steel. Mn and Cu affect the stress-assisted corrosion of 3Ni steel through different mechanisms: neither has an obvious effect on the corrosion rate of non-stressed 3Ni steel during the early stage of corrosion. Once corrosion reached a stable state, increasing the Mn content increased the corrosion rate of 3Ni steel, while Cu reduced it. In the presence of stress, both increased Mn content and Cu addition can inhibit the corrosion process. The corrosion law of outdoor-exposed 3Ni steel is consistent with the law obtained from corrosion big data technology, verifying the reliability of the big data evaluation method and of the data prediction model selection.
Abstract: The Shanxi-Beijing natural gas pipeline crosses the North China Plain and its agricultural region. Residents in the area use rototillers for planting and harvesting; however, the rototillers penetrate the ground to a depth greater than the burial depth of the pipeline, posing a significant threat to its safe operation. It is therefore of great significance to study the dynamic response of pipelines to rotary tiller impact. This article focuses on the Shanxi-Beijing natural gas pipeline, using finite element simulation software to establish a finite element model of the interaction among the machinery, pipeline, and soil, and analyzing the dynamic response of the pipeline. In addition, a decision tree model is introduced to classify pipeline damage under different working conditions, and the boundary value and importance of each factor influencing pipeline damage are derived. Considering the actual conditions in the hemp yam planting area, targeted management measures are proposed to ensure the operational safety of the Shanxi-Beijing natural gas pipeline in this region.
Abstract: To improve nitrogen removal in the anoxic/oxic (A/O) process for treating domestic wastewater, the influencing factors DO (dissolved oxygen), nitrate recirculation, sludge recycle, SRT (solids residence time), influent COD/TN, and HRT (hydraulic retention time) were studied. The results indicated that nitrogen removal could be increased by using corresponding control strategies, such as adjusting the DO set point according to the effluent ammonia concentration and manipulating the nitrate recirculation flow according to the nitrate concentration at the end of the anoxic zone. Based on the experimental results, a knowledge-based approach for supervising nitrogen removal problems was considered, and decision trees for diagnosing nitrification and denitrification problems were built and successfully applied to the A/O process.
Funding: Supported by a grant from the Universidad Nacional Autonoma de Mexico, SDI.PTID.05.6.
Abstract: AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, 10-fold cross-validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1, and 92/96 for F4), and the largest confusion centered on F3. The algorithm produced a set of compound rules from the ten classification trees, which was used to classify the 261 patients. The rules for classifying patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
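The 10-fold cross-validation used for testing can be sketched as an index split: the samples are shuffled once, divided into ten folds, and each fold serves as the test set once while the other nine are used for training. This is a generic illustration, not the authors' code; the function name and the fixed random seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable folds
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 261 patients, nine folds hold 26 cases and one holds 27.
splits = list(k_fold_indices(261, k=10))
```

Each of the ten rounds trains one tree, which matches the abstract's mention of ten classification trees; the reported 85.1% accuracy is simply one minus the 14.9% overall error.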
Abstract: This letter presents an approach for inducing decision trees based on a fuzzy neural network. The approach makes use of the weights of the fuzzy mappings in the trained fuzzy neural network. It can optimize fuzzy decision trees by cutting branches, and it improves both the accuracy and the efficiency of decision tree induction.
Abstract: Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and achieve sustainable development. Traditional methods for predicting oil well production trends are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production by considering geological data, fluid PVT data, and well data. The results show that the GBDT prediction model is highly accurate, significantly improves efficiency, and is widely applicable. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and has guiding significance for oilfield development.
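To show the idea behind GBDT regression, the sketch below fits a sequence of one-split trees (stumps) on a single feature, each one trained on the residuals left by the previous rounds under squared-error loss. It is a deliberately minimal illustration: production GBDT libraries use deeper trees, many features, and regularization, and the toy data here are invented rather than drawn from the paper's geological, PVT, or well data.

```python
def fit_stump(x, residuals):
    """Find the single threshold split that minimizes squared error."""
    mean_all = sum(residuals) / len(residuals)
    best = (float("inf"), x[0], mean_all, mean_all)  # fallback: no split
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def fit_gbdt(x, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals; returns (base value, learning rate, stumps)."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, p in zip(x, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, xi):
    """Sum the base value and every stump's scaled contribution."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if xi <= threshold else right)
        for threshold, left, right in stumps
    )

# Hypothetical toy data: a step in production between x = 3 and x = 4.
model = fit_gbdt([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

The small learning rate means each stump corrects only a fraction of the remaining error, so accuracy improves gradually over the rounds instead of depending on any single tree.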
Abstract: The ID3 algorithm is a classical decision tree learning algorithm in data mining. The algorithm tends to choose attributes with more values, which affects the efficiency of classification and prediction when building a decision tree. This article proposes a new approach based on an improved ID3 algorithm. The new algorithm introduces an importance factor λ when calculating the information entropy. It can strengthen the labeling of important attributes in a tree and weaken that of unimportant attributes. The algorithm overcomes the flaw of the traditional ID3 algorithm, which tends to choose attributes with more values, and also improves the efficiency and flexibility of the decision tree generation process.
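The bias that the improved algorithm targets can be seen in the standard ID3 information gain itself. The sketch below computes entropy and information gain and compares a many-valued attribute with a two-valued one. It shows only the unmodified ID3 criterion: the abstract does not say where the importance factor λ enters the entropy calculation, so λ is not implemented here, and the toy data are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting on an attribute's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets created by the split.
    remainder = sum(len(group) / n * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: a unique ID per row versus a genuine two-valued attribute.
labels = ["yes", "yes", "no", "no"]
gain_id = information_gain([1, 2, 3, 4], labels)
gain_weather = information_gain(["sun", "sun", "rain", "sun"], labels)
```

The row ID splits the data into four pure single-row subsets and so receives the maximum possible gain, even though it has no predictive value for new rows. This is the many-valued-attribute bias that weighting schemes such as the λ factor are meant to counter.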
Funding: Supported by the National Natural Science Foundation of China under Grants 60703013 and 10978011; the Key Program of the National Natural Science Foundation of China under Grant 60932008; the National Science Fund for Distinguished Young Scholars under Grant 50925625; and the China Postdoctoral Science Foundation.
Abstract: In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years; however, it has been shown that these algorithms are sensitive to noisy samples and do not work in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. We then design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
Abstract: Designing an intelligent distribution system based on students' individual differences and needs has become a priority, because the traditional dormitory distribution system neglects students' personality traits, causes dormitory disputes, and affects students' quality of life and academic quality. This paper collects freshmen's data according to college students' personal preferences, conducts a classification comparison, and uses a decision tree classification algorithm based on the information gain principle as the core algorithm of dormitory allocation. It determines the description rules for students' personal preferences and decision tree classification preferences, completes the conceptual design of the database (entity relations and data dictionaries), meets students' personality classification requirements for the dormitory, and lays the foundation for an intelligent dormitory allocation system.
Abstract: Big data is usually unstructured, and many applications require analysis in real time. The decision tree (DT) algorithm is widely used to analyze big data. Selecting the optimal depth of a DT is a time-consuming process, as it requires many iterations. In this paper, we have designed a modified version of a DT. The tree aims to achieve optimal depth by self-tuning its running parameters and improving the accuracy. The efficiency of the modified DT was verified using two datasets (airport and fire datasets). The airport dataset has 500000 instances and the fire dataset has 600000 instances. A comparison was made between the modified DT and the standard DT, with results showing that the modified DT performs better. This comparison was conducted on multiple nodes with Apache Spark using Amazon Web Services. The accuracy increased by 6.85% for the first dataset and 8.85% for the airport dataset. In conclusion, the modified DT showed better accuracy in handling different-sized datasets compared to the standard DT algorithm.
Abstract: This paper explores the use of soft decision trees [1] in basic reinforcement learning applications to examine the efficacy of using passive-expert-like networks for optimal Q-value learning on Artificial Neural Networks (ANN). The soft decision tree networks were built using the PyTorch machine learning framework and the OpenAI Gym environment framework. The study aimed to assess the performance of soft decision tree networks on CartPole as provided in the OpenAI Gym software package. The baseline against which the soft decision tree networks were compared was a simple Deep Neural Network using several linear layers, with ReLU and Softmax activation functions for the input and output layers, respectively. All networks were trained using the backpropagation algorithm provided generically by PyTorch's Autograd module.
Funding: Foundation of China (No. 52067013); the Key Natural Science Fund Project of Gansu Provincial Department of Science and Technology (No. 21JR7RA280); the Tianyou Innovation Team Science Foundation of Intelligent Power Supply and State Perception for Rail Transit (No. TY202010); the Natural Science Foundation of Gansu Province (No. 20JR5RA395).
Abstract: To address the many types of composite power quality disturbances, the strong correlation among features, and the high recognition error rate, a method for identifying composite power quality disturbances based on the multi-resolution S-transform and a decision tree is proposed. First, according to the IEEE standard, signal models of seven single power quality disturbances and 17 combined power quality disturbances are given, and disturbance waveform samples are generated in batches. Then, to improve recognition accuracy, an adjustment factor is introduced to obtain controllable time-frequency resolution through multi-resolution S-transform time-frequency domain analysis. On this basis, five time-frequency domain features are extracted that quantitatively reflect the characteristics of the analyzed power quality disturbance signal; this is fewer than the traditional S-transform-based method requires. Next, three classifiers (K-nearest neighbor, support vector machine, and decision tree) are used to identify the composite power quality disturbances. Simulation results show that the classification accuracy of the decision tree algorithm is higher than that of K-nearest neighbor and support vector machine. Finally, the proposed method is compared with other commonly used recognition algorithms. Experimental results show that the proposed method is effective in terms of detection accuracy, especially for combined power quality disturbances.
Funding: Supported by the Open Subject of Key Lab of Resources Remote-sensing and Digital Agriculture in Agricultural Department (RDA1008).
Abstract: [Objective] The aim was to explore the feasibility of classifying crops from a single image, based on multi-spectral imagery and the Decision Tree Method. [Method] A typical agricultural plantation area in the Hulunbeier region was taken as the study area. The optimum time for the main crops (barley, wheat, and rapeseed) was determined from field-measured spectral data; crop classification was then studied using the Decision Tree Method based on crop spectral characteristics, with spectral matching methods such as SAM also considered. [Result] Using a Landsat TM image acquired in the first half of August, and after geographic and atmospheric correction, a decision tree was constructed. Plantation information about wheat, barley, rapeseed, and planted grassland was extracted successfully. The overall classification accuracy reached 86.90%, and the Kappa coefficient was 0.8311. [Conclusion] Taking a typical spectral image as the data source and applying the Decision Tree Method to obtain crop type information has good application prospects.
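The two accuracy figures reported in this abstract, overall classification accuracy and the Kappa coefficient, are both derived from a confusion matrix. The following minimal Python sketch shows how they are commonly computed. The matrix values are made up for illustration and are not the study's data; the function names are hypothetical.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Chance agreement from the products of the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Illustrative two-class matrix (rows: reference class, columns: predicted).
example = [[45, 5],
           [10, 40]]

print(overall_accuracy(example))  # 0.85
print(cohen_kappa(example))       # approximately 0.7
```

Kappa is lower than overall accuracy here because it discounts the agreement that the class proportions alone would produce by chance.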
Funding: Supported by Financial Support of China Geological Survey (1212010916048) and the Fundamental Research Funds for the Central Universities (200903046).
Abstract: With western Jilin Province as the study region, the spectral characteristics and texture features of remote sensing images were taken as the classification basis to construct a Decision Tree Model and extract information about settlements in western Jilin Province, and the manually extracted information about settlements in western Jilin Province was evaluated using a confusion matrix. The results showed that the Decision Tree Model was convenient for extracting settlement information by integrating spectral and texture features, and that the accuracy of this method was higher than that of the traditional Maximum Likelihood Method. In addition, the calculation methods for extracting settlement information by this means were summarized.
Funding: Supported by Science and Technology Plan of Mudanjiang City (G200920064) and Teaching Reform Construction of Mudanjiang Normal University (10-xj11080).
Abstract: Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established by adopting the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and yielding effective analysis results.
Funding: This research work is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R411), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection. Addressing this gap can provide valuable insights for enhancing cybersecurity strategies. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are to investigate the performance of various classifiers, such as Gaussian Naïve Bayes, K Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; to evaluate the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; to identify the strengths and limitations of the different classifiers to provide insights for cybersecurity practitioners and researchers; and to offer recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training utilizes various supervised classifiers, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The study's outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection. Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware. Additionally, insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
Funding: The National Natural Science Foundation of China under contract No. 42076235.
Abstract: Antarctic sea ice is an important part of the Earth's atmospheric system, and satellite remote sensing is an important technology for observing Antarctic sea ice. Whether Chinese Haiyang-2B (HY-2B) satellite altimeter data could be used to estimate sea ice freeboard and provide alternative Antarctic sea ice thickness information with high precision and a long time series, as other radar altimetry satellites can, needs further investigation. This paper proposed an algorithm to discriminate leads and then retrieve sea ice freeboard and thickness from HY-2B radar altimeter data. We first collected the Moderate-resolution Imaging Spectroradiometer ice surface temperature (IST) product from the National Aeronautics and Space Administration to extract leads from the Antarctic waters and verified their accuracy through Sentinel-1 Synthetic Aperture Radar images. Second, a surface classification decision tree was generated for HY-2B satellite altimeter measurements of the Antarctic waters to extract leads and calculate local sea surface heights. We then estimated the Antarctic sea ice freeboard and thickness based on local sea surface heights and the static equilibrium equation. Finally, the retrieved HY-2B Antarctic sea ice thickness was compared with the CryoSat-2 sea ice thickness and the Antarctic Sea Ice Processes and Climate (ASPeCt) ship-based observed sea ice thickness. The results indicate that our classification decision tree constructed for HY-2B satellite altimeter measurements was reasonable, and the root mean square error of the obtained sea ice thickness compared to the ship measurements was 0.62 m. The proposed sea ice thickness algorithm for the HY-2B radar satellite fills a gap in this application domain for the HY-series satellites and can be a complement to existing Antarctic sea ice thickness products; this algorithm could provide long-time-series and large-scale sea ice thickness data that contribute to research on global climate change.
Abstract: Early stroke prediction is vital to prevent damage. A stroke happens when the blood flow to the brain is disrupted by a clot or bleeding, resulting in brain death or injury. However, early diagnosis and treatment reduce long-term needs and lower health costs. This research aims to develop a machine-learning method for forecasting early warning signs of stroke. The methodology employed feature selection techniques and multiple algorithms. Using the XGBoost algorithm, the proposed model achieved an accuracy rate of 96.45%. This research shows that machine learning can effectively predict early warning signs of stroke, which can help reduce long-term treatment and rehabilitation needs and lower health costs.
Abstract: This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.
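The cosine criterion mentioned in this abstract can be sketched in Python as follows. This is only an illustration of cosine similarity between an objective vector and constraint rows: the abstract does not state how many constraints are kept or how the score is combined with the weighted average, so the simple ranking below is an assumption, and all names and numbers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_constraints_by_cosine(objective, constraint_rows):
    """Return constraint indices ordered from most to least aligned
    with the objective vector (assumed ranking rule, not the paper's)."""
    scores = [(cosine_similarity(objective, row), i)
              for i, row in enumerate(constraint_rows)]
    return [i for _, i in sorted(scores, reverse=True)]


# Hypothetical problem: maximize x + y under three constraint rows.
objective = [1.0, 1.0]
constraint_rows = [[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]

ranking = rank_constraints_by_cosine(objective, constraint_rows)
print(ranking)  # [0, 1, 2]
```

Constraints whose normal vectors point in nearly the same direction as the objective score highest, which is the intuition behind treating them as candidates for the binding set.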