To solve multi-class fault diagnosis tasks, a decision tree support vector machine (DTSVM), which combines SVM and a decision tree using the concept of dichotomy, is proposed. Since the classification performance of DTSVM depends heavily on its structure, a genetic algorithm is introduced into the formation of the decision tree to cluster the classes so that the distance between the clustering centers of the two sub-classes is maximized; as a result, the most separable classes are separated at each node of the decision tree. Numerical simulations on three datasets, compared with the "one-against-all" and "one-against-one" methods, demonstrate that the proposed method has better performance and higher generalization ability than these two conventional methods.
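The node-splitting criterion in the DTSVM abstract above can be sketched as follows. This is a minimal illustration under stated assumptions:

- A group's clustering center is taken as the mean of all samples in that group.
- Distance is Euclidean.
- An exhaustive search over bipartitions stands in for the paper's genetic algorithm. This is only feasible for a handful of classes.

The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def group_center(samples_by_class, group):
    """Mean of all samples belonging to the classes in `group` (assumed definition of a center)."""
    points = [p for c in group for p in samples_by_class[c]]
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def best_bipartition(samples_by_class):
    """Split the classes into two non-empty groups whose centers are farthest apart.

    Exhaustive search is a stand-in for the genetic algorithm used in the paper.
    Each bipartition is enumerated once by always keeping the first class on the left.
    """
    classes = sorted(samples_by_class)
    first, rest = classes[0], classes[1:]
    best, best_d = None, -1.0
    for k in range(len(rest)):  # k = number of other classes joining `first`
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(c for c in classes if c not in left)
            d = dist(group_center(samples_by_class, left),
                     group_center(samples_by_class, right))
            if d > best_d:
                best, best_d = (left, right), d
    return best, best_d
```

At each DTSVM node, a binary SVM would then be trained to separate the two returned groups. The split is repeated inside each group until every leaf holds a single class.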
Traditional dormitory allocation systems neglect students' personality traits, cause dormitory disputes, and affect students' quality of life and academic performance. In view of this, designing an intelligent allocation system based on students' individual differences and needs has become a priority. This paper collects freshmen's data on their personal preferences, performs a classification comparison, and uses a decision tree classification algorithm based on the information gain principle as the core algorithm for dormitory allocation. It determines the description rules for students' personal preferences and the decision tree classification preferences, and completes the conceptual design of the database, including entity relationships and data dictionaries. The design meets students' personality-based classification requirements for dormitories and lays the foundation for an intelligent dormitory allocation system.
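The information-gain principle named in the dormitory-allocation abstract can be illustrated with a short sketch. It assumes records are dictionaries of categorical preferences. The attribute names used in the example ("sleep", "noise", "room") are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction of `target` after splitting `rows` on `attr`.

    `rows` is a list of dicts; attribute and target names are illustrative.
    """
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder
```

In the example below, "sleep" perfectly separates the two room groups, giving a gain of 1 bit. "noise" carries no information about the room, giving a gain of 0. An ID3-style tree would therefore split on "sleep" first.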
Karst rocky desertification is a form of land degradation caused by the interaction of natural and human factors. In the past, supervised and unsupervised classification were often used to classify remote sensing images of rocky desertification areas. However, these methods use only pixel brightness, so their classification accuracy is low and cannot meet the needs of practical application. Decision tree classification is a newer technique for remote sensing image classification. In this study, we take Kaizuo Township, a rocky desertification area, as a case study. Using ASTER image data, a DEM, and lithology data, we extract the normalized difference vegetation index (NDVI), the ratio vegetation index (RVI), terrain slope, and other data, and use them to establish classification rules and build decision trees. The classified images are produced with the support of the ENVI software. The calculated classification accuracy and kappa coefficient show that better classification results can be obtained and that desertification information can be extracted automatically. Classification accuracy could be improved further by using more remote sensing bands, employing a higher-resolution DEM, and reducing data errors during processing.
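A rule-based decision tree of the kind described for Kaizuo Township can be sketched per pixel. The sketch combines NDVI = (NIR - Red) / (NIR + Red), terrain slope, and a lithology flag. Every threshold and class label here is hypothetical; the study derived its own rules from ASTER, DEM, and lithology data. An RVI (NIR / Red) test could be added as a further branch in the same way.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def classify_pixel(nir, red, slope_deg, is_carbonate,
                   veg_min=0.4, bare_max=0.15, steep_deg=25.0):
    """Toy decision tree for one pixel; every threshold and label is hypothetical."""
    v = ndvi(nir, red)
    if v >= veg_min:
        return "vegetated"
    if not is_carbonate:
        # Rocky desertification is tied to karst (carbonate) lithology.
        return "non-karst bare land"
    if v < bare_max:
        # Nearly bare karst: terrain slope separates severe from moderate classes.
        return "severe" if slope_deg >= steep_deg else "moderate"
    return "light"
```

Each `if` corresponds to one internal node of the tree. This is why the approach can mix spectral indices with non-spectral layers such as slope and lithology, which pixel-brightness classifiers cannot use directly.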
Based on a discussion of the basic concepts of data mining and the decision tree method, and using data samples of wind and hailstorm disasters in several counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
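C4.5, used in the Mudanjiang disaster-grade model, ranks attributes by gain ratio rather than raw information gain. Gain ratio divides the gain by the split information (the entropy of the attribute's own value distribution). The sketch below shows this criterion only. The attribute names ("wind", "id", "grade") are hypothetical.

```python
from collections import Counter
from math import log2


def _entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """C4.5 gain ratio = information gain / split information."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    gain = _entropy([r[target] for r in rows]) - sum(
        len(g) / n * _entropy(g) for g in groups.values())
    split_info = _entropy([r[attr] for r in rows])  # entropy of the attribute itself
    return gain / split_info if split_info > 0 else 0.0
```

In the example, a unique record identifier achieves the same information gain as "wind" but is penalized by its larger split information. This bias correction is the main reason C4.5 uses gain ratio.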
Machine learning algorithms are an important tool for landslide susceptibility assessment, but most studies use GIS-based classification methods to perform susceptibility zonation. This study presents a machine learning approach based on the C5.0 decision tree (DT) model and the K-means clustering algorithm to produce a regional landslide susceptibility map. Yanchang County, a typical landslide-prone area in northwestern China, was taken as the study area to demonstrate the proposed procedure. A landslide inventory containing 82 landslides was prepared and randomly partitioned into two subsets: training data (70% of landslide pixels) and validation data (30% of landslide pixels). Fourteen landslide influencing factors were included in the input dataset and used to calculate the landslide occurrence probability with the C5.0 decision tree model. Susceptibility zonation was implemented according to cut-off values calculated by the K-means clustering algorithm. Validation showed that the proposed model achieved the highest AUC (area under the receiver operating characteristic (ROC) curve), 0.88, compared with traditional models (support vector machine (SVM) = 0.85, Bayesian network (BN) = 0.81, frequency ratio (FR) = 0.75, weight of evidence (WOE) = 0.76). The landslide frequency ratio and frequency density of the high susceptibility zones were 6.76/km² and 0.88/km², respectively, much higher than those of the low susceptibility zones.
The top 20% interval of landslide occurrence probability contained 89% of the historical landslides but accounted for only 10.3% of the total area. Our results indicate that the high susceptibility zones were more concentrated without containing more "stable" pixels. Therefore, the resulting susceptibility map is suitable for landslide risk management practice.
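The K-means zonation step in the Yanchang County landslide study can be sketched as one-dimensional clustering of the predicted occurrence probabilities. This is a sketch under two assumptions: zone boundaries sit at the midpoints between adjacent sorted cluster centers, and k=5 is only a default. The study's exact cut-off rule and number of zones are not given in the abstract.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values; centers start at evenly spaced order statistics."""
    xs = sorted(values)
    if k < 2 or len(xs) < k:
        raise ValueError("need k >= 2 and at least k values")
    centers = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def cutoffs(values, k=5):
    """Assumed rule: zone boundaries at midpoints between adjacent cluster centers."""
    c = kmeans_1d(values, k)
    return [(a + b) / 2 for a, b in zip(c, c[1:])]


def zone(p, bounds):
    """Zone index: 0 = lowest susceptibility ... len(bounds) = highest."""
    return sum(p >= b for b in bounds)
```

Unlike equal-interval or quantile breaks, the cut-offs follow natural gaps in the probability distribution. This is one way the high-susceptibility zone can stay compact.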
This paper focuses on improving decision tree induction algorithms when a particular kind of tie arises during rule generation for specific training datasets. The tie occurs when the target class outcomes appear in equal proportions among a leaf node's records, so majority voting cannot be applied. To resolve this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used to split the parent nodes are also taken into consideration.
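The naive Bayes part of the tie-breaking idea in the leaf-tie abstract can be sketched as follows. It covers NB only; the k-NN and ARM alternatives are not shown. Three choices are assumptions: the NB estimate is computed only from the records in the tied leaf, Laplace smoothing with alpha = 1 is used, and any remaining NB tie is broken alphabetically.

```python
from collections import Counter


def leaf_predict(records, query, target):
    """Majority vote at a leaf; fall back to naive Bayes when the vote ties.

    `records` are the training rows (dicts) that reached this leaf.
    Laplace smoothing (alpha = 1) is an illustrative choice.
    """
    counts = Counter(r[target] for r in records)
    ranked = counts.most_common()
    top = ranked[0][1]
    tied = sorted(c for c, n in ranked if n == top)
    if len(tied) == 1:
        return tied[0]  # ordinary majority vote
    features = [f for f in query if f != target]
    best, best_score = None, -1.0
    for c in tied:
        rows = [r for r in records if r[target] == c]
        score = counts[c] / len(records)  # prior P(c)
        for f in features:
            n_values = len({r[f] for r in records}) + 1  # +1 slot for unseen values
            match = sum(1 for r in rows if r[f] == query[f])
            score *= (match + 1) / (len(rows) + n_values)  # smoothed P(f = value | c)
        if score > best_score:
            best, best_score = c, score
    return best
```

Because tied classes have equal priors, the decision falls entirely on the feature likelihoods. The query's feature values, including those already used to split parent nodes, therefore decide the outcome.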
This article presents two approaches for automatically building knowledge bases for soil resource mapping. The methods use decision tree and Bayesian predictive modeling, respectively, to generate knowledge from training data. With these methods, building a knowledge base for automated soil mapping is easier than with the conventional knowledge acquisition approach. The knowledge bases built by the two methods were used by a knowledge classifier to classify soil types in the Longyou area, Zhejiang Province, China, using bi-temporal TM imagery and GIS data. To evaluate the resulting knowledge bases, the classification results were compared with an existing soil map based on field surveys. Accuracy assessment and analysis of the resulting soil maps suggested that the knowledge bases built by both methods were of good quality for mapping the distribution of soil classes over the study area.
Decision tree induction algorithms have been used for classification in a wide range of application domains. When constructing a tree, the criterion for selecting test attributes influences the tree's classification accuracy. In this paper, the degree of dependency of the decision attribute on a condition attribute, based on rough set theory, is used as a heuristic for selecting the attribute that best separates the samples into individual classes. An example shows that, compared with the entropy-based approach, our approach is a better way to select nodes when constructing decision trees.
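The rough-set heuristic in the attribute-selection abstract above can be sketched with the standard dependency degree γ(C, D) = |POS_C(D)| / |U|. Here the positive region POS_C(D) contains every object whose equivalence class under the condition attributes C has a single decision value. The selection function is hypothetical and shows only one node's choice; how the paper handles ties or recursion is not stated in the abstract.

```python
from collections import defaultdict


def dependency_degree(rows, condition, decision):
    """Rough-set dependency gamma(C, D) = |POS_C(D)| / |U|.

    Objects are grouped into equivalence classes by their values on the
    `condition` attributes; a class belongs to the positive region when all
    of its objects share the same decision value.
    """
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in condition)].append(r[decision])
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows)


def select_attribute(rows, candidates, decision):
    """Hypothetical node-selection step: the attribute with the highest dependency degree."""
    return max(candidates, key=lambda a: dependency_degree(rows, [a], decision))
```

Unlike entropy, γ counts only objects that an attribute classifies with certainty. It therefore rewards attributes that produce pure partitions outright.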
In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years, but they have been shown to be sensitive to noisy samples and do not work well in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. We then design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analysis shows that the proposed algorithm is effective.
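Rank mutual information, proposed in the ordinal decision tree abstract above, can be sketched using an ascending formulation. For each object i, its dominating set under a feature is the set of objects whose value is at least as large. The score is RMI = -(1/n) Σᵢ log(|Aᵢ| |Bᵢ| / (n |Aᵢ ∩ Bᵢ|)). This definition is an assumption and should be checked against the paper before use. The quadratic loop is for clarity, not efficiency.

```python
from math import log


def rank_mutual_information(a, b):
    """Ascending rank mutual information between two ordinal sequences (assumed definition).

    For each object i, A_i = {j : a[j] >= a[i]} and B_i = {j : b[j] >= b[i]}.
    RMI = -(1/n) * sum_i log(|A_i| * |B_i| / (n * |A_i & B_i|)), natural log.
    """
    n = len(a)
    total = 0.0
    for i in range(n):
        A = {j for j in range(n) if a[j] >= a[i]}
        B = {j for j in range(n) if b[j] >= b[i]}
        # i is always in both sets, so the intersection is never empty.
        total += log(len(A) * len(B) / (n * len(A & B)))
    return -total / n
```

The measure depends only on the ordering of values, so any strictly increasing transformation of a feature leaves it unchanged. A constant decision carries no information and scores zero.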
Big data is usually unstructured, and many applications require real-time analysis. The decision tree (DT) algorithm is widely used to analyze big data, but selecting the optimal depth of a DT is time-consuming because it requires many iterations. In this paper, we design a modified DT that aims to reach the optimal depth by self-tuning its running parameters, thereby improving accuracy. The efficiency of the modified DT was verified on two datasets, airport and fire: the airport dataset has 500,000 instances and the fire dataset has 600,000 instances. The modified DT was compared with the standard DT on multiple nodes using Apache Spark on Amazon Web Services. The results show that the modified DT performs better, with accuracy increases of 6.85% for the first dataset and 8.85% for the airport dataset. In conclusion, the modified DT achieved better accuracy than the standard DT algorithm in handling datasets of different sizes.
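Manual identification of rocks inside a mountain during geological modeling is limited, inefficient, and easily affected by subjective factors. To address these problems, an automatic rock-type identification technique based on seismic reflection signals is proposed. Rock mechanical parameters are obtained by processing the seismic reflection signals. The Decision Tree ID3 algorithm is then applied to the extracted rock density, wave velocity, elastic modulus, and shear modulus to build a lithology identification classifier. This classifier was used to determine the rock types inside a mountain. The results show that the study area consists mostly of gabbro, with basalt the least common. Comparing the model's classification results with the actual geology of the study area, the correct identification rate reached 93% for basalt, 100% for andesite and diorite, and 88% for granite. The decision tree classifier can therefore identify rock lithology efficiently and accurately from seismic reflection signals.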
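ID3, used in the seismic lithology study, handles only categorical attributes. The sketch below therefore assumes that density, wave velocity, and the moduli have already been discretized into bins such as "low" and "high". The binning scheme and example rows are hypothetical. The tree is returned as nested dictionaries.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def id3(rows, attrs, target):
    """Build an ID3 tree as nested dicts: {attr: {value: subtree_or_label}}.

    Assumes `rows` hold discretized values (e.g. density binned as "low"/"high").
    """
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority

    def gain(a):
        rem = 0.0
        for v in {r[a] for r in rows}:
            sub = [r[target] for r in rows if r[a] == v]
            rem += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - rem

    best = max(attrs, key=gain)  # highest information gain
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


def predict(tree, row):
    """Walk the tree; raises KeyError for attribute values unseen in training."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree
```

A practical version would choose bin boundaries from the data or switch to C4.5-style threshold splits. Discretization is the main design decision when applying ID3 to continuous rock parameters.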
Snort rule checking is one of the most popular forms of network intrusion detection system (NIDS). In this article, we show that the Snort priorities of true positive traffic (real attacks) can be approximated in real time, in the context of high-speed networks, by a decision tree classifier. The classifier uses only three easily extracted features (protocol, source port, and destination port) and reaches an accuracy of 99%. Snort assigns alert priorities based on its own default set of attack classes (34 classes), which are used by the default set of rules it provides, but the decision tree model can predict the priorities without using this default classification. The resulting tagger can be a useful complement to an anomaly-based intrusion detection system.
This study investigates the use of a decision tree classification model, combined with principal component analysis (PCA), to distinguish between the Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as the key features contributing to variance. However, although PCA effectively reduced dimensionality, it could not clearly distinguish between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important features, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model's accuracy and its interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
Classification for handwritten Chinese character recognition can be viewed as a transformation in a discrete vector space. From this viewpoint, this paper presents a new 4-corner code classifier for handwritten Chinese characters based on the decision tree inductive learning algorithm ID3. With a feature extraction controller, the classifier reduces the number of extracted features and accelerates classification. Experimental results show that the 4-corner code classifier performs well in both recognition accuracy and speed.
Most stream data classification algorithms apply a supervised learning strategy that requires massive amounts of labeled data. Such approaches are impractical because labeled data are usually hard to obtain in practice. In this paper, we build a clustering feature decision tree model, CFDT, from data streams containing both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in the tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while achieving high classification accuracy at high speed.
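The micro-cluster summaries used by CFDT can be sketched with the standard BIRCH-style clustering feature: the count N, the per-dimension linear sum LS, and the squared sum SS. Together these let the centroid and radius be computed without storing raw points. The label counter is an assumption about how a leaf could exploit the few labeled examples. The class and method names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class MicroCluster:
    """BIRCH-style clustering feature (N, LS, SS) plus counts of known labels (assumed)."""
    dim: int
    n: int = 0
    ls: list = field(default_factory=list)  # linear sum per dimension
    ss: list = field(default_factory=list)  # squared sum per dimension
    labels: Counter = field(default_factory=Counter)

    def __post_init__(self):
        self.ls = [0.0] * self.dim
        self.ss = [0.0] * self.dim

    def add(self, x, label=None):
        """Absorb one point in O(dim); the raw point is not stored (single scan)."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        if label is not None:  # unlabeled points update only the statistics
            self.labels[label] += 1

    def merge(self, other):
        """Clustering features are additive, so two micro-clusters combine exactly."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.labels += other.labels

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of members to the centroid."""
        c = self.centroid()
        var = sum(self.ss[i] / self.n - c[i] ** 2 for i in range(self.dim))
        return sqrt(max(var, 0.0))

    def majority_label(self):
        return self.labels.most_common(1)[0][0] if self.labels else None
```

Because every statistic is a running sum, a leaf can keep updating its micro-clusters as the stream arrives. This is what gives the classifier its any-time behavior.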
The conventional fuzzy class-association method classifies new texts by repeatedly scanning the classifier, which is inefficient. To address this problem, a new text categorization approach based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules that share many repeated words. Unlike classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which the words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the sub-trees led by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach is clearly more efficient than the conventional method.
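The storage and pruning ideas in the FCR-tree abstract can be illustrated with a generic prefix tree of rules. This is not the paper's FCR-tree or its k-FCR-tree partitioning. Each rule is stored as a path of (word, fuzzy set) items, assuming the items are sorted in one global order so that common prefixes are shared. Retrieval skips every sub-tree whose leading item is absent from the text. The fuzzy-set labels and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # (word, fuzzy_set) -> Node
    classes: list = field(default_factory=list)   # labels of rules ending at this node


def insert_rule(root, items, label):
    """Store a rule as a path of (word, fuzzy set) items; shared prefixes are stored once.

    Assumes `items` are already sorted in a global order (e.g. by word frequency).
    """
    node = root
    for item in items:
        node = node.children.setdefault(item, Node())
    node.classes.append(label)


def matching_labels(node, text_items):
    """Collect labels of rules whose every item occurs in the text.

    Sub-trees led by an item absent from `text_items` are never visited.
    """
    found = list(node.classes)
    for item, child in node.children.items():
        if item in text_items:
            found.extend(matching_labels(child, text_items))
    return found
```

The pruning in `matching_labels` is the efficiency argument made in the abstract. A rule is reached only if every item on its path occurs in the text, so non-matching rules are never enumerated one by one.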
Accurate classification of power quality disturbances is the premise and basis for improving and governing power quality. A method for power quality disturbance classification based on time-frequency domain multi-features and a decision tree is presented. The wavelet transform and the S-transform are used to extract the feature quantities of each power quality disturbance signal. A decision tree with classification rules is then constructed for classification and recognition based on the extracted features. The classification rules and the decision tree classifier are established by combining the energy spectrum feature extracted by the wavelet transform with seven other time-frequency domain features extracted by the S-transform. Simulation results show that the proposed method can effectively identify six types of common single disturbance signals and two mixed disturbance signals, with fast classification speed and adequate noise resistance. Its classification accuracy is also higher than that of the support vector machine (SVM) and k-nearest neighbor (KNN) algorithms. Compared with the method that uses only the S-transform, the proposed feature extraction method yields richer features and higher classification accuracy for power quality disturbances.
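The wavelet energy-spectrum feature in the power quality abstract can be sketched with a multi-level discrete wavelet transform, taking the energy of each detail band plus the final approximation band. The orthonormal Haar wavelet and the number of levels are assumptions; the abstract does not name the mother wavelet. The S-transform features are not shown.

```python
from math import sqrt


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Energy of each detail band (finest first) plus the final approximation band."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    feats, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in current))
    return feats
```

Because the Haar basis is orthonormal, the band energies sum to the signal energy. The feature vector therefore describes how a disturbance redistributes energy across frequency bands, which is what a downstream decision tree can threshold on.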
Here, we demonstrate the application of the Decision Tree Classification (DTC) method to lithological mapping from multispectral satellite imagery. The area of investigation is Lake Magadi in the East African Rift Valley in Kenya. The work involves collecting rock and soil samples in the field, analyzing them using reflectance and emittance spectroscopy, and processing and interpreting Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data through the DTC method. The DTC method is strictly non-parametric, flexible, and simple, and it does not require assumptions about the distributions of the input data. It has been used successfully in a wide range of classification problems. The DTC method successfully mapped the chert and trachyte series rocks, as well as the clay minerals and evaporites of the area, with a high overall accuracy (86%). This high accuracy suggests that the developed decision tree can adapt to the noise and nonlinear relations often observed for surface materials in space-borne spectral image data, without making assumptions about the distribution of the input data. Moreover, the present work found the DTC method useful for accurately mapping lithological variations in vast, rugged terrain. Such terrain inherently contains various sources of noise, even after considerable radiance and atmospheric correction.
As a widely used machine learning classifier, a decision tree model can be trained and deployed at a service provider to offer classification services to clients, e.g., remote diagnostics. These services involve sensitive information: the clients' inputs, the model parameters, and the classification results. To address the resulting privacy concerns, we propose a privacy-preserving decision tree classification scheme (PDTC). Specifically, we first tailor an additively homomorphic encryption primitive and a secret sharing technique to design a new secure two-party comparison protocol, in which each party's numeric inputs can be compared privately as a whole rather than bit by bit. Then, based on this comparison protocol, we exploit the structure of the decision tree to construct PDTC. In PDTC, the client's input and the service provider's model parameters are concealed from the counterparty, and the classification result is revealed only to the client. A formal simulation-based security model and security proof demonstrate that PDTC achieves the desired security properties. In addition, performance evaluation shows that PDTC incurs lower communication and computation overhead than existing schemes.
Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (such as multiple linear regression) are not very efficient. In chemometrics, however, these methods are still not widespread, primarily because of limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used to analyze NIR spectroscopic data for both regression and classification. We consider all important aspects, including the optimization and validation of models, the evaluation of results, the treatment of missing data, and the selection of the most important variables. The performance and outcomes of the decision tree-based methods are compared with those of a more traditional approach based on partial least squares.
Funding (DTSVM fault diagnosis study): supported by the National Natural Science Foundation of China (60604021, 60874054).
Funding (Mudanjiang agro-meteorological disaster study): supported by the Science and Technology Plan of Mudanjiang City (G200920064) and the Teaching Reform Construction of Mudanjiang Normal University (10-xj11080).
Funding (Yanchang County landslide susceptibility study): this research is funded by the National Natural Science Foundation of China (Grant Nos. 41807285 and 51679117); the Key Project of the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (SKLGP2019Z002); the National Science Foundation of Jiangxi Province, China (20192BAB216034); the China Postdoctoral Science Foundation (2019M652287 and 2020T130274); the Jiangxi Provincial Postdoctoral Science Foundation (2019KY08); and the Fundamental Research Funds for National Universities, China University of Geosciences (Wuhan).
Funding (Longyou soil mapping study): supported by the National Natural Science Foundation of China (No. 40101014) and the Science and Technology Committee of Zhejiang Province, China (No. 001110445).
Funding (rank mutual information ordinal decision tree study): supported by the National Natural Science Foundation of China (Grants 60703013 and 10978011), the Key Program of the National Natural Science Foundation of China (Grant 60932008), the National Science Fund for Distinguished Young Scholars (Grant 50925625), and the China Postdoctoral Science Foundation.
Abstract: In many decision-making tasks, the features and the decision are ordinal. Several ordinal classification learning algorithms have been developed in recent years. However, these algorithms have been shown to be sensitive to noisy samples and to perform poorly in real-world applications. In this work, we propose a new measure of feature quality, called rank mutual information. We then design an ordinal decision tree (REOT) construction technique based on rank mutual information. Theoretical and experimental analyses show that the proposed algorithm is effective.
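The abstract does not give the formula. The sketch below assumes the ascending rank mutual information as commonly defined in the rank-entropy literature: RMI(A, D) = -1/n Σ_i log(|A_i|·|D_i| / (n·|A_i ∩ D_i|)), where A_i is the set of samples whose feature value is at least that of sample i, and D_i is the analogue for the decision. Check this definition against the paper before relying on it.

```python
import math
from typing import Sequence, Set


def dominating_set(values: Sequence[float], i: int) -> Set[int]:
    """Indices j whose value is at least values[i] (ascending rank neighbourhood of x_i)."""
    return {j for j, v in enumerate(values) if v >= values[i]}


def rank_entropy(values: Sequence[float]) -> float:
    """Ascending rank entropy: -1/n * sum_i log(|[x_i]| / n)."""
    n = len(values)
    return -sum(math.log(len(dominating_set(values, i)) / n) for i in range(n)) / n


def rank_mutual_information(feature: Sequence[float], decision: Sequence[float]) -> float:
    """Ascending RMI: -1/n * sum_i log(|A_i| * |D_i| / (n * |A_i & D_i|))."""
    n = len(feature)
    total = 0.0
    for i in range(n):
        a = dominating_set(feature, i)
        d = dominating_set(decision, i)
        # a & d always contains i, so the denominator is never zero.
        total += math.log(len(a) * len(d) / (n * len(a & d)))
    return -total / n
```

A feature that rises monotonically with the decision scores higher than one that falls against it. A constant feature scores zero. This ordering is what lets RMI serve as a split criterion for ordinal trees.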
Abstract: Big data is usually unstructured, and many applications require its analysis in real time. The decision tree (DT) algorithm is widely used to analyze big data. Selecting the optimal depth of a DT is time-consuming because it requires many iterations. In this paper, we design a modified DT that aims to reach the optimal depth by self-tuning its running parameters while improving accuracy. The efficiency of the modified DT was verified on two datasets: an airport dataset with 500,000 instances and a fire dataset with 600,000 instances. The modified DT was compared with the standard DT on multiple nodes with Apache Spark on Amazon Web Services. The results show that the modified DT performs better, with accuracy increases of 6.85% for the first dataset and 8.85% for the airport dataset. In conclusion, the modified DT achieved better accuracy than the standard DT algorithm on datasets of different sizes.
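The abstract does not describe how the self-tuning works. As a point of comparison only, the sketch below shows a common baseline that is not the authors' method: grow the maximum depth one level at a time and stop early once validation accuracy plateaus. `evaluate` is a hypothetical callback that would train a tree with the given depth limit (for example, via the `maxDepth` parameter of Spark MLlib's `DecisionTreeClassifier`) and return validation accuracy. `patience` and `tol` are illustrative defaults.

```python
from typing import Callable


def select_depth(evaluate: Callable[[int], float], max_depth: int = 30,
                 patience: int = 2, tol: float = 1e-3) -> int:
    """Return the depth with the best validation score, stopping once the score
    has failed to improve by more than `tol` for `patience` consecutive depths."""
    best_depth, best_score = 1, evaluate(1)
    stale = 0
    for depth in range(2, max_depth + 1):
        score = evaluate(depth)
        if score > best_score + tol:
            best_depth, best_score, stale = depth, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # plateau: skip the remaining (expensive) training runs
    return best_depth
```

Each call to `evaluate` is a full training run, so early stopping saves iterations. The trade-off is that a later improvement past the plateau is never seen.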
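Abstract: In geological modeling, manually identifying the rocks inside a mountain has limitations: it is inefficient and easily influenced by subjective factors. To address these problems, an automatic rock-type identification technique based on seismic reflection signals is proposed. Rock mechanical parameters are obtained by processing the seismic reflection signals. The Decision Tree ID3 algorithm then uses rock density, wave velocity, elastic modulus, and shear modulus to build a lithology identification classifier. This classifier was used to determine the rock types inside a mountain. The results show that the interior of the study area consists mostly of gabbro, with basalt the least common. A comparison of the model's classification results with the actual geology of the study area shows a correct identification rate of 93% for basalt, 100% for andesite and diorite, and 88% for granite. These results show that the decision-tree classifier can identify rock lithology efficiently and accurately from seismic reflection signals.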
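ID3 splits on categorical attributes, so the continuous parameters named above would first have to be binned. The abstract does not say how this is done. The minimal ID3 sketch below assumes already-discretized attributes. The two attributes, their "low"/"high" bins and the rock labels are toy placeholders, not the study's data.

```python
import math
from collections import Counter
from typing import Any, Dict, List


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[Dict[str, str]], labels: List[str], attr: str) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows: List[Dict[str, str]], labels: List[str], attrs: List[str]) -> Any:
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attr": best, "branches": {}, "default": majority}
    rest = [a for a in attrs if a != best]
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def classify(node: Any, row: Dict[str, str]) -> str:
    # Unseen attribute values fall back to the majority label at that node.
    while isinstance(node, dict):
        node = node["branches"].get(row[node["attr"]], node["default"])
    return node


rows = [{"density": "high", "velocity": "high"}, {"density": "high", "velocity": "low"},
        {"density": "low", "velocity": "high"}, {"density": "low", "velocity": "low"}]
labels = ["rock_A", "rock_B", "rock_C", "rock_C"]
tree = id3(rows, labels, ["density", "velocity"])
```

On this toy table, `density` has an information gain of 1.0 bit versus 0.5 bit for `velocity`, so it becomes the root.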
Abstract: Snort rule checking is one of the most popular forms of network intrusion detection system (NIDS). In this article, we show that the Snort priorities of true-positive traffic (real attacks) can be approximated in real time by a decision tree classifier, even in high-speed networks. The classifier uses only three easily extracted features (protocol, source port, and destination port) and achieves an accuracy of 99%. Snort assigns alert priorities based on its own default set of 34 attack classes, which are used by its default rule set. The decision tree model, however, predicts the priorities without using this default classification. The resulting tagger can be a useful complement to an anomaly-based intrusion detection system.
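Part of why a decision tree suits real-time tagging is that prediction costs one comparison per tree level. The sketch below stores a fitted tree as flat arrays, similar in spirit to scikit-learn's `tree_` attributes (`feature`, `threshold`, `children_left`, `children_right`, `value`), and walks it for one flow. The tree shape, thresholds and priorities are invented for illustration; they are not learned from Snort data.

```python
from typing import List, Sequence

# Internal node k sends x to LEFT[k] if x[FEATURE[k]] <= THRESHOLD[k], else to RIGHT[k];
# LEFT[k] == -1 marks a leaf whose output is VALUE[k].
# Feature indices: 0 = IP protocol number, 1 = source port, 2 = destination port.
# Illustrative values only, not learned from real traffic.
FEATURE: List[int] = [0, 2, 1, -1, -1, -1, -1]
THRESHOLD: List[float] = [6.5, 1023.5, 1023.5, 0.0, 0.0, 0.0, 0.0]
LEFT: List[int] = [1, 3, 5, -1, -1, -1, -1]
RIGHT: List[int] = [2, 4, 6, -1, -1, -1, -1]
VALUE: List[int] = [0, 0, 0, 1, 2, 3, 2]


def predict_priority(x: Sequence[int]) -> int:
    """Walk from the root to a leaf; the cost is one comparison per level."""
    k = 0
    while LEFT[k] != -1:
        k = LEFT[k] if x[FEATURE[k]] <= THRESHOLD[k] else RIGHT[k]
    return VALUE[k]
```

For example, `predict_priority((6, 51515, 80))` follows the TCP branch (protocol 6) and then the well-known destination-port branch, ending at the leaf with priority 1.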
Abstract: This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between the Assam and Bhutan ethnic groups. The classification is based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as the key features contributing to variance. However, while PCA effectively reduced dimensionality, it had difficulty clearly distinguishing between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model’s accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
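Saying that PCA "identified" key features usually means reading the loadings of the leading principal component. The sketch below finds the first component of a covariance matrix by power iteration, using only the standard library. It assumes unscaled, mean-centred data; the study may have standardized its features, which can change which features dominate. The two-column data are toy values.

```python
import math
from typing import List, Sequence, Tuple


def covariance_matrix(X: Sequence[Sequence[float]]) -> List[List[float]]:
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(z[j] * z[k] for z in Z) / (n - 1) for k in range(d)] for j in range(d)]


def first_principal_component(C: List[List[float]], iters: int = 200) -> Tuple[List[float], float]:
    """Power iteration; assumes a unique largest eigenvalue and a start vector
    that is not orthogonal to its eigenvector."""
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(C[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[j] * sum(C[j][k] * v[k] for k in range(d)) for j in range(d))
    return v, eigenvalue


# Toy data: column 0 varies widely, column 1 barely varies.
X = [[-2, 0.1], [-1, -0.1], [0, 0.1], [1, -0.1], [2, 0.0]]
pc1, var1 = first_principal_component(covariance_matrix(X))
```

The absolute loadings `|pc1[j]|` indicate how strongly feature j contributes to the direction of greatest variance. Here almost all of the weight falls on column 0.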
Abstract: Classification for handwritten Chinese character recognition can be viewed as a transformation in a discrete vector space. In this paper, a new 4-corner-code classifier for handwritten Chinese characters is presented, viewed from the perspective of discrete vector space transformation. The classifier is based on the decision tree inductive learning algorithm ID3. With a feature extraction controller, the classifier reduces the number of features extracted and accelerates classification. Experimental results show that the 4-corner-code classifier performs well in both recognition accuracy and speed.
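The abstract does not describe the feature extraction controller. One plausible reading, sketched here under that assumption, is on-demand extraction. Each feature is computed only when a tree node tests it and is then cached, so a sample that reaches a leaf early never pays for the remaining features. The extractors, corner-code values and group labels below are hypothetical.

```python
from typing import Any, Callable, Dict


class LazyFeatures:
    """Extracts a feature only when the tree asks for it, then caches it."""

    def __init__(self, extractors: Dict[str, Callable[[Any], Any]], sample: Any):
        self.extractors = extractors
        self.sample = sample
        self.cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self.cache:
            self.cache[name] = self.extractors[name](self.sample)
        return self.cache[name]


def classify(tree: Any, features: LazyFeatures) -> Any:
    # Internal nodes are {"attr": ..., "branches": {...}}; anything else is a leaf.
    while isinstance(tree, dict):
        tree = tree["branches"][features[tree["attr"]]]
    return tree


# Hypothetical extractors: each "reads" one corner code from a precomputed tuple.
extractors = {f"corner_{i}": (lambda s, i=i: s[i]) for i in range(4)}
tree = {"attr": "corner_0",
        "branches": {1: "group_1",
                     2: {"attr": "corner_2", "branches": {0: "group_2a", 5: "group_2b"}}}}
```

A sample whose first corner code is 1 reaches a leaf after one extraction, whereas one whose code is 2 needs two. Neither sample ever computes `corner_1` or `corner_3`.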
Funding: Supported by the National Natural Science Foundation of China (No. 60673024) and the "Eleventh Five-Year Plan" Preliminary Research Project of the PLA (No. 102060206).
Abstract: Most stream data classification algorithms apply a supervised learning strategy, which requires massive amounts of labeled data. Such approaches are impractical, since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams containing unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in the tree leaves to improve classification accuracy and reinforce the anytime property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while generating high classification accuracy at high speed.
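The "statistical summaries" produced by a one-pass micro-clustering algorithm are typically clustering-feature vectors (N, LS, SS), as in BIRCH and CluStream. The sketch below assumes that representation, since the abstract does not spell out CFDT's exact summary. A label counter is added so that a leaf micro-cluster can vote with the few labels it has seen.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


class ClusterFeature:
    """Micro-cluster summary (N, LS, SS) plus counts of any labels seen.
    Points are absorbed in one pass and never stored."""

    def __init__(self, dim: int):
        self.n = 0
        self.ls = [0.0] * dim  # linear sum per dimension
        self.ss = [0.0] * dim  # sum of squares per dimension
        self.labels: Counter = Counter()

    def absorb(self, x: Sequence[float], label: Optional[str] = None) -> None:
        self.n += 1
        for j, v in enumerate(x):
            self.ls[j] += v
            self.ss[j] += v * v
        if label is not None:  # unlabeled points still update the statistics
            self.labels[label] += 1

    def merge(self, other: "ClusterFeature") -> None:
        # CF vectors are additive, so two micro-clusters combine by summation.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.labels.update(other.labels)

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the absorbed points to the centroid.
        c = self.centroid()
        var = sum(self.ss[j] / self.n - c[j] ** 2 for j in range(len(c)))
        return math.sqrt(max(var, 0.0))

    def majority_label(self) -> Optional[str]:
        return self.labels.most_common(1)[0][0] if self.labels else None
```

Because centroid and radius are derived from three running sums, each arriving point is processed in constant time per dimension. This is what makes a single scan of the stream sufficient.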
Funding: The National Natural Science Foundation of China (No. 60473045), the Technology Research Project of Hebei Province (No. 05213573), and the Research Plan of the Education Office of Hebei Province (No. 2004406).
Abstract: The conventional fuzzy class-association method classifies each new text by repeatedly scanning the classifier, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when a large set of rules with many repeated words must be stored. Unlike ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which those words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the sub-trees led by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
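The FCR-tree's exact node layout is not given in the abstract. The minimal sketch below assumes a prefix tree (trie) over rule items, where each item is a (word, fuzzy set) pair. Sorting each rule's items lets rules share prefixes, which saves space. During matching, any child whose word is absent from the text, or present with a different fuzzy set, is pruned together with its whole sub-tree. The k-FCR-tree partitioning is not modeled, and the words and fuzzy-set names ("low", "mid", "high") are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Item = Tuple[str, str]  # (word, fuzzy set of its frequency)


class RuleTrie:
    def __init__(self) -> None:
        self.root: Dict = {"children": {}, "labels": []}

    def insert(self, items: Iterable[Item], label: str) -> None:
        node = self.root
        for item in sorted(items):  # canonical order so rules share prefixes
            node = node["children"].setdefault(item, {"children": {}, "labels": []})
        node["labels"].append(label)

    def match(self, text_items: Dict[str, str]) -> List[str]:
        """Labels of all rules whose every (word, fuzzy set) item occurs in the text."""
        matched: List[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            matched.extend(node["labels"])
            for (word, fuzzy_set), child in node["children"].items():
                if text_items.get(word) == fuzzy_set:  # otherwise skip the whole sub-tree
                    stack.append(child)
        return matched

    def classify(self, text_items: Dict[str, str]) -> Optional[str]:
        votes = Counter(self.match(text_items))
        return votes.most_common(1)[0][0] if votes else None


trie = RuleTrie()
trie.insert([("goal", "high"), ("team", "mid")], "sports")
trie.insert([("goal", "high")], "sports")
trie.insert([("market", "high"), ("stock", "mid")], "finance")
```

The two "sports" rules share the `("goal", "high")` node. A text that never mentions "market" never visits the "finance" branch.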
Funding: Supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2019JM-544).
Abstract: Accurate classification of power quality disturbances is the premise and basis for improving and governing power quality. This paper presents a power quality disturbance classification method based on multiple time-frequency domain features and a decision tree. Wavelet transform and S-transform are used to extract feature quantities from each power quality disturbance signal. A decision tree with classification rules is then constructed from the extracted features for classification and recognition. The classification rules and the decision tree classifier are built by combining the energy spectrum feature extracted by the wavelet transform with seven other time-frequency domain features extracted by the S-transform. Simulation results show that the proposed method can effectively identify six common types of single disturbance signals and two types of mixed disturbance signals, with fast classification speed and adequate noise resistance. Its classification accuracy is also higher than that of support vector machine (SVM) and k-nearest neighbor (KNN) algorithms. Compared with a method that uses only the S-transform, the proposed feature extraction method yields richer features and higher classification accuracy for power quality disturbances.
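The abstract does not name the wavelet family or the decomposition depth. The minimal sketch of a wavelet energy-spectrum feature below assumes the orthonormal Haar wavelet for brevity. Each decomposition level contributes the energy of its detail coefficients, and because the transform is orthonormal, these energies plus the final approximation energy sum to the signal's energy. The S-transform features are not shown.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal: Sequence[float], levels: int) -> List[float]:
    """Detail-coefficient energy per level, followed by the final approximation energy."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    features: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

A disturbance such as a transient or harmonic shifts energy into particular levels, so this short energy vector can be used as split features by a decision tree.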
Abstract: Here, we demonstrate the application of the Decision Tree Classification (DTC) method to lithological mapping from multispectral satellite imagery. The area of investigation is Lake Magadi in the East African Rift Valley, Kenya. The work involves collecting rock and soil samples in the field, analyzing them with reflectance and emittance spectroscopy, and processing and interpreting Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data with the DTC method. The DTC method is strictly non-parametric, flexible, and simple, and it does not require assumptions about the distribution of the input data. It has been used successfully in a wide range of classification problems. The DTC method successfully mapped the chert and trachyte series rocks, as well as the clay minerals and evaporites of the area, with a high overall accuracy of 86%. The high classification accuracy of the developed decision tree suggests that it can adapt to noise and to the nonlinear relations often observed in surface materials in space-borne spectral image data, without assumptions about the distribution of the input data. Moreover, the present work found the DTC method useful for accurately mapping lithological variations across vast, rugged terrain. Such terrain inherently contains various sources of noise, even after considerable radiance and atmospheric correction.
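In standard remote-sensing practice, an overall accuracy such as the 86% reported here is the trace of a confusion matrix divided by the number of reference samples. The sketch below computes that figure, together with Cohen's kappa and per-class producer's and user's accuracies, which are commonly reported alongside it. Whether the study reported kappa is not stated in the abstract, and the matrix in the usage example is invented.

```python
from typing import List, Sequence, Tuple


def accuracy_report(confusion: Sequence[Sequence[int]]) -> Tuple[float, float, List[float], List[float]]:
    """Overall accuracy, Cohen's kappa, producer's and user's accuracy per class.
    Rows are reference (ground-truth) classes, columns are mapped classes."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(k)) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    kappa = (overall - expected) / (1 - expected)  # agreement beyond chance
    producers = [confusion[i][i] / row_sums[i] for i in range(k)]  # omission view
    users = [confusion[j][j] / col_sums[j] for j in range(k)]  # commission view
    return overall, kappa, producers, users
```

For the two-class matrix `[[45, 5], [10, 40]]`, overall accuracy is 0.85. Chance agreement is 0.5, so kappa is 0.7.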
The associate editor coordinating the review of this paper and approving it for publication was X. Cheng.
Abstract: As a widely used machine-learning classifier, a decision tree model can be trained and deployed at a service provider to offer classification services, e.g., remote diagnostics, to clients. These services involve sensitive information: the clients’ inputs, the model parameters, and the classification results. To address the resulting privacy concerns, we propose a privacy-preserving decision tree classification scheme (PDTC) in this paper. Specifically, we first tailor an additively homomorphic encryption primitive and a secret sharing technique to design a new secure two-party comparison protocol, in which each party's numeric inputs are compared privately as a whole rather than bit by bit. Then, based on the comparison protocol, we exploit the structure of the decision tree to construct PDTC. In PDTC, the client's input and the service provider's model parameters are concealed from the counterparty, and the classification result is revealed only to the client. A formal simulation-based security model and security proof demonstrate that PDTC achieves desirable security properties. In addition, a performance evaluation shows that PDTC has lower communication and computation overhead than existing schemes.
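PDTC's tailored primitive is not specified in the abstract. The key property it relies on is that ciphertexts can be combined so that the result decrypts to the sum of the plaintexts. To illustrate only that property, here is textbook Paillier with g = n + 1 and tiny hard-coded primes. It is insecure, it is not PDTC's primitive, and it does not implement the comparison protocol.

```python
import math
import random
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Toy Paillier keys with g = n + 1 (p, q prime, gcd(pq, (p-1)(q-1)) = 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv: Tuple[int, int], c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then scale by mu


def add_encrypted(n: int, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)
```

Raising a ciphertext to a public integer k likewise multiplies its plaintext by k. Blinding-based comparison protocols typically build on these two operations.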
Abstract: Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (e.g., multiple linear regression) are not very efficient. However, these methods are still not widespread in chemometrics, primarily because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used to analyze NIR spectroscopic data, for both regression and classification. We try to consider all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcome of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
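Trees perform variable selection implicitly. At each node, a regression tree searches every variable (for NIR data, every wavelength) for the threshold that most reduces the squared error. The sketch below shows that exhaustive CART-style search for a single node; summing such error reductions per variable over all nodes is the basis of impurity-based variable importance. The three "wavelength" columns and responses are toy values.

```python
from typing import List, Optional, Sequence, Tuple


def sse(values: List[float]) -> float:
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[Optional[int], Optional[float], float]:
    """Return (variable index, threshold, child SSE) minimising the summed squared
    error of the two children; (None, None, parent SSE) if no split helps."""
    best: Tuple[Optional[int], Optional[float], float] = (None, None, sse(list(y)))
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (j, t, total)
    return best


X = [[0.2, 0.1, 0.5], [0.3, 0.2, 0.4], [0.1, 0.8, 0.6], [0.4, 0.9, 0.5]]
y = [1.0, 1.1, 3.0, 3.1]
```

Here the response jumps between the second and third samples, and only variable 1 separates them. The search therefore selects variable 1 at threshold 0.2.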