13 articles found
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop
1
Authors: Yuanzhen Li, Qun Yang, Shangqi Lai, Bohan Li. 《国际计算机前沿大会会议论文集》, 2015, No. 1, pp. 83-84 (2 pages)
As a distributed computing platform, Hadoop provides an effective way to handle big data. In Hadoop, the completion time of a job can be delayed by a straggler. Although the definitive cause of a straggler is hard to detect, speculative execution is usually used to deal with the problem by simply backing up stragglers on alternative nodes. In this paper, we design a new Speculative Execution algorithm based on the C4.5 Decision Tree, SECDT, for Hadoop. In SECDT, we predict the completion time of stragglers and of backup tasks using a C4.5 decision tree. We then compare the two predicted completion times, compute their difference, and select the straggler with the largest difference to start a backup task. Experimental results show that SECDT predicts execution time more accurately than other speculative execution methods and hence reduces job completion time.
Keywords: Speculative execution; C4.5; Decision tree; Hadoop
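The selection step described in the SECDT abstract above (compare the predicted completion time of each straggler with that of its backup, then back up the straggler with the largest difference) can be sketched as follows. This is only an illustration under stated assumptions: the predicted times are taken as given inputs rather than produced by a C4.5 tree, the `Candidate` class and task names are hypothetical, and skipping candidates whose backup would not finish earlier is an assumption, not something the abstract states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    task_id: str             # hypothetical task name
    straggler_finish: float  # predicted seconds for the straggler to finish where it runs
    backup_finish: float     # predicted seconds for a backup copy on another node


def pick_straggler_to_back_up(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate whose backup is predicted to save the most time.

    Assumption: a backup is only worthwhile if the difference is positive,
    so None is returned when no backup is predicted to finish earlier.
    """
    best = None
    best_gain = 0.0
    for c in candidates:
        gain = c.straggler_finish - c.backup_finish
        if gain > best_gain:
            best, best_gain = c, gain
    return best


# Toy predictions (made up for illustration only).
candidates = [
    Candidate("map_07", 120.0, 90.0),
    Candidate("map_12", 200.0, 80.0),
    Candidate("reduce_03", 60.0, 75.0),
]
chosen = pick_straggler_to_back_up(candidates)
```

In a real scheduler the two predicted times would come from the trained model; here they are hard-coded so that only the comparison logic is shown.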
Forecasting Model of Agro-meteorological Disaster Grade Based on Decision Tree (cited by: 2)
2
Author: 司巧梅. 《Meteorological and Environmental Research》 (CAS), 2010, No. 2, pp. 85-87, 90 (4 pages)
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model forecasts the degree of direct economic loss, providing a sound data mining model and effective analysis results.
Keywords: Data mining; Agro-meteorology; Decision tree; C4.5 algorithm; Classification mining; China
Landslide susceptibility zonation method based on C5.0 decision tree and K-means cluster algorithms to improve the efficiency of risk management (cited by: 15)
3
Authors: Zizheng Guo, Yu Shi, Faming Huang, Xuanmei Fan, Jinsong Huang. 《Geoscience Frontiers》 (SCIE, CAS, CSCD), 2021, No. 6, pp. 243-261 (19 pages)
Machine learning algorithms are an important means of performing landslide susceptibility assessments, but most studies use GIS-based classification methods to conduct susceptibility zonation. This study presents a machine learning approach based on the C5.0 decision tree (DT) model and the K-means cluster algorithm to produce a regional landslide susceptibility map. Yanchang County, a typical landslide-prone area located in northwestern China, was taken as the area of interest to introduce the proposed application procedure. A landslide inventory containing 82 landslides was prepared and subsequently randomly partitioned into two subsets: training data (70% of landslide pixels) and validation data (30% of landslide pixels). Fourteen landslide influencing factors were considered in the input dataset and were used to calculate the landslide occurrence probability based on the C5.0 decision tree model. Susceptibility zonation was implemented according to the cut-off values calculated by the K-means cluster algorithm. The validation results of the model performance analysis showed that the AUC (area under the receiver operating characteristic (ROC) curve) of the proposed model was the highest, reaching 0.88, compared with traditional models (support vector machine (SVM) = 0.85, Bayesian network (BN) = 0.81, frequency ratio (FR) = 0.75, weight of evidence (WOE) = 0.76). The landslide frequency ratio and frequency density of the high susceptibility zones were 6.76/km² and 0.88/km², respectively, which were much higher than those of the low susceptibility zones. The top 20% interval of landslide occurrence probability contained 89% of the historical landslides but only accounted for 10.3% of the total area. Our results indicate that the distribution of high susceptibility zones was more focused without containing more "stable" pixels. Therefore, the obtained susceptibility map is suitable for application to landslide risk management practices.
Keywords: Landslide susceptibility; Frequency ratio; C5.0 decision tree; K-means cluster; Classification; Risk management
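The abstract above derives zonation cut-off values from K-means clustering of the predicted landslide probabilities. One possible reading of that step is sketched below, as an assumption rather than the authors' procedure: cluster the scalar probabilities with Lloyd's algorithm and take the midpoints between adjacent cluster centers as cut-offs. The function names, the choice of three zones, and the probability values are all hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns the sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def cutoffs_from_centers(centers):
    """Assumed rule: a cut-off is the midpoint between two adjacent centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Made-up landslide occurrence probabilities for nine pixels.
probabilities = [0.02, 0.05, 0.08, 0.45, 0.50, 0.55, 0.90, 0.93, 0.97]
centers = kmeans_1d(probabilities, k=3)
thresholds = cutoffs_from_centers(centers)  # two cut-offs give three zones
```

Midpoints are used because, in one dimension, they are exactly the boundaries at which the nearest center changes.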
Effective use of FibroTest to generate decision trees in hepatitis C (cited by: 2)
4
Authors: Dana Lau-Corona, Luís Alberto Pineda, Héctor Hugo Avilés, Gabriela Gutiérrez-Reyes, Blanca Eugenia Farfan-Labonne, Rafael Núñez-Nateras, Alan Bonder, Rosalinda Martínez-García, Clara Corona-Lau, Marco Antonio Olivera-Martínez, Maria Concepción Gutiérrez-Ruiz, Guillermo Robles-Díaz, David Kershenobich. 《World Journal of Gastroenterology》 (SCIE, CAS, CSCD), 2009, No. 21, pp. 2617-2622 (6 pages)
AIM: To assess the usefulness of FibroTest to forecast scores by constructing decision trees in patients with chronic hepatitis C. METHODS: We used the C4.5 classification algorithm to construct decision trees with data from 261 patients with chronic hepatitis C without a liver biopsy. The FibroTest attributes of age, gender, bilirubin, apolipoprotein, haptoglobin, α2 macroglobulin, and γ-glutamyl transpeptidase were used as predictors, and the FibroTest score as the target. For testing, a 10-fold cross validation was used. RESULTS: The overall classification error was 14.9% (accuracy 85.1%). FibroTest's cases with true scores of F0 and F4 were classified with very high accuracy (18/20 for F0, 9/9 for F0-1 and 92/96 for F4) and the largest confusion centered on F3. The algorithm produced a set of compound rules out of the ten classification trees and was used to classify the 261 patients. The rules for the classification of patients in F0 and F4 were effective in more than 75% of the cases in which they were tested. CONCLUSION: The recognition of clinical subgroups should help to enhance our ability to assess differences in fibrosis scores in clinical studies and improve our understanding of fibrosis progression.
Keywords: Hepatitis C; FibroTest; Decision trees; C4.5 algorithm; Non-invasive biomarkers
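The 10-fold cross validation mentioned in the abstract above splits the 261 patients into ten disjoint test folds, with one tree trained per fold on the remaining patients. A minimal sketch of the index bookkeeping follows; the function name, the fixed seed, and the plain (non-stratified) shuffling are assumptions, since the abstract does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Assumption: samples are shuffled once and dealt round-robin into k folds,
    without stratifying by class.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 261 matches the number of patients reported in the abstract.
splits = list(k_fold_indices(261, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

Because 261 is not divisible by 10, one fold holds 27 patients and the other nine hold 26 each; every patient appears in exactly one test fold.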
A decision tree based decomposition method for oil refinery scheduling (cited by: 2)
5
Authors: Xiaoyong Gao, Dexian Huang, Yongheng Jiang, Tao Chen. 《Chinese Journal of Chemical Engineering》 (SCIE, EI, CAS, CSCD), 2018, No. 8, pp. 1605-1612 (8 pages)
Refinery scheduling has attracted increasing attention in both academic and industrial communities in recent years. However, due to the complexity of refinery processes, little has been reported on successful use in real-world refineries. In academic studies, refinery scheduling is usually treated as an integrated, large-scale optimization problem, though such complex optimization problems are extremely difficult to solve. In this paper, we propose a way to exploit the prior knowledge existing in refineries and develop a decision-making system to guide the scheduling process. For a real-world fuel-oil-oriented refinery, ten adjusting process scales are predetermined. A C4.5 decision tree uses the finished oil demand plan to classify the corresponding category (i.e., adjusting scale). Then, a specific sub-scheduling problem with respect to the determined adjusting scale is solved. The proposed strategy is demonstrated with a scheduling case originating from a real-world refinery.
Keywords: Refinery scheduling; Decision tree; C4.5; Decomposition method
Research on Scholarship Evaluation System based on Decision Tree Algorithm (cited by: 1)
6
Authors: YIN Xiao, WANG Ming-yu. 《电脑知识与技术》, 2015, No. 3X, pp. 11-13 (3 pages)
Under the modern education system of China, the annual scholarship evaluation is vital for many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm, and constructs a data set for the scholarship evaluation system by analyzing the related attributes in scholarship evaluation information. It also identifies some factors that play a significant role in the development of college students through analysis and research of moral education, intellectual education, and culture & PE.
Keywords: Data mining; Scholarship evaluation system; Decision tree algorithm; C4.5 algorithm
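The abstract above describes C4.5 as an improvement on ID3. One well-known difference is the split criterion: ID3 ranks attributes by information gain, while C4.5 uses the gain ratio, which divides the gain by the split information so that attributes with many distinct values are penalized. The sketch below illustrates that on a toy table; the attribute names and records are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    labels = [r[target] for r in rows]
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    weights = [len(p) / len(rows) for p in partitions.values()]
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    info_gain = entropy(labels) - remainder          # ID3's criterion
    split_info = -sum(w * log2(w) for w in weights)  # penalty for many-valued splits
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical scholarship records, for illustration only.
rows = [
    {"student_id": "s1", "grades": "high", "award": "yes"},
    {"student_id": "s2", "grades": "high", "award": "yes"},
    {"student_id": "s3", "grades": "low", "award": "no"},
    {"student_id": "s4", "grades": "low", "award": "no"},
]
ratio_grades = gain_ratio(rows, "grades", "award")
ratio_id = gain_ratio(rows, "student_id", "award")
```

Both attributes separate the classes perfectly, so their information gain is the same, but the unique `student_id` column has twice the split information and therefore a lower gain ratio than `grades`.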
An Active Rule Approach for Network Intrusion Detection with Enhanced C4.5 Algorithm
7
Authors: L Prema RAJESWARI, Kannan ARPUTHARAJ. 《International Journal of Communications, Network and System Sciences》, 2008, No. 4, pp. 314-321 (8 pages)
Intrusion detection systems provide additional defense capacity to a networked information system in addition to the security measures provided by firewalls. This paper proposes an active-rule-based enhancement to the C4.5 algorithm for network intrusion detection in order to detect misuse behaviors of internal attackers through effective classification and decision making in computer networks. This enhanced C4.5 algorithm derives a set of classification rules from network audit data, and the generated rules are then used to detect network intrusions in a real-time environment. Unlike most existing decision-tree-based approaches, the rules generated and fired in this work are more effective because the information-theoretic approach minimizes the expected number of tests needed to classify an object and guarantees that a simple (but not necessarily the simplest) tree is found. The main advantage of the proposed algorithm is that the generalization ability of enhanced C4.5 decision trees is better than that of C4.5 decision trees. We have employed data from the third international knowledge discovery and data mining tools competition (KDDcup’99) to train and test the feasibility of the proposed model. By applying the enhanced C4.5 algorithm, an average detection rate of 93.28 percent and a false positive rate of 0.7 percent were obtained.
Keywords: Decision tree; Intrusion detection; KDD Cup dataset; Enhanced C4.5
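Research on Constructing a Decision Model for a Pre-Consultation Question-Answering System for Breast Tumors (面向乳腺肿瘤的诊前问答系统决策模型构建研究)
8
Authors: 王世文, 李一凡, 郑群, 曹旭晨. 《医学信息学杂志》 (CAS), 2023, No. 8, pp. 54-59, 65 (7 pages)
Purpose/Significance: To use a decision tree classification model to simulate the reasoning of expert consultation and predict disease risk for potential or existing breast tumor patients. Method/Process: The classic C4.5 classification algorithm and pessimistic pruning are applied to case data collected by survey to predict the outcome of patient pre-consultation. Result/Conclusion: A C4.5 decision tree is generated with "whether in-hospital postoperative chemotherapy or radiotherapy has finished" as the root node and 76 leaf nodes. Its prediction accuracy reaches 95%, and cases are divided into 3 risk levels according to the classification labels.
Keywords: Breast tumor; C4.5 algorithm; Decision tree; Model construction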
Taiga: Performance Optimization of the C4.5 Decision Tree Construction Algorithm (cited by: 9)
9
Authors: Yi Yang, Wenguang Chen. 《Tsinghua Science and Technology》 (SCIE, EI, CAS, CSCD), 2016, No. 4, pp. 415-425 (11 pages)
Classification is an important machine learning problem, and decision tree construction algorithms are an important class of solutions to this problem. RainForest is a scalable way to implement decision tree construction algorithms. It consists of several algorithms, of which the best one is a hybrid between a traditional recursive implementation and an iterative implementation that uses more memory but involves fewer write operations. We propose an optimized algorithm inspired by RainForest. By using a more sophisticated switching criterion between the two algorithms, we are able to get a performance gain even when all statistical information fits in memory. Evaluations show that our method achieves an average speedup of 2.8 times over the traditional recursive implementation.
Keywords: C4.5; RainForest; Decision trees; Machine learning; Performance optimization
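A New Design Algorithm for Support Vector Machine Decision Trees (一种新的支持向量机决策树设计算法) (cited by: 8)
10
Authors: 张先武, 郭雷. 《火力与指挥控制》 (CSCD, PKU Core), 2010, No. 10, pp. 31-35 (5 pages)
The accuracy and speed of a support vector machine (SVM) decision tree depend on the tree structure. To obtain good generalization performance, the classification subtasks of the upper nodes of the tree should be defined by the most separable classes. A new design algorithm for SVM decision trees is proposed. The classification subtask at each node is defined as follows: fuzzy kernel C-means roughly divides the current training set into two subsets; then, based on membership degrees, strongly separable subclasses are selected from each subset to define the classification subtask of the current node, and weakly separable subclasses are moved to lower nodes. Experimental results show that the method outperforms other traditional multi-class classification methods in both accuracy and speed.
Keywords: Support vector machine; Multi-class classification; Fuzzy kernel C-means; Decision tree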
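Research on Corporate Bankruptcy Prediction Based on Imbalanced Data (基于不平衡数据的公司破产预测研究) (cited by: 3)
11
Authors: 周文泳, 冯丽霞, 段春艳. 《同济大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2022, No. 2, pp. 283-290 (8 pages)
Integrating innovative data preprocessing techniques with ensemble algorithms, this study examines corporate bankruptcy prediction using imbalanced data. First, the imbalanced data are preprocessed with redundant-information handling and different sampling methods. Second, with the Classifier 5.0 (C5.0) decision tree and a single-hidden-layer feedforward neural network as base classifiers, each is combined with three types of resampling preprocessing techniques to select the best sampling method. Third, bootstrap aggregating (bagging) is used to improve classification, the area under the receiver operating characteristic curve under ten-fold cross-validation is used for evaluation, and the ensemble models of the two base classifiers are compared. Finally, the approach is validated experimentally on real data for more than ten thousand Polish manufacturing companies from the University of California, Irvine database. The results show that ensemble models combining undersampling or the synthetic minority oversampling technique (SMOTE) with neural networks classify best, providing useful support for companies carrying out bankruptcy prediction.
Keywords: Binary classification; Imbalanced data; Neural network; C5.0 decision tree; Ensemble method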
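Undersampling, one of the resampling techniques named in the abstract above, balances a data set by discarding examples from the larger class. The sketch below shows plain random undersampling only; the abstract does not say which undersampling variant the authors used, so the function, the seed, and the 95-to-5 toy split are assumptions.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the rarest class.

    Returns a shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):  # sample without replacement
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced


# Toy data: 95 solvent companies (label 0) and 5 bankrupt ones (label 1).
samples = list(range(100))
labels = [1 if i < 5 else 0 for i in samples]
balanced = random_undersample(samples, labels)
class_counts = Counter(y for _, y in balanced)
```

The balanced set keeps all five minority examples and five randomly chosen majority examples, at the cost of discarding most of the majority class.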
Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (cited by: 1)
12
Author: Mahdi Gholami Mehr. 《Intelligent Information Management》, 2013, No. 6, pp. 182-190 (9 pages)
Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy, and passage selection are critical in the formation of useful summaries. Since the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can be useful for easy study of information on the web. Hence we propose a new approach based on a machine learning meta-learner algorithm called AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 pieces of news downloaded from different medical websites. We then compare our results with some existing approaches.
Keywords: Multi-document summarization; Machine learning; Decision trees; AdaBoost; C4.5; Medical document summarization
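The boosting loop behind the approach in the abstract above can be sketched in a few lines. To keep the example short, the base learner here is a one-feature decision stump, not the C4.5 tree the paper uses, and the data are a made-up one-dimensional toy set, not sentence scores from the paper; all function names are hypothetical.

```python
from math import exp, log


def train_stump(xs, ys, weights):
    """Best weighted threshold rule on one numeric feature.

    Returns (weighted_error, threshold, polarity); the rule predicts
    `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost with labels in {-1, +1}; returns weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * log((1 - error) / error)     # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x >= threshold else -polarity for x in xs]
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        weights = [w * exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
training_predictions = [predict(ensemble, x) for x in xs]
```

No single stump classifies this toy set correctly, but the weighted vote of three stumps does, which is the effect boosting relies on.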
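An Effective Improved Model of C4.5 (一种有效的C4.5改进模型) (cited by: 28)
13
Authors: 刘鹏, 姚正, 尹俊杰. 《清华大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2006, No. z1, pp. 996-1001 (6 pages)
This paper introduces an effective improved decision tree model, R-C4.5, and its simplified versions. The aim is to build a simple tree while improving the interpretability of the attribute selection measure and reducing empty branches, meaningless branches, and overfitting. The model is based on the well-known C4.5 decision tree model but improves the attribute selection and branching strategies. In R-C4.5, branches with poor classification performance are merged, which effectively avoids problems such as fragmentation. Experiments show that R-C4.5 decision trees improve the robustness of the tree while maintaining prediction accuracy. As simplified versions of R-C4.5, R-C4.5c and R-C4.5s generate even simpler trees, and R-C4.5s is carried out in the data preprocessing stage, making it easy to implement.
Keywords: Decision tree; R-C4.5; C4.5; Classifier; Data mining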