10 articles found
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop
1
Authors: Yuanzhen Li, Qun Yang, Shangqi Lai, Bohan Li. 《国际计算机前沿大会会议论文集》, 2015, No. 1, pp. 83-84 (2 pages)
As a distributed computing platform, Hadoop provides an effective way to handle big data. In Hadoop, a straggler delays the completion of a job. Although the definitive cause of a straggler is hard to detect, speculative execution is usually used to deal with the problem by simply backing up stragglers on alternative nodes. In this paper, we design a new speculative execution algorithm for Hadoop based on the C4.5 decision tree, called SECDT. In SECDT, we predict the completion time of each straggler and of its backup task with a C4.5 decision tree. We then compute the difference between the two predicted completion times and select the straggler with the maximum difference to start a backup task. Experimental results show that SECDT predicts execution time more accurately than other speculative execution methods and hence reduces job completion time.
Keywords: speculative execution; C4.5 decision tree; Hadoop
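The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TaskEstimate` type, the function name, and the sample numbers are hypothetical, the predicted times are assumed to be supplied by some already trained model (a C4.5 tree in the paper), and skipping tasks whose backup is not predicted to finish sooner is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskEstimate:
    """Hypothetical container for the two predictions made per straggler."""
    task_id: str
    remaining_time: float  # predicted time for the straggler to finish (s)
    backup_time: float     # predicted time for a fresh backup copy to finish (s)


def pick_task_to_back_up(estimates: List[TaskEstimate]) -> Optional[TaskEstimate]:
    """Return the straggler with the largest predicted time saving, if any."""
    # Assumption: a backup is only worthwhile when it is predicted to finish sooner.
    candidates = [e for e in estimates if e.remaining_time - e.backup_time > 0]
    if not candidates:
        return None
    # Select the maximum difference between straggler and backup completion time.
    return max(candidates, key=lambda e: e.remaining_time - e.backup_time)


estimates = [
    TaskEstimate("map_03", remaining_time=120.0, backup_time=45.0),
    TaskEstimate("map_07", remaining_time=90.0, backup_time=80.0),
    TaskEstimate("reduce_01", remaining_time=30.0, backup_time=50.0),
]
chosen = pick_task_to_back_up(estimates)
print(chosen.task_id if chosen else "no backup worthwhile")  # prints "map_03"
```

Here `map_03` is chosen because its predicted saving (75 s) exceeds that of `map_07` (10 s), while `reduce_01` is excluded because its backup is predicted to be slower.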
Research on Scholarship Evaluation System based on Decision Tree Algorithm (cited by: 1)
2
Authors: YIN Xiao, WANG Ming-yu. 《电脑知识与技术》, 2015, No. 3X, pp. 11-13 (3 pages)
Under China's modern education system, the annual scholarship evaluation is vital to many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm, and constructs a data set for the scholarship evaluation system by analyzing the related attributes in scholarship evaluation information. Through analysis of moral education, intellectual education, and culture and PE, it also identifies several factors that play a significant role in the development of college students.
Keywords: data mining; scholarship evaluation system; decision tree algorithm; C4.5 algorithm
An Active Rule Approach for Network Intrusion Detection with Enhanced C4.5 Algorithm
3
Authors: L Prema RAJESWARI, Kannan ARPUTHARAJ. 《International Journal of Communications, Network and System Sciences》, 2008, No. 4, pp. 314-321 (8 pages)
Intrusion detection systems give a networked information system additional defense capacity beyond the security measures provided by firewalls. This paper proposes an active-rule-based enhancement to the C4.5 algorithm for network intrusion detection, in order to detect misuse behaviors of internal attackers through effective classification and decision making in computer networks. The enhanced C4.5 algorithm derives a set of classification rules from network audit data, and the generated rules are then used to detect network intrusions in a real-time environment. Unlike most existing decision-tree-based approaches, the rules generated and fired in this work are more effective because the information-theoretic approach minimizes the expected number of tests needed to classify an object and guarantees that a simple (but not necessarily the simplest) tree is found. The main advantage of the proposed algorithm is that the generalization ability of enhanced C4.5 decision trees is better than that of C4.5 decision trees. We used data from the third international knowledge discovery and data mining tools competition (KDD Cup '99) to train the proposed model and test its feasibility. With the enhanced C4.5 algorithm, an average detection rate of 93.28 percent and a false positive rate of 0.7 percent were obtained.
Keywords: decision tree; intrusion detection; KDD Cup dataset; enhanced C4.5
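For context on the information-theoretic criterion mentioned in this abstract, the sketch below computes the gain ratio that standard C4.5 uses to choose a split on a categorical attribute. It illustrates plain C4.5 only, not the enhanced variant the paper proposes; the function names and the toy `protocol`, `flag`, and `labels` lists are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy audit records: one attribute separates the classes perfectly, one not at all.
labels = ["attack", "attack", "normal", "normal"]
protocol = ["tcp", "tcp", "udp", "udp"]
flag = ["a", "b", "a", "b"]
print(gain_ratio(protocol, labels), gain_ratio(flag, labels))
```

A tree builder would evaluate this ratio for each candidate attribute and split on the highest-scoring one, which in this toy data is `protocol`.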
Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (cited by: 1)
4
Author: Mahdi Gholami Mehr. 《Intelligent Information Management》, 2013, No. 6, pp. 182-190 (9 pages)
Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to forming useful summaries. Because the volume and variety of online medical news make it difficult for medical experts to read all of it, automatic multi-document summarization can make information on the web easier to study. Hence we propose a new summarization approach based on the machine learning meta-learner algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites. We then compare our results with some existing approaches.
Keywords: multi-document summarization; machine learning; decision trees; AdaBoost; C4.5; medical document summarization
Research on Constructing a Decision Model for a Pre-Consultation Question-Answering System for Breast Tumors
5
Authors: 王世文, 李一凡, 郑群, 曹旭晨. 《医学信息学杂志》 (CAS), 2023, No. 8, pp. 54-59, 65 (7 pages)
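Purpose/Significance: To use a decision tree classification model to simulate the reasoning of an expert consultation and predict disease risk for potential or existing breast tumor patients. Method/Process: The classic C4.5 classification algorithm and pessimistic pruning are applied to case data collected through investigation to predict the outcome of patient pre-consultation. Result/Conclusion: A C4.5 decision tree is generated with "whether in-hospital postoperative chemotherapy or radiotherapy has ended" as the root node and 76 leaf nodes. Its prediction accuracy reaches 95%, and cases are divided into 3 risk levels according to the classification labels.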
Keywords: breast tumor; C4.5 algorithm; decision tree; model construction
Research on an Integrated Intrusion Detection Algorithm Based on Clustering and Decision Trees (cited by: 1)
6
Author: 张会影. 《计算机安全》, 2010, No. 9, pp. 26-29 (4 pages)
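Intrusion detection is a security technique that discovers intrusions and attacks by monitoring a target system in real time. Traditional intrusion detection systems fall short in effectiveness, adaptability, and scalability. To make the result of fuzzy clustering a global optimum, the traditional fuzzy C-means algorithm is improved, and a C4.5 decision tree is built on the data set of each cluster, yielding a new integrated detection algorithm that determines whether an intrusion is present. Analysis of the experimental results shows that the detection algorithm lowers the false alarm rate and improves the detection performance and reliability of intrusion detection.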
Keywords: intrusion detection; clustering; fuzzy C-means algorithm; decision tree
Study on the Grouping of Patients with Chronic Infectious Diseases Based on Data Mining
7
Author: Min Li. 《Journal of Biosciences and Medicines》, 2019, No. 11, pp. 119-135 (17 pages)
Objective: Following the RFM model from customer relationship management, data mining was used to group patients with chronic infectious diseases and to explore the effect of customer segmentation on managing patients with different characteristics. Methods: 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital management information system (HIS); 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to group patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict their treatment situation. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three clusters: 1) Cluster 1, important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). In the C5.0 decision tree used to predict the treatment situation, the final treatment time (weeks) is an important predictor, and the accuracy rate is 99.94% as verified by the confusion model. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission, in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
Keywords: data mining; K-Means clustering algorithm; C5.0 decision tree algorithm; customer relationship management; patients with chronic infectious disease
An Effective Improved C4.5 Model (cited by: 28)
8
Authors: 刘鹏, 姚正, 尹俊杰. 《清华大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2006, No. z1, pp. 996-1001 (6 pages)
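This paper introduces an effective improved decision tree model, R-C4.5, and its simplified versions. The model aims to build a simple tree while improving the interpretability of the attribute selection measure and reducing empty branches, meaningless branches, and overfitting. It is based on the well-known C4.5 decision tree model but improves the attribute selection and branching strategies. In R-C4.5, branches with poor classification performance are merged, which effectively avoids problems such as fragmentation. Experiments show that R-C4.5 improves the robustness of the tree while maintaining the model's prediction accuracy. As simplified versions of R-C4.5, R-C4.5c and R-C4.5s generate even simpler trees, and R-C4.5s is carried out in the data preprocessing stage, which makes it easy to implement.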
Keywords: decision tree; R-C4.5; C4.5; classifier; data mining
Improving naive Bayes classifier by dividing its decision regions (cited by: 3)
9
Authors: Zhi-yong YAN, Gong-fu XU, Yun-he PAN. 《Journal of Zhejiang University-Science C(Computers and Electronics)》 (SCIE, EI), 2011, No. 8, pp. 647-657 (11 pages)
Classification can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. Thus, a decision tree can be regarded as a classifier tree, in which each classifier on a non-root node is trained in decision regions of the classifier on the parent node. Meanwhile, the NBTree algorithm, which generates a classifier tree with the C4.5 algorithm and the naive Bayes classifier as the root and leaf classifiers respectively, can also be regarded as training naive Bayes classifiers in decision regions of the C4.5 algorithm. We propose a second division (SD) algorithm and three soft second division (SD-soft) algorithms to train classifiers in decision regions of the naive Bayes classifier. These four novel algorithms all generate two-level classifier trees with the naive Bayes classifier as the root classifier. The SD and three SD-soft algorithms can make good use of both the information contained in instances near decision boundaries and the information that may be ignored by the naive Bayes classifier. Finally, we conduct experiments on 30 data sets from the UC Irvine (UCI) repository. Experimental results show that the SD algorithm achieves better generalization than the NBTree and the averaged one-dependence estimators (AODE) algorithms when using the C4.5 algorithm and support vector machine (SVM) as leaf classifiers. Further experiments indicate that our three SD-soft algorithms can achieve better generalization than the SD algorithm when argument values are selected appropriately.
Keywords: naive Bayes classifier; decision region; NBTree; C4.5 algorithm; support vector machine (SVM)
A Study of Early Gastric Cancer Risk Screening with the C5.0 Decision Tree (cited by: 3)
10
Authors: 刘迷迷, 刘永佳, 温丽, 蔡巧, 李丽婷, 蔡永铭. 《中华肿瘤防治杂志》 (CAS, PKU Core), 2018, No. 16, pp. 1131-1135 (5 pages)
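Objective: The C5.0 algorithm improves on the C4.5 algorithm in classification efficiency and accuracy and is increasingly widely used for classification problems. Based on patient questionnaires, serological tests, and other data, this study uses the C5.0 decision tree algorithm to screen for early gastric cancer risk and to identify the factors with the greatest influence on that screening, thereby helping clinicians improve diagnostic screening for early gastric cancer. Methods: The data come from "Cloud-Computing-Based Innovative Platform for Early Gastric Cancer Screening", a cooperative project with the First Affiliated Hospital of Guangdong Pharmaceutical University. A questionnaire survey was conducted among 618 patients with stomach disease who visited the gastroenterology departments of nearly 30 hospitals in 6 cities of Guangdong Province, and their serological, endoscopic, and pathological biopsy data were collected. Patients were classified as low, medium, or high risk for early gastric cancer according to the endoscopy and pathological biopsy results. The synthetic minority oversampling technique (SMOTE) was used to handle class imbalance in the sample, and a decision tree model for early gastric cancer risk screening was then built with the C5.0 algorithm. Results: The resulting C5.0 decision tree model has a depth of 11 and 33 leaf nodes, corresponding to 33 easily understood classification rules with which a patient's early gastric cancer risk category can be assessed quickly. The model reaches a relatively high accuracy of 73.28%, and the curve in the gain chart is clearly convex and close to the ideal curve, so the model classifies and predicts early gastric cancer risk well. The model computes the importance of each indicator for predicting early gastric cancer risk and identifies 15 factors with a large influence on risk screening, of which the most influential is the Helicobacter pylori (Hp) antibody. Conclusion: The C5.0 decision tree model built from patient questionnaires and serological tests predicts early gastric cancer risk well, and the influential factors it identifies can assist clinical screening for early gastric cancer risk.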
Keywords: early gastric cancer; C5.0 algorithm; decision tree; risk screening
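The class-imbalance step named in this abstract, SMOTE, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea in plain Python; it is a simplified illustration rather than the study's pipeline, and the function name, the parameters `k` and `seed`, and the toy points are all hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to it.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class with two numeric features (needs at least two samples).
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))  # prints 5
```

Each synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays within the region the original samples occupy. In practice a maintained library implementation would normally be preferred over a hand-written one.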