Journal literature+
19 articles found
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop
1
Authors: Yuanzhen Li, Qun Yang, Shangqi Lai, Bohan Li. 《国际计算机前沿大会会议论文集》, 2015, Issue 1, pp. 83-84 (2 pages)
As a distributed computing platform, Hadoop provides an effective way to handle big data. In Hadoop, a straggler delays the completion time of a job. Although the definitive cause of a straggler is hard to detect, speculative execution is usually used to deal with the problem by simply backing up stragglers on alternative nodes. In this paper, we design a new speculative execution algorithm based on the C4.5 decision tree, SECDT, for Hadoop. SECDT uses a C4.5 decision tree to predict the completion time of stragglers and of their backup tasks. It then computes the difference between the two predicted completion times and selects the straggler with the maximum difference to start a backup task. Experimental results show that SECDT predicts execution time more accurately than other speculative execution methods and hence reduces job completion time.
Keywords: speculative execution; C4.5 decision tree; Hadoop
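The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` type and `select_straggler` function are hypothetical names, the predicted times are assumed to come from some already-trained predictor (the C4.5 model itself is not shown), and the rule of skipping tasks whose backup would not finish earlier is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A straggler with two predicted completion times (hypothetical type)."""
    task_id: str
    straggler_time: float  # predicted completion time if the task keeps running
    backup_time: float     # predicted completion time of a fresh backup task


def select_straggler(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the largest predicted time saving.

    Assumption: a backup is only worthwhile when it is predicted to finish
    earlier than the straggler, so non-positive differences are ignored.
    """
    best = None
    best_gain = 0.0
    for c in candidates:
        gain = c.straggler_time - c.backup_time
        if gain > best_gain:
            best, best_gain = c, gain
    return best


if __name__ == "__main__":
    tasks = [
        Candidate("map_03", straggler_time=100.0, backup_time=60.0),
        Candidate("map_07", straggler_time=90.0, backup_time=20.0),
        Candidate("map_11", straggler_time=30.0, backup_time=40.0),
    ]
    chosen = select_straggler(tasks)
    print(chosen.task_id if chosen else "no backup needed")
```

In this made-up input, `map_07` has the largest difference (70 time units), so it is the task that would receive the backup.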
Research on Scholarship Evaluation System based on Decision Tree Algorithm (Cited by: 1)
2
Authors: YIN Xiao, WANG Ming-yu. 《电脑知识与技术》, 2015, Issue 3X, pp. 11-13 (3 pages)
Under China's modern education system, the annual scholarship evaluation is of vital importance to many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm, and constructs a data set for the scholarship evaluation system by analyzing the related attributes in scholarship evaluation information. It also identifies some factors that play a significant role in the development of college students through analysis of moral education, intellectual education, and culture and physical education.
Keywords: data mining; scholarship evaluation system; decision tree algorithm; C4.5 algorithm
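One well-known way in which C4.5 improves on ID3 is that it ranks attributes by gain ratio (information gain divided by split information) rather than by raw information gain. The sketch below illustrates that criterion for a categorical attribute. The function names and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    print(gain_ratio(["a", "a", "b", "b"], labels))  # two-valued attribute
    print(gain_ratio(["a", "b", "c", "d"], labels))  # one value per record
```

Both toy attributes separate the classes perfectly, so their information gain is identical. The gain ratio of the second is lower because it fragments the data into one branch per record.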
An Active Rule Approach for Network Intrusion Detection with Enhanced C4.5 Algorithm
3
Authors: L Prema RAJESWARI, Kannan ARPUTHARAJ. 《International Journal of Communications, Network and System Sciences》, 2008, Issue 4, pp. 314-321 (8 pages)
Intrusion detection systems provide a networked information system with defense capacity beyond the security measures provided by firewalls. This paper proposes an active-rule-based enhancement to the C4.5 algorithm for network intrusion detection, in order to detect misuse behaviors of internal attackers through effective classification and decision making in computer networks. The enhanced C4.5 algorithm derives a set of classification rules from network audit data, and the generated rules are then used to detect network intrusions in a real-time environment. Unlike most existing decision-tree-based approaches, the rules generated and fired in this work are more effective because the information-theoretic approach minimizes the expected number of tests needed to classify an object and guarantees that a simple (but not necessarily the simplest) tree is found. The main advantage of the proposed algorithm is that the enhanced C4.5 decision trees generalize better than standard C4.5 decision trees. We used data from the Third International Knowledge Discovery and Data Mining Tools Competition (KDD Cup '99) to train the proposed model and test its feasibility. With the enhanced C4.5 algorithm, an average detection rate of 93.28 percent and a false positive rate of 0.7 percent were obtained.
Keywords: decision tree; intrusion detection; KDD Cup dataset; enhanced C4.5
Taiga: Performance Optimization of the C4.5 Decision Tree Construction Algorithm (Cited by: 9)
4
Authors: Yi Yang, Wenguang Chen. 《Tsinghua Science and Technology》 (SCIE, EI, CAS, CSCD), 2016, Issue 4, pp. 415-425 (11 pages)
Classification is an important machine learning problem, and decision tree construction algorithms are an important class of solutions to it. RainForest is a scalable way to implement decision tree construction algorithms. It consists of several algorithms, of which the best is a hybrid between a traditional recursive implementation and an iterative implementation that uses more memory but involves fewer write operations. We propose an optimized algorithm inspired by RainForest. By using a more sophisticated criterion for switching between the two algorithms, we obtain a performance gain even when all statistical information fits in memory. Evaluations show that our method achieves an average speedup of 2.8 times over the traditional recursive implementation.
Keywords: C4.5; RainForest; decision trees; machine learning; performance optimization
Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (Cited by: 1)
5
Author: Mahdi Gholami Mehr. 《Intelligent Information Management》, 2013, Issue 6, pp. 182-190 (9 pages)
Automatic text summarization involves reducing a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy, and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can be useful for easy study of information on the web. Hence we propose a new approach to summarization based on a machine learning meta-learner algorithm called AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites. We then compare our results with some existing approaches.
Keywords: multi-document summarization; machine learning; decision trees; AdaBoost; C4.5; medical document summarization
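Decision Guidance for Civil Aircraft Quality Data Based on the Parallel C4.5 Algorithm
6
Authors: 魏壮宇, 蔡红霞, 李钧. 《工业控制计算机》, 2018, Issue 5, pp. 129-130, 133 (3 pages)
Civil aircraft equipment systems generate large volumes of quality data every day. As time passes and data accumulates, the statistical analysis traditionally used in discrete manufacturing can no longer process and analyze this volume of quality data effectively. To solve this problem and uncover the hidden patterns in the data, an effective data mining method is proposed. The method performs quality data analysis by integrating a parallel C4.5 decision tree algorithm. The analysis results demonstrate the correctness, effectiveness, and value of the method.
Keywords: civil aircraft quality; data analysis; data mining; parallel C4.5 decision tree algorithm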
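Construction of a Decision Model for a Pre-Diagnosis Question-Answering System for Breast Tumors
7
Authors: 王世文, 李一凡, 郑群, 曹旭晨. 《医学信息学杂志》 (CAS), 2023, Issue 8, pp. 54-59, 65 (7 pages)
Purpose/Significance: To use a decision tree classification model to simulate the consultation reasoning of experts and predict the disease risk of potential or existing breast tumor patients. Method/Process: The classic C4.5 classification algorithm and pessimistic pruning are applied to case data collected through a survey to predict the outcome of patient pre-consultation. Result/Conclusion: A C4.5 decision tree is generated with "whether postoperative in-hospital chemotherapy or radiotherapy has finished" as the root node and 76 leaf nodes. Its prediction accuracy reaches 95%, and patients are divided into 3 risk levels according to the classification labels.
Keywords: breast tumor; C4.5 algorithm; decision tree; model construction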
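Parallel Design of Decision Trees Based on Cloud-Edge Collaboration
8
Author: 姚跃. 《无线互联科技》, 2023, Issue 2, pp. 55-57, 102 (4 pages)
As analysis tasks over massive data grow heavier, data mining needs further advancement and optimization. The article first proposes a parallel decision tree design based on cloud-edge collaboration, which determines the splitting attribute through discretization of continuous attributes and builds the decision tree once the attribute is confirmed. It then preprocesses the data within the parallel design and constructs the overall parallel workflow of the decision tree. Finally, it achieves real-time analysis and intelligent processing of the data. Comparative experiments show that the optimized discretization of continuous attributes in the cloud-edge collaborative decision tree algorithm effectively shortens running time and improves the algorithm's speed while maintaining accuracy.
Keywords: cloud-edge collaboration; decision tree; parallelization; edge algorithm; attribute similarity; data processing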
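A Parallel Data Mining Algorithm Based on the G^4 ICCS System (Cited by: 3)
9
Authors: 刘威, 路来君, 王洪肖, 曹延波. 《吉林大学学报(信息科学版)》 (CAS), 2013, Issue 3, pp. 324-327 (4 pages)
To address the inability of the traditional decision tree algorithm SPRINT (Scalable Parallelizable Induction of Decision Trees) to handle massive geoscience data mining, a parallel decision tree classification algorithm, PSPRINT, was designed and implemented on G4ICCS (Geology Geography Geochemistry Geophysics Information Cloud Computing System). The algorithm uses hash tables to store the data records on either side of the split point of a continuous attribute, which provides the basis for splitting nodes in parallel, and solves the massive geoscience data mining problem under the MapReduce framework. Experimental results show that, in a simulated cloud computing environment, the parallel decision tree algorithm can handle massive geoscience data classification with good stability and high processing speed.
Keywords: geoscience G4ICCS system; data mining; decision tree algorithm; parallel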
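Research on a Parallel Mining Algorithm Based on the Hadoop Architecture (Cited by: 13)
10
Author: 曾俊. 《现代电子技术》 (PKU Core), 2018, Issue 1, pp. 117-119, 124 (4 pages)
Based on the Hadoop architecture, a parallel decision tree mining algorithm is proposed for knowledge mining across large data sets. The frequent itemsets of the SPRINT parallel mining algorithm under the Hadoop architecture are implemented with the MapReduce parallel programming model, which addresses the low efficiency and high time cost of mining large data sets. The SPRINT algorithm partitions the original data set and sends the data blocks to different Map processes for parallel computation, so that the system's storage and computing resources are used effectively. The MapReduce computing nodes then aggregate the mining results, which reduces the volume of intermediate results and significantly shortens parallel mining time. Parallelization experiments on SPRINT show that the SPRINT parallel mining algorithm under the Hadoop architecture has good scalability and cluster speedup.
Keywords: mining algorithm; Hadoop architecture; SPRINT; parallelization; decision tree; MapReduce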
Decision Tree-Based Forecasting Model of Agro-Meteorological Disaster Grade (in English) (Cited by: 2)
11
Author: 司巧梅. 《Meteorological and Environmental Research》 (CAS), 2010, Issue 2, pp. 85-87, 90 (4 pages)
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
Keywords: data mining; agro-meteorology; decision tree; C4.5 algorithm; classification mining; China
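Research on an Integrated Intrusion Detection Algorithm Based on Clustering and Decision Trees (Cited by: 1)
12
Author: 张会影. 《计算机安全》, 2010, Issue 9, pp. 26-29 (4 pages)
Intrusion detection is a security technology that discovers intrusion attacks by monitoring the target system in real time. Traditional intrusion detection systems fall short in effectiveness, adaptability, and scalability. To make the clustering result obtained by the fuzzy clustering algorithm a global optimum, the traditional fuzzy C-means algorithm is improved, and a C4.5 decision tree belonging to each cluster is built on that cluster's data set, yielding a new integrated detection algorithm for determining whether an intrusion exists. Analysis of the experimental results shows that the detection algorithm lowers the false alarm rate and improves the detection performance and reliability of intrusion detection.
Keywords: intrusion detection; clustering; fuzzy C-means algorithm; decision tree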
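Parallel Induction Algorithms Based on Decision Tree Classifiers
13
Author: 郭四稳. 《计算机与数字工程》, 2006, Issue 9, pp. 25-26, 48 (3 pages)
Induction of classification decision trees is an important data mining algorithm. This paper focuses on two parallel algorithms for constructing classification decision trees and analyzes their applicability and characteristics.
Keywords: decision tree; data mining; parallel algorithm; classifier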
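Design of a Hole-Pricking Mechanism with an Irregular Planetary Gear Train: Based on a Granular Computing Decision Tree Parallel Algorithm
14
Author: 魏小燕. 《农机化研究》 (PKU Core), 2016, Issue 11, pp. 128-132 (5 pages)
As a major agricultural country with 21% of the world's population, China needs to use chemical fertilizers rationally and improve fertilizer use efficiency in order to develop advanced modern agriculture. Compared with solid fertilizer, liquid fertilizer is absorbed more easily by crops, is used more directly, is more efficient, and costs less. Internationally, countries such as Russia, the United States, and Australia have taken the lead in using liquid fertilizer. To save fertilizer, improve crops' fertilizer absorption and utilization, reduce economic cost, and lessen soil pollution, a hole-pricking mechanism with an irregular planetary gear train was designed based on a granular computing decision tree parallel algorithm. In operation, the device wastes little liquid chemical fertilizer and achieves high absorption efficiency.
Keywords: granular computing; decision tree; parallel algorithm; planetary gear; hole-pricking mechanism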
Study on the Grouping of Patients with Chronic Infectious Diseases Based on Data Mining
15
Author: Min Li. 《Journal of Biosciences and Medicines》, 2019, Issue 11, pp. 119-135 (17 pages)
Objective: Following the RFM model theory of customer relationship management, data mining technology was used to group patients with chronic infectious diseases, in order to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital management information system (HIS); 43,448 records remained after data cleaning. The K-means clustering algorithm was used to group patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict the situation of these patients. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three clusters: 1) Cluster 1, important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). When the C5.0 decision tree algorithm was used to predict the treatment situation of patients with chronic infectious diseases, the final treatment time (weeks) was an important predictor, and the accuracy rate was 99.94% as verified by the confusion model. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a database for chronic infectious diseases and customer relationship management, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up hospital informatization, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission, in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
Keywords: data mining; K-means clustering algorithm; C5.0 decision tree algorithm; customer relationship management; patients with chronic infectious disease
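A Parallel CART Decision Tree Algorithm on a Big Data Platform
16
Authors: 杜小芳, 陈毅红, 王登辉, 卢思阳. 《西华师范大学学报(自然科学版)》, 2021, Issue 2, pp. 196-201 (6 pages)
The decision tree is one of the most popular and widely used classification models in machine learning. To address the low efficiency of the Spark MLlib decision tree algorithm (MLDT) in training tree models, a parallel CART decision tree algorithm based on the Spark platform (SPC-DT) is proposed. First, from the perspective of data-parallel optimization, the data are partitioned vertically so that each Gini computation operates on a complete attribute column, which reduces the network resources consumed by communication between data nodes. Second, the Fayyad algorithm is used to discretize continuous attributes, which lowers the number of Gini computations during decision tree training. Finally, the Gini index is used to train the decision tree model in order to reduce computational complexity. Experimental results show that SPC-DT is close to MLDT in classification accuracy and outperforms MLDT in tree training efficiency.
Keywords: decision tree; Apache Spark; Fayyad algorithm; data parallelism; continuous attributes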
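The Gini computation on a single attribute column, which this abstract treats as the unit of work, can be illustrated with a small single-machine sketch. It scans every midpoint between adjacent distinct values and is therefore the plain exhaustive CART criterion, not the Fayyad discretization or the Spark-based SPC-DT implementation. The function names and data are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split of one column.

    Candidate thresholds are midpoints between adjacent distinct sorted values.
    Returns (None, inf) if the column has fewer than two distinct values.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    print(best_gini_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

Because each call needs only one attribute column plus the labels, columns can in principle be evaluated independently, which is the property that vertical partitioning relies on.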
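An Effective Improved C4.5 Model (Cited by: 28)
17
Authors: 刘鹏, 姚正, 尹俊杰. 《清华大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2006, Issue z1, pp. 996-1001 (6 pages)
This paper introduces an effective improved decision tree model, R-C4.5, and its simplified versions. The model aims to build a simple tree while improving the interpretability of the attribute selection measure and reducing empty branches, meaningless branches, and overfitting. It is based on the well-known C4.5 decision tree model but improves the attribute selection and branching strategies. In R-C4.5, merging branches with poor classification performance effectively avoids problems such as fragmentation. Experiments show that the R-C4.5 decision tree effectively improves the robustness of the tree while maintaining prediction accuracy. As simplified versions of R-C4.5, R-C4.5c and R-C4.5s generate even simpler trees, and R-C4.5s is easy to implement because it is carried out in the data preprocessing stage.
Keywords: decision tree; R-C4.5; C4.5; classifier; data mining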
Improving naive Bayes classifier by dividing its decision regions (Cited by: 3)
18
Authors: Zhi-yong YAN, Gong-fu XU, Yun-he PAN. 《Journal of Zhejiang University-Science C(Computers and Electronics)》 (SCIE, EI), 2011, Issue 8, pp. 647-657 (11 pages)
Classification can be regarded as dividing the data space into decision regions separated by decision boundaries. In this paper we analyze decision tree algorithms and the NBTree algorithm from this perspective. Thus, a decision tree can be regarded as a classifier tree, in which each classifier on a non-root node is trained in decision regions of the classifier on the parent node. Meanwhile, the NBTree algorithm, which generates a classifier tree with the C4.5 algorithm and the naive Bayes classifier as the root and leaf classifiers respectively, can also be regarded as training naive Bayes classifiers in decision regions of the C4.5 algorithm. We propose a second division (SD) algorithm and three soft second division (SD-soft) algorithms to train classifiers in decision regions of the naive Bayes classifier. These four novel algorithms all generate two-level classifier trees with the naive Bayes classifier as the root classifier. The SD and three SD-soft algorithms can make good use of both the information contained in instances near decision boundaries and the information that may be ignored by the naive Bayes classifier. Finally, we conduct experiments on 30 data sets from the UC Irvine (UCI) repository. Experimental results show that the SD algorithm achieves better generalization than the NBTree and averaged one-dependence estimators (AODE) algorithms when using the C4.5 algorithm and support vector machine (SVM) as leaf classifiers. Further experiments indicate that our three SD-soft algorithms can achieve better generalization than the SD algorithm when argument values are selected appropriately.
Keywords: naive Bayes classifier; decision region; NBTree; C4.5 algorithm; support vector machine (SVM)
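C5.0 Decision Tree for Early Gastric Cancer Risk Screening (Cited by: 3)
19
Authors: 刘迷迷, 刘永佳, 温丽, 蔡巧, 李丽婷, 蔡永铭. 《中华肿瘤防治杂志》 (CAS, PKU Core), 2018, Issue 16, pp. 1131-1135 (5 pages)
Objective: The C5.0 algorithm improves on C4.5 to raise classification efficiency and accuracy and is increasingly used for classification problems. This study uses the C5.0 decision tree algorithm, based on patient questionnaires and serological tests, to screen for early gastric cancer risk and to identify the factors with the greatest influence on that screening, thereby helping clinicians improve diagnostic screening for early gastric cancer. Methods: The data came from "Innovative Platform for Early Gastric Cancer Screening Based on Cloud Computing", a cooperative project with the First Affiliated Hospital of Guangdong Pharmaceutical University. A questionnaire survey was conducted among 618 patients with gastric disease who attended the gastroenterology departments of nearly 30 hospitals in 6 cities of Guangdong Province, and their serological, endoscopic, and pathological biopsy data were collected. Based on endoscopy and pathological biopsy results, patients were divided into three classes of early gastric cancer risk: low, medium, and high. The synthetic minority oversampling technique (SMOTE) was used to handle class imbalance, and a decision tree model for early gastric cancer risk screening was then built with the C5.0 algorithm. Results: The resulting C5.0 decision tree has a depth of 11 and 33 leaf nodes, corresponding to 33 easily understood classification rules that allow a patient's early gastric cancer risk class to be assessed quickly. The model achieves a relatively high accuracy of 73.28%, and the curve in the gains chart is clearly convex and close to the ideal curve, so the model classifies and predicts early gastric cancer risk well. The model also computes the importance of each indicator for predicting early gastric cancer risk and identifies 15 factors with a large influence on risk screening, the most influential being the Helicobacter pylori (Hp) antibody. Conclusion: The C5.0 decision tree model built from patient questionnaires and serological tests predicts early gastric cancer risk well and identifies the factors with the greatest influence on risk screening, which can assist clinical screening for early gastric cancer risk.
Keywords: early gastric cancer; C5.0 algorithm; decision tree; risk screening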
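The oversampling step mentioned in the Methods can be illustrated with a bare-bones sketch of the SMOTE idea: each synthetic sample is a random interpolation between a minority-class point and one of its nearest minority-class neighbours. The function name, parameters, and toy points below are assumptions made for illustration. The study's actual tooling and settings are not stated in the abstract.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of numeric tuples.

    Assumes at least two minority points. Neighbours are found by brute-force
    squared Euclidean distance, which is adequate only for small data.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority points.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])))
        neighbour = minority[rng.choice(others[:k])]
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    for p in smote(points, n_new=3):
        print(p)
```

Every generated point lies on a line segment between two existing minority points, so the minority class is enlarged without duplicating records exactly.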