Abstract
Analyzing the factors behind consumption behavior provides important guidance for product manufacturing and sales. C4.5 is a classic decision tree data mining algorithm that classifies data on the basis of information entropy theory. To use consumers' purchase data for behavior analysis, the consumption data set is first preprocessed and formally represented as a customer transaction data set together with transaction statistics. The information gain ratio is then defined on the customer transaction data set to reflect the classification ability of each consumption factor. The C4.5 algorithm is applied to analyze consumer behavior and construct a decision tree, uncovering latent relationships hidden in the consumption data, which is of significant guiding value for enterprise production and operations. Finally, pre-pruning and post-pruning are applied to the complete decision tree, and the results after pruning are compared.
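The information gain ratio mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard C4.5 definition (information gain divided by split information) for a categorical attribute, and the function names, column names, and toy transactions below are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of a categorical attribute with respect to the target column."""
    total = len(rows)
    # Group the target labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy transactions; the columns are illustrative assumptions only.
transactions = [
    {"age_group": "young", "income": "low", "buys": "no"},
    {"age_group": "young", "income": "high", "buys": "yes"},
    {"age_group": "middle", "income": "low", "buys": "no"},
    {"age_group": "middle", "income": "high", "buys": "yes"},
]

for factor in ("age_group", "income"):
    print(factor, gain_ratio(transactions, factor, "buys"))
```

In this toy data, `income` separates the classes perfectly while `age_group` carries no information about `buys`, so a C4.5-style tree would choose `income` as the split attribute.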
Source
Information Technology (《信息技术》)
2016, No. 4, pp. 14-17 (4 pages)
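Funding
National Natural Science Foundation of China (60473125)
National High Technology Research and Development Program of China (863 Program) (863-317-01-04-99, 2009AA062820)
CNPC Petroleum Science and Technology Innovation Fund for Young and Middle-aged Researchers (05E7013)
Sub-project of the National Major Special Project (G5800-08-ZS-WX)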
Keywords
decision tree
C4.5 algorithm
information gain ratio
continuous attributes