Abstract
Traditional data-driven data mining suffers from a gap between its efficient algorithms and intelligent tools on the one hand and the limited validity of the knowledge they yield on the other. In the earth sciences every data item carries a concrete physical meaning, so information mined without the corresponding domain knowledge tends to be impractical and cannot effectively solve problems in the field. To resolve this contradiction, and so that the acquired knowledge can address difficult problems in formation evaluation, a task-driven data mining method is proposed. Drawing on the concepts and techniques of data mining, the paper sets out the concept and basic principles of task-driven data mining. The method comprises seven parts: data warehouse construction, data preprocessing, feature subset selection, model formation, model evaluation, model revision, and model publication. These form a cyclic, iterative process that continues until a predictive model capable of effectively solving the target task has been built. Taking the identification of low-resistivity oil layers as an example, the whole analysis process is described in detail: a white-box model built with a decision tree and a black-box model built with a support vector machine are combined to identify the low-resistivity oil layers in the target area, with an identification accuracy above 90%.
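The cyclic process described in the abstract (select a feature subset, form a model, evaluate it, revise until the target task is met) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature names `rt` (resistivity) and `phi` (porosity), the toy samples, and the 0.9 accuracy target are invented, and a one-feature threshold rule stands in for the paper's decision tree and support vector machine models.

```python
# Hypothetical toy samples (not data from the paper); oil = 1 marks an oil layer.
TRAIN = [
    {"rt": 3.0, "phi": 0.22, "oil": 1}, {"rt": 2.5, "phi": 0.20, "oil": 1},
    {"rt": 3.2, "phi": 0.10, "oil": 0}, {"rt": 2.8, "phi": 0.08, "oil": 0},
    {"rt": 2.6, "phi": 0.24, "oil": 1}, {"rt": 3.1, "phi": 0.12, "oil": 0},
]
TEST = [
    {"rt": 2.9, "phi": 0.21, "oil": 1}, {"rt": 2.65, "phi": 0.09, "oil": 0},
    {"rt": 3.3, "phi": 0.23, "oil": 1}, {"rt": 2.4, "phi": 0.11, "oil": 0},
]


def predict(model, row):
    """Apply a one-feature threshold rule (a stand-in for a real classifier)."""
    feature, threshold, sign = model
    above = row[feature] > threshold
    return int(above) if sign == 1 else int(not above)


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the recorded label."""
    return sum(predict(model, r) == r["oil"] for r in rows) / len(rows)


def fit_stump(rows, feature):
    """Model formation: pick the threshold and polarity with the best training accuracy."""
    values = sorted({r[feature] for r in rows})
    best_score, best_model = -1.0, None
    for low, high in zip(values, values[1:]):
        for sign in (1, -1):
            model = (feature, (low + high) / 2, sign)
            score = accuracy(model, rows)
            if score > best_score:
                best_score, best_model = score, model
    return best_model


def task_driven_mining(train, test, candidate_features, target=0.9):
    """Iterate feature selection, model formation and evaluation until the target is met."""
    history = []
    for feature in candidate_features:      # feature subset selection / model revision
        model = fit_stump(train, feature)   # model formation
        score = accuracy(model, test)       # model evaluation on held-out samples
        history.append((feature, score))
        if score >= target:                 # task solved: the model can be published
            return model, score, history
    return None, 0.0, history


model, score, history = task_driven_mining(TRAIN, TEST, ["rt", "phi"])
print(model, score, history)
```

In this toy setting resistivity alone fails the evaluation step, which is the situation that makes low-resistivity oil layers hard to identify, so the loop revises the model with the next candidate feature. A real workflow would draw the candidate features and the acceptance criterion from domain knowledge rather than from a fixed list.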
Source
Journal of Jilin University: Earth Science Edition (《吉林大学学报(地球科学版)》)
EI
CAS
CSCD
PKU Core (北大核心)
2012, No. 1, pp. 39-46 (8 pages)
Funding
National High Technology Research and Development Program of China ("863" Program) (2009AA062802)
Keywords
task-driven data mining
low resistivity oil reservoir
classification algorithms
decision tree
support vector machine
predictive model
reservoirs