Abstract
The main purpose of this article is to resolve quality problems in the data itself and to prepare the data required by a data mining model, taking the analysis and prediction of customer churn for a mobile telecommunications operator as its subject. Data pre-processing methods including dimension reduction, attribute integration and construction, multiple imputation, discretization, normalization, and data sampling are used to obtain a complete dataset that approximates the real data. Because the data contain a large number of missing values, imputation was chosen to handle them; the article covers the selection of an imputation method through to the final use of multiple imputation to correct the missing data. When applied to a specific data mining model, the pre-processed data improved the efficiency of data mining and reduced its complexity.
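The abstract does not say which multiple imputation procedure the article uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single numeric attribute with missing entries marked as `None`, fills them by an approximate Bayesian bootstrap (a hot-deck technique chosen here for illustration), and pools the per-dataset mean estimates with Rubin's rules. The function name `multiply_impute_mean` and the sample values are hypothetical.

```python
import random
import statistics


def multiply_impute_mean(values, m=5, seed=0):
    """Estimate the mean of `values` (None = missing) via multiple imputation.

    Hypothetical helper for illustration only; returns (pooled_mean, total_variance).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    n = len(values)

    estimates, variances = [], []
    for _ in range(m):
        # Approximate Bayesian bootstrap: resample the observed values first,
        # so the donor pool itself varies from one imputation to the next.
        donors = [rng.choice(observed) for _ in observed]
        # Fill each missing entry with a random donor value.
        completed = [v if v is not None else rng.choice(donors) for v in values]
        estimates.append(statistics.fmean(completed))
        # Sampling variance of the mean within this completed dataset.
        variances.append(statistics.variance(completed) / n)

    # Rubin's rules: average the estimates, then combine the
    # within-imputation and between-imputation variance.
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


if __name__ == "__main__":
    # Made-up monthly call minutes with two missing entries.
    minutes = [120.0, None, 95.0, 110.0, None, 130.0, 101.0]
    mean, var = multiply_impute_mean(minutes, m=10, seed=42)
    print(f"pooled mean = {mean:.2f}, total variance = {var:.2f}")
```

Producing several completed datasets instead of one lets the between-imputation variance reflect the uncertainty introduced by the missing values, which a single fill-in such as mean substitution would hide.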
Source
《计算机技术与发展》
2010, No. 11, pp. 225-228 (4 pages)
Computer Technology and Development
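Funding
Key Program of the National Natural Science Foundation of China (70631003)
Doctoral Program Fund of the Ministry of Education of China (200803590007)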
Keywords
data pre-processing
data mining
data cleaning
multiple imputation
missing value