Application of Data Pre-processing Method in Mobile Telecommunication Industry
Abstract: The main purpose of this article is to address data quality problems and to prepare the data required by data mining models, taking the analysis of churn causes and the prediction of churn among the customers of a mobile telecommunication operator as its subject. The paper applies data pre-processing methods, including dimension reduction, attribute integration and construction, multiple imputation, discretization, normalization, and data sampling, to obtain a complete data set that approximates the real data. Because the data contain a large number of missing values, imputation was chosen to handle them; the work runs from the selection of an imputation method to the final correction of the missing data with multiple imputation. When applied to a concrete data mining model, the pre-processed data improved the efficiency of data mining and reduced its complexity.
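Author: Dong Yan (董艳)
Source: Computer Technology and Development (《计算机技术与发展》), 2010, No. 11, pp. 225-228 (4 pages)
Funding: Key Program of the National Natural Science Foundation of China (70631003); Doctoral Program Foundation of the Ministry of Education (200803590007)
Keywords: data pre-processing; data mining; data cleaning; multiple imputation; missing value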
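
The abstract names multiple imputation, normalization, and discretization but gives no procedure, so the following is only a minimal sketch of those three steps, not the author's implementation. It assumes values are missing completely at random and uses the approximate Bayesian bootstrap as one simple way to generate imputations; the paper's actual imputation model is not stated here. All function names and the sample `monthly_minutes` data are hypothetical.

```python
import random
import statistics


def abb_impute(values, rng):
    """Return one completed copy of `values`; None marks a missing entry.

    Approximate Bayesian bootstrap: resample the observed values with
    replacement, then fill each gap with a random draw from that resample.
    """
    observed = [v for v in values if v is not None]
    donors = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(donors) for v in values]


def pooled_mean(values, m=5, seed=0):
    """Estimate the mean from m completed data sets, pooled by Rubin's rules.

    Requires m >= 2 so that the between-imputation variance is defined.
    """
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(statistics.fmean(completed))
        # Sampling variance of the mean inside this completed data set.
        within.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(within)          # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = w + (1 + 1 / m) * b           # total variance of q_bar
    return q_bar, total


def min_max(values):
    """Rescale values to [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Map each value to one of k equal-width bins, numbered 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical attribute with gaps, e.g. monthly call minutes per customer.
monthly_minutes = [120.0, None, 300.0, 45.0, None, 210.0, 180.0, None]

estimate, variance = pooled_mean(monthly_minutes, m=5, seed=42)
completed = abb_impute(monthly_minutes, random.Random(42))
scaled = min_max(completed)
bins = equal_width_bins(completed, 3)
```

Pooling over several completed data sets, rather than filling each gap once, lets the between-imputation variance carry the uncertainty caused by the missing values into the final estimate.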