
Research on an Outlier Mining Algorithm Based on Incomplete Data (cited by: 1)

An Outlier Mining Algorithm Based on Incomplete Data
Abstract: Outlier mining is one of the important research topics in data mining, and with incomplete data it faces a twofold difficulty. First, the EM algorithm and the MI algorithm, both used to impute missing data, are extended to the mixed-missing case, and the RE algorithm is proposed on the basis of Weisberg's theory of imputing incomplete data. Then, by combining cluster analysis with the forward search algorithm, an algorithm superior to plain forward search is obtained. Finally, outlier mining on incomplete data is discussed on the basis of the imputation algorithms above. Both theoretical analysis and worked examples show …

Many different methods can be used to mine outliers, and the forward search algorithm is one of the most important. When the data are incomplete, outlier mining encounters additional difficulties, and this paper makes an attempt in that area. The first step is to impute the missing data. Considering mixed missingness simplifies the application of algorithms such as the EM algorithm and the MI algorithm. Furthermore, the simpler RE algorithm is proposed, and imputation on actual data indicates the effectiveness of the method. When the forward search algorithm is used to mine outliers, the unknown parameters can be estimated in the same way as in the EM algorithm. Even in the usual statistical outlier tests, the test statistics that rely on residuals can also be generated by the EM algorithm. This means that the mining results are more credible when the data are first completed and then mined. Finally, if the data are clustered before the initial subset is selected, the results are better and are obtained faster, and false conclusions can be avoided.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core list; 2004, No. 9, pp. 1532-1539 (8 pages)
Keywords: missing data; EM algorithm; clustering analysis; outlier mining
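The abstract's "complete first, then mine" idea can be illustrated with a minimal sketch. It is not the paper's EM, MI, or RE procedure (the abstract does not specify them) and it does not implement forward search or the clustering step. It assumes the rows follow a single multivariate normal distribution with at least two columns, imputes missing entries with a textbook EM iteration, and then ranks rows by squared Mahalanobis distance. The names `em_impute` and `mahalanobis_sq`, the synthetic data, and the planted outlier are all assumptions made for illustration.

```python
import numpy as np

def em_impute(X, n_iter=50, tol=1e-6):
    """Fill NaNs in X (n x p, p >= 2) by EM under a multivariate normal model."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    # Initialise with the column means of the observed entries.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False) + 1e-6 * np.eye(p)
    for _ in range(n_iter):
        prev = filled.copy()
        correction = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                filled[i] = mu
                correction += sigma
                continue
            # E-step: conditional mean of the missing part given the observed part.
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            # The conditional covariance of the missing part enters the M-step.
            correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + correction) / n
        if np.max(np.abs(filled - prev)) < tol:
            break
    return filled, mu, sigma

def mahalanobis_sq(X, mu, sigma):
    """Squared Mahalanobis distance of every row of X from mu."""
    d = X - mu
    return np.einsum("ij,ij->i", d @ np.linalg.inv(sigma), d)

# Synthetic illustration: correlated data, one planted outlier, some gaps.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=60)
data[0] = [8.0, -8.0]      # planted outlier, fully observed
data[5:15, 1] = np.nan     # missing values in the second column
data[20:25, 0] = np.nan    # missing values in the first column

filled, mu, sigma = em_impute(data)
scores = mahalanobis_sq(filled, mu, sigma)
print("row with the largest distance:", int(np.argmax(scores)))
```

Because the mean and covariance here are estimated from all rows, a strong outlier inflates them and can mask weaker outliers. Robust procedures such as forward search, which grow the fit from a clean initial subset, are designed to avoid that effect.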

References (12)

  • 1. J Han, M Kamber. Data Mining: Concepts and Techniques. San Mateo, CA: Morgan Kaufmann, 2001
  • 2. E Hung, D W Cheung. Parallel mining of outliers in large database. Distributed and Parallel Databases, 2002, 12(1): 5-26
  • 3. A C Atkinson, T C Cheng. On robust linear regression with incomplete data. Computational Statistics & Data Analysis, 2000, 33(4): 361-380
  • 4. R J Glynn, N M Laird, D B Rubin. Multiple imputation in mixture models for nonignorable nonresponse with follow-ups. Journal of the American Statistical Association, 1993, (423): 984-993
  • 5. S Weisberg. Applied Linear Regression. New York: John Wiley, 1985
  • 6. A P Dempster, N M Laird, D B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39(1): 1-38
  • 7. C F J Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 1983, 11(1): 95-103
  • 8. L Xu, M I Jordan. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129-151
  • 9. Zhang Yaoting, Fang Kaitai. Introduction to Multivariate Statistical Analysis (多元统计分析引论, in Chinese). Beijing: Science Press, 1982
  • 10. Zhou Jixiang. Regression Analysis (回归分析, in Chinese). Shanghai: East China Normal University Press, 1993
