Abstract
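Outlier mining is an important research topic in data mining, and with incomplete data it faces a twofold difficulty. First, the EM algorithm and the MI algorithm used to fill in missing data are extended to the mixed-missing case, and, following Weisberg's theory of filling in incomplete data, the RE algorithm is proposed. Next, by combining cluster analysis with the forward search algorithm, an algorithm that outperforms the plain forward search is obtained. Finally, building on these filling algorithms, outlier mining for incomplete data is discussed. Both theoretical and case analyses show …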
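The abstract states that the EM algorithm is extended to the mixed-missing case but gives no details. As a minimal sketch, assuming the data are bivariate normal and missing at random, with a value missing in either column (or both) of a row, each EM iteration computes the conditional moments of the missing entries given the current parameters (E-step) and then re-estimates the means and covariances from the expected sufficient statistics (M-step). The function name `em_impute_bivariate` and the bivariate-normal model are illustrative assumptions, not the authors' algorithm; the MI and RE algorithms are not shown.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], Optional[float]]


def em_impute_bivariate(rows: List[Row], iters: int = 500):
    """Hypothetical sketch: EM for a bivariate normal with gaps in either column.

    Assumes MAR missingness and positive variances. Returns the rows with
    missing entries replaced by their conditional means, plus the estimates
    (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(rows)
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows if y is not None]
    # Start from available-case means and variances, with zero covariance.
    mu_x, mu_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xx = sum((x - mu_x) ** 2 for x in xs) / len(xs)
    s_yy = sum((y - mu_y) ** 2 for y in ys) / len(ys)
    s_xy = 0.0
    filled: List[Tuple[float, float]] = []
    for _ in range(iters):
        sx = sy = sxx = syy = sxy = 0.0
        filled = []
        for x, y in rows:
            if x is not None and y is not None:
                ex, ey = x, y
                exx, eyy, exy = x * x, y * y, x * y
            elif x is not None:
                # E-step, y missing: conditional mean and variance of y given x.
                ex = x
                ey = mu_y + s_xy / s_xx * (x - mu_x)
                var = s_yy - s_xy ** 2 / s_xx
                exx, eyy, exy = x * x, ey * ey + var, x * ey
            elif y is not None:
                # E-step, x missing: conditional mean and variance of x given y.
                ey = y
                ex = mu_x + s_xy / s_yy * (y - mu_y)
                var = s_xx - s_xy ** 2 / s_yy
                exx, eyy, exy = ex * ex + var, y * y, ex * y
            else:
                # Both missing: fall back to the marginal moments.
                ex, ey = mu_x, mu_y
                exx = mu_x ** 2 + s_xx
                eyy = mu_y ** 2 + s_yy
                exy = mu_x * mu_y + s_xy
            filled.append((ex, ey))
            sx += ex
            sy += ey
            sxx += exx
            syy += eyy
            sxy += exy
        # M-step: maximum-likelihood moments from the expected sufficient statistics.
        mu_x, mu_y = sx / n, sy / n
        s_xx = sxx / n - mu_x ** 2
        s_yy = syy / n - mu_y ** 2
        s_xy = sxy / n - mu_x * mu_y
    return filled, (mu_x, mu_y, s_xx, s_xy, s_yy)


rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, None), (5.0, 9.8)]
filled, params = em_impute_bivariate(rows)
print(round(filled[3][1], 3))  # → 7.929
```

With only `y` missing, the converged fill-in equals the least-squares prediction from the complete cases (here 5.5 + 1.942857 × 1.25 ≈ 7.929), which is the standard factored-likelihood result for this missingness pattern.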
Many different methods can be used to mine outliers, and the forward search algorithm is among the most important of them. When data are incomplete, outlier mining runs into additional difficulties, so this area calls for further work. The first step is to fill in the missing data. For the case of mixed missingness, algorithms such as the EM algorithm and the MI algorithm can be extended and their application simplified. Furthermore, a simpler and more convenient RE algorithm is proposed, and filling in real data demonstrates the effectiveness of the method. When the forward search algorithm is used to mine outliers, analyzing how the EM algorithm is constructed shows that the same method can be used to estimate the unknown parameters. Even in conventional statistical outlier tests, test statistics that rely on residuals can also be generated by the EM algorithm. Hence the results of mining are more credible when the data are first completed and then mined. Finally, clustering the data before selecting the initial subset yields better results more quickly and helps avoid false conclusions.
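The idea of clustering the data before choosing the initial subset of a forward search can be illustrated on univariate data. The sketch below is hypothetical: it uses plain 1-D k-means with k = 2 (an assumed choice of clustering method), takes the initial subset from the points nearest the median of the largest cluster, then grows the subset one unit at a time, each time keeping the m + 1 units closest to the current subset mean in standard-deviation units and recording the smallest distance among units outside the subset. Units that enter last with a sharp jump in that distance are outlier candidates. The names `kmeans_1d`, `forward_search`, and `m0` are illustrative, not taken from the paper, and the largest cluster is assumed to hold at least `m0` points.

```python
import statistics
from typing import List, Tuple


def kmeans_1d(data: List[float], k: int = 2, iters: int = 50) -> List[int]:
    """Plain 1-D k-means (Lloyd's algorithm); returns a cluster label per point."""
    srt = sorted(data)
    # Deterministic start: spread the k initial centres over the sorted data.
    centres = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in data]
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


def forward_search(data: List[float], m0: int = 3) -> List[Tuple[int, float]]:
    """Hypothetical univariate forward search with a cluster-based start.

    Returns (index, distance) for the closest unit outside the subset at
    each step; a sharp rise in distance flags outlier candidates.
    """
    n = len(data)
    labels = kmeans_1d(data)
    main = max(set(labels), key=labels.count)  # largest cluster
    core = [i for i in range(n) if labels[i] == main]
    centre = statistics.median([data[i] for i in core])
    subset = sorted(core, key=lambda i: abs(data[i] - centre))[:m0]
    entries: List[Tuple[int, float]] = []
    while len(subset) < n:
        values = [data[i] for i in subset]
        mu = statistics.mean(values)
        sd = statistics.stdev(values) or 1e-12  # guard against a zero spread
        dist = {i: abs(data[i] - mu) / sd for i in range(n)}
        outside = [i for i in range(n) if i not in subset]
        nxt = min(outside, key=dist.get)
        entries.append((nxt, dist[nxt]))
        # Next subset: the m + 1 units closest to the current fit.
        subset = sorted(range(n), key=dist.get)[:len(subset) + 1]
    return entries


entries = forward_search([10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0])
print(entries[-1][0])  # → 6
```

Starting inside the dominant cluster keeps the value 25.0 out of the initial fit, so it enters last with a distance far larger than any earlier step.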
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2004, Issue 9, pp. 1532-1539 (8 pages)
Journal of Computer Research and Development
Keywords
missing data
EM algorithm
clustering analysis
outlier mining