Research on Outlier Mining Algorithms Based on Incomplete Data (cited by: 1)

An Outlier Mining Algorithm Based on Incomplete Data


Abstract: Outlier mining is one of the important research topics in data mining, and incomplete data make it doubly difficult. First, the EM algorithm and the MI (multiple imputation) algorithm, which are used to fill in missing data, are extended to the case of mixed missingness, and an RE algorithm is proposed on the basis of Weisberg's theory of filling in incomplete data. Next, by combining cluster analysis with the forward search algorithm, an algorithm superior to the plain forward search is obtained. Finally, outlier mining for incomplete data is discussed on the basis of the above filling algorithms. Both theoretical and case analyses show …

Many different methods can be used to mine outliers, among which the forward search algorithm is one of the most important. When data are incomplete, outlier mining runs into additional difficulties, so this area deserves investigation. The first task is to fill in the missing data. For mixed missingness, the application of algorithms such as the EM algorithm and the MI algorithm can be simplified. Furthermore, a simpler and more convenient RE algorithm is proposed, and filling actual data demonstrates the effectiveness of the method. When the forward search algorithm is used to mine outliers, the reasoning behind the EM algorithm can be reused to estimate the unknown parameters. Even in conventional statistical outlier testing, test statistics based on residuals can also be generated by the EM algorithm. Hence the results of data mining are more credible when the data are first completed and then mined. Finally, if the data are clustered before the initial subset is selected, the search gives better results faster, and false conclusions can be avoided.
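The abstract's pipeline (complete the data with EM, then mine outliers with a forward search) can be illustrated with a minimal sketch under stated assumptions. It is not the authors' method: it uses the textbook EM for a multivariate normal model with entries missing at random (Dempster, Laird and Rubin) rather than the paper's RE algorithm or its mixed-missing extension, and it picks the initial subset as the `m0` units closest to the coordinate-wise median instead of clustering the data first. The names `em_impute` and `forward_search` are hypothetical.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6):
    """EM for a multivariate normal model with NaN entries (assumed missing at random).

    Returns the completed matrix (missing entries replaced by their conditional
    expectations), the estimated mean vector and the estimated covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)  # start from column-mean filling
    sigma = np.atleast_2d(np.cov(Xf, rowvar=False, bias=True))
    for _ in range(n_iter):
        C = np.zeros((p, p))  # sum of conditional covariances of the missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed: use the current mean and covariance
                Xf[i] = mu
                C += sigma
                continue
            # E-step: regress the missing coordinates on the observed ones
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            B = np.linalg.solve(S_oo, S_mo.T).T  # equals S_mo @ inv(S_oo)
            Xf[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T
        # M-step: update mean and covariance from the expected sufficient statistics
        mu_new = Xf.mean(axis=0)
        diff = Xf - mu_new
        sigma_new = (diff.T @ diff + C) / n
        done = (np.abs(mu_new - mu).max() < tol
                and np.abs(sigma_new - sigma).max() < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return Xf, mu, sigma


def forward_search(X, m0):
    """Forward search on a complete data matrix (requires m0 > number of columns).

    At each step the mean and covariance are fitted on the current subset of size m,
    and the next subset is the m + 1 units with the smallest squared Mahalanobis
    distances. Returns, per step, the minimum distance among units outside the
    subset and the indices of those outside units.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Assumption: initial subset = m0 units closest to the coordinate-wise median
    # (the paper instead clusters the data before choosing this subset).
    start = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    subset = np.argsort(start)[:m0]
    trace, outside_sets = [], []
    for m in range(m0, n):
        mu = X[subset].mean(axis=0)
        S_inv = np.linalg.inv(np.atleast_2d(np.cov(X[subset], rowvar=False)))
        diff = X - mu
        d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
        outside = np.setdiff1d(np.arange(n), subset)
        trace.append(np.sqrt(d2[outside].min()))
        outside_sets.append(outside)
        subset = np.argsort(d2)[: m + 1]
    return np.array(trace), outside_sets
```

Following the abstract's advice to complete the data first and mine it afterwards, one would call `Xc, mu, sigma = em_impute(X)` and then `trace, outside = forward_search(Xc, m0)`. A sharp jump near the end of `trace` suggests that the units in the corresponding `outside` set are outliers, since outlying units are the last to join the subset.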

Authors: Yang Hu, Zhong Zhen, Cheng Daijie

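Affiliations: College of Mathematics and Physics, Chongqing University; College of Computer Science, Chongqing University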

Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD and the Peking University core journal list), 2004, No. 9, pp. 1532-1539 (8 pages)

Keywords: missing data; EM algorithm; clustering analysis; outlier mining

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]

Citation network

References (12)

[1] J Han, M Kamber. Data Mining: Concepts and Techniques. San Mateo, CA: Morgan Kaufmann, 2001
[2] E Hung, D W Cheung. Parallel mining of outliers in large database. Distributed and Parallel Databases, 2002, 12(1): 5-26
[3] A C Atkinson, T C Cheng. On robust linear regression with incomplete data. Computational Statistics & Data Analysis, 2000, 33(4): 361-380
[4] R J Glynn, N M Laird, D B Rubin. Multiple imputation in mixture models for nonignorable nonresponse with follow-ups. Journal of the American Statistical Association, 1993, (423): 984-993
[5] S Weisberg. Applied Linear Regression. New York: John Wiley, 1985
[6] A P Dempster, N M Laird, D B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39(1): 1-38
[7] C F J Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 1983, 11(1): 95-103
[8] L Xu, M I Jordan. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129-151
[9] Zhang Yaoting, Fang Kaitai. Introduction to Multivariate Statistical Analysis (in Chinese). Beijing: Science Press, 1982
[10] Zhou Jixiang. Regression Analysis (in Chinese). Shanghai: East China Normal University Press, 1993

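Co-cited literature (5)

[1] Zhang Yufang, Dai Jinlong, Xiong Zhongyang. Collaborative filtering algorithm alleviating data sparsity by stepwise filling [J]. Application Research of Computers, 2013, 30(9): 2602-2605. Cited by: 32
[2] Zhu Jinlin, Zhang Zhengdao, Pan Feng. Fault identification for systems with missing data based on dynamic Bayesian networks [J]. Information and Control, 2013, 42(4): 499-505. Cited by: 3
[3] Li Jianzhong, Wang Hongzhi, Gao Hong. Research progress on big data usability [J]. Journal of Software, 2016, 27(7): 1605-1625. Cited by: 63
[4] He Dan, Chen Songcan. Clustering with imputation of incomplete data based on difference-of-convex programming [J]. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 81-88. Cited by: 3
[5] Wang Jun, Li Jianxun, Han Shan, Wang Xing. A method for imputing missing data in effectiveness evaluation [J]. Journal of Shanghai Jiao Tong University, 2017, 51(2): 180-185. Cited by: 7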

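Citing literature (1)

[1] Qian Xiaodong, Luo Yanfu. Incomplete data clustering algorithm based on mutual-information attribute ranking [J]. Information and Control, 2019, 48(1): 80-87. Cited by: 10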

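Second-level citing literature (10)

[1] Wang Yifei, Kang Jihuai, Ying Jun, Yang Junjie, Chen Kang. Evaluation of risk indicators for coronary heart disease onset based on big data modeling [J]. Academic Journal of Chinese PLA Medical School, 2019, 40(8): 725-729. Cited by: 7
[2] Luo Hao, Wang Yanjie, Niu Minghang, Qiu Cunyue, Zhang Li. Weighted fuzzy clustering algorithm with dynamic intervals [J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(7): 1142-1153. Cited by: 5
[3] Duan Zhongxing, Bi Hanyuan, Zhang Zuowei. Hybrid classification algorithm for incomplete data based on D-S evidence theory [J]. Information and Control, 2020, 49(4): 455-463. Cited by: 12
[4] Ke Jianbo. Source data detection in information-centric networking based on multi-classifier fusion [J]. Intelligent Computer and Applications, 2020, 10(10): 153-156.
[5] Chen Yang, Wang Jinliang, Xia Wei, Yang Hao, Zhu Run, Xi Xuefeng. Footprint image clustering method based on automatic feature extraction [J]. Computer Science, 2021, 48(S01): 255-259. Cited by: 1
[6] Wei Chang'an, Wang Tun. Interactive network information link prediction method based on data protection [J]. Journal of Changzhi University, 2021, 38(5): 69-72.
[7] Gao Zhijun, Zheng Junsheng, An Jingmin. Domain concept graph model supporting user preference queries [J]. Computer Engineering and Design, 2022, 43(3): 744-750.
[8] Li Jie, Xu Qing, Zhang Lulu, Wang Yingming. Research on a grid-coupling-based clustering algorithm for mixed-attribute big data [J]. Journal of Information Engineering University, 2022, 23(2): 218-223. Cited by: 1
[9] Song Shijun, Fan Min. Anomalous data detection method for multidimensional datasets based on spectral clustering [J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(10): 2917-2922.
[10] Zhang Li, Lu Yanping, Hou Qing, Zhang Haobo. Fuzzy clustering algorithm based on K-nearest-neighbor spatial density distribution [J]. Journal of Liaoning University (Natural Sciences Edition), 2023, 50(4): 289-301.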

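Related literature

[1] Lu Zhentai, Zhang Minghui, Chen Wufan. Automatic image segmentation based on the FCM algorithm and mutual information [J]. Computer Engineering & Science, 2007, 29(6): 36-38.
[2] He Dan, Chen Songcan. Clustering with imputation of incomplete data based on difference-of-convex programming [J]. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 81-88. Cited by: 3
[3] Yang Yi, Lu Chengbo. A missing data imputation method based on extreme learning machine [J]. Computer Applications and Software, 2016, 33(10): 243-246. Cited by: 9
[4] Zhou Xiumei, Li Zuochun, Qin Ze. Ordered imputation of missing microarray data [J]. Computer Engineering and Applications, 2009, 45(22): 111-113.
[5] Bu Fanyu, Chen Zhikui, Zhang Qingchen. Deep-learning-based imputation algorithm for incomplete big data [J]. Microelectronics & Computer, 2014, 31(12): 173-176. Cited by: 12
[6] Su Yijuan, Zhong Zhi. Cost-sensitive ordered imputation algorithm for missing data [J]. Computer Engineering, 2009, 35(17): 92-93.
[7] Ma Liang, Wang Wenjian. An SVC kernel parameter selection method based on data independence [J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(4): 59-62.
[8] Zhu Yanjun, Wu Xiangyang. Multidimensional data imputation algorithm based on tensor decomposition [J]. Computer Engineering, 2014, 40(5): 45-48. Cited by: 2
[9] He Tianzhong, Huang Zaixiang. Imbalanced data classification algorithm based on multiple confidence levels [J]. Journal of Minnan Normal University (Natural Science), 2014, 27(4): 26-30.
[10] Wang Jun, Li Jianxun, Han Shan, Wang Xing. A method for imputing missing data in effectiveness evaluation [J]. Journal of Shanghai Jiao Tong University, 2017, 51(2): 180-185. Cited by: 7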
