An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity
Abstract: Outlier detection is an important research direction in data mining, but most outlier mining algorithms become inefficient when applied to high-dimensional data sets. This paper presents LEAWCD, an outlier mining algorithm based on attribute entropy and weighted cosine similarity. First, the algorithm uses local attribute entropy to identify the local outlier attributes of each object within its k-neighborhood, and sets the attribute weight vector automatically from the deviation degree of each outlier attribute. Second, it applies cosine similarity, which remains effective for high-dimensional data, with these weights to measure each object's outlier degree within its k-neighborhood, and thereby detects local outliers in high-dimensional data. Finally, experiments on celestial spectrum data provided by the National Astronomical Observatory indicate that LEAWCD scales well and achieves high detection precision.
Source: Journal of Taiyuan University of Science and Technology, 2014, Issue 3, pp. 171-175 (5 pages)
Funding: Youth Fund Project of Taiyuan University of Science and Technology (20093015)
Keywords: attribute entropy, cosine similarity, outlier data, celestial spectra
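
The abstract names the building blocks of LEAWCD but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. Everything specific in it is an assumption: the histogram-based entropy estimate, the choice of weights proportional to local entropy (the paper derives weights from an attribute deviation degree that is not defined here), the Euclidean k-neighborhood, the score `1 - mean similarity`, and all function and parameter names (`attribute_entropy`, `weighted_cosine`, `outlier_scores`, `k`, `bins`).

```python
import numpy as np


def attribute_entropy(values, bins=5):
    """Shannon entropy (bits) of one attribute, estimated from a histogram."""
    counts, _ = np.histogram(values, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def weighted_cosine(x, y, w):
    """Cosine similarity with non-negative per-attribute weights w."""
    num = np.sum(w * x * y)
    den = np.sqrt(np.sum(w * x * x)) * np.sqrt(np.sum(w * y * y))
    return float(num / den) if den > 0 else 0.0


def outlier_scores(data, k=3, bins=5):
    """Score each row as 1 - mean weighted cosine similarity to its k neighbors."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Pairwise Euclidean distances define the k-neighborhood (an assumption).
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(dists[i])
        neighbors = [int(j) for j in order if j != i][:k]
        local = data[neighbors + [i]]
        # Entropy of each attribute inside the neighborhood of object i.
        ent = np.array([attribute_entropy(local[:, a], bins) for a in range(d)])
        # Assumed weighting: proportional to local entropy, uniform if all zero.
        w = ent / ent.sum() if ent.sum() > 0 else np.full(d, 1.0 / d)
        sims = [weighted_cosine(data[i], data[j], w) for j in neighbors]
        scores[i] = 1.0 - float(np.mean(sims))
    return scores
```

Calling `outlier_scores(data, k=2)` on a small array whose rows are objects and whose columns are attributes returns one score per row; under the assumptions above, rows with the highest scores are the candidates for local outliers.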