Data reduction based on open-angle measurement (cited by 2)
Abstract: Data reduction is an important topic in data mining, covering data compression, data adjustment, and feature extraction. Existing data reduction methods, however, focus mainly on reducing features or dimensions, while methods that reduce the number of samples are usually developed for specific data sets and lack generality. Based on general characteristics of how data are distributed within a data set, a new measure based on the opening angle is defined. The measure distinguishes the essentially different distributions of core objects and boundary objects, and enables data compression centered on the core objects of a data set. Data reduction and testing were carried out on 20 typical sample sets with different characteristics from the UCI public test platform. The results show that the reduction effectively extracts the core objects of a data set. Clustering the data sets before and after reduction with the classical K-means algorithm shows that clustering accuracy on the reduced data sets is clearly higher than on the original data sets.
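The abstract does not state the formula of the opening-angle measure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a stand-in score: the length of the mean unit vector from a point to its k nearest neighbors. A core-like point is surrounded by neighbors on all sides (score near 0), while a boundary-like point sees its neighbors within a narrow angular range (score near 1). The names `direction_concentration`, `reduce_to_core`, `k`, and `keep` are hypothetical.

```python
import numpy as np

def direction_concentration(X, k=8):
    """Assumed proxy for an angle-based measure (not the paper's definition).

    Returns, for each point, the length of the mean unit vector pointing
    to its k nearest neighbors: near 0 when neighbors surround the point,
    near 1 when they all lie on one side.
    """
    X = np.asarray(X, dtype=float)
    diff = X[None, :, :] - X[:, None, :]      # diff[i, j] = X[j] - X[i]
    dist = np.linalg.norm(diff, axis=2)       # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # exclude each point from its own neighbors
    nn = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest neighbors
    rows = np.arange(len(X))[:, None]
    vecs = diff[rows, nn]                     # shape (n, k, d)
    norms = dist[rows, nn][..., None]
    units = vecs / np.maximum(norms, 1e-12)   # unit directions, guarded against duplicates
    return np.linalg.norm(units.mean(axis=1), axis=1)

def reduce_to_core(X, k=8, keep=0.5):
    """Keep the fraction `keep` of points with the lowest score (most core-like)."""
    score = direction_concentration(X, k)
    n_keep = max(1, int(round(keep * len(X))))
    return np.sort(np.argsort(score)[:n_keep])

if __name__ == "__main__":
    # Toy data: a 5x5 grid, where the inner 3x3 points are fully surrounded.
    grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
    print(reduce_to_core(grid, k=8, keep=0.36))
```

The reduced index set could then be passed to any K-means implementation to compare clustering on the original and reduced data, as the abstract describes. The pairwise distance matrix makes this sketch quadratic in the number of samples, which is acceptable only for small data sets.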
Source: 《传感器与微系统》 (Transducer and Microsystem Technologies), CSCD, 2016, No. 4, pp. 25-28, 31 (5 pages)
Funding: National Natural Science Foundation of China (61174014)
Keywords: data reduction; direction angle; cluster analysis