An Effective Clustering Algorithm for High-Attribute-Dimensional Sparse Data (cited by: 6)

An Effective High Attribute Dimensional Sparse Clustering
Abstract: Cluster analysis is one of the most common techniques in data mining, and the scale, dimensionality, and sparsity of a dataset each constrain it. This paper proposes an effective clustering algorithm for sparse data with high attribute dimension. It defines three notions: sparse similarity, the similarity between equivalence relations, and a generalized equivalence relation. Initial equivalence classes (the initial clusters) are formed from the sparse similarity between objects and the theory of equivalence relations. The initial equivalence relations are then revised according to the similarity between equivalence relations, so that the final clustering result is more reasonable. High-dimensional sparse data is compressed into sparse feature vectors whose dimension is far lower than that of the original data, which improves the algorithm's efficiency when the dimension is high. The clustering process does not depend on the order of the input samples, and the algorithm is suited to cluster analysis of high-dimensional sparse data.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, and the Peking University Core Journals list), 2006, No. 3, pp. 289-294 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province (No. BK2004137)
Keywords: sparse similarity; similarity between equivalence relations; data compression; clustering
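The first stage described in the abstract (compress each object to a sparse feature vector, then form initial equivalence classes from pairwise sparse similarity) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: `compress` keeps only the indices of non-zero attributes, `sparse_similarity` is a Jaccard overlap used as a stand-in for the paper's sparse similarity, and the equivalence relation is taken to be the transitive closure of "similarity at or above `threshold`". The later step that revises the initial equivalence relations is not shown, because the abstract does not define the similarity between equivalence relations.

```python
from itertools import combinations


def compress(row):
    """Keep only the indices of non-zero attributes.

    Assumption: this index set plays the role of the 'sparse feature vector'.
    """
    return frozenset(i for i, v in enumerate(row) if v)


def sparse_similarity(a, b):
    """Jaccard overlap of two non-zero index sets.

    Assumption: a stand-in, not the paper's definition of sparse similarity.
    Two all-zero objects are treated as identical (similarity 1.0).
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def initial_classes(rows, threshold):
    """Partition object indices into initial equivalence classes.

    Objects i and j are linked when their similarity is >= threshold; the
    classes are the connected components of those links (transitive closure),
    computed with a small union-find structure.
    """
    feats = [compress(r) for r in rows]
    parent = list(range(len(feats)))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(feats)), 2):
        if sparse_similarity(feats[i], feats[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(feats)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


# Hypothetical toy data: 5 objects, 8 binary attributes, mostly zeros.
rows = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
classes = initial_classes(rows, threshold=0.5)
```

Because the classes are connected components of a symmetric link relation, the resulting partition is the same whatever order the objects arrive in, which matches the order independence claimed in the abstract for this sketch at least. The pairwise loop is quadratic in the number of objects; the compression only reduces the cost of each comparison.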