
The Calculation of Similarity and Its Application in Data Mining
Cited by: 4
Abstract: Similarity is a measure of how alike two objects are, and the method used to compute it depends on the type of object. Similarity calculation is widely used in data mining algorithms, where it forms the basis of object classification. This paper divides data objects into three types (numeric, non-numeric, and mixed) and discusses the similarity calculation methods appropriate to each type. Finally, it illustrates the application of similarity calculation in data mining through examples.
Authors: LI Jun-lei (李俊磊), TENG Shao-hua (滕少华) (1. Guangdong Justice Police Vocational College, Guangzhou 510520, China; 2. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2016, No. 5, pp. 14-17 (4 pages)
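Funding: Supported by the Ministry of Education Key Laboratory Fund (110411), the Natural Science Foundation of Guangdong Province (10451009001004804, 9151009001000007), the Guangdong Province Science and Technology Plan Project (2012B091000173), and the Guangzhou Science and Technology Plan Project (2012J5100054).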
Keywords: object; similarity calculation; data mining; data type