Funding: supported by the National Natural Science Foundation of China (Nos. 60533090 and 60603096), the National Hi-Tech Research and Development Program (863) of China (No. 2006AA010107), the Key Technology R&D Program of China (No. 2006BAH02A13-4), the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652), and the Cultivation Fund of the Key Scientific and Technical Innovation Project of MOE, China (No. 706033)
Abstract: Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. In many cases, however, the data to be clustered are large scale and their similarities are not sparse. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix: a local approach, partition affinity propagation (PAP), and a global approach, landmark affinity propagation (LAP). PAP first passes messages within subsets of the data and then merges them after the initial iteration steps; this effectively reduces the number of clustering iterations. LAP first passes messages among the landmark data points and then clusters the non-landmark points; it is a global approximation method that speeds up clustering. Experiments on several datasets, including random data points, manifold subspaces, face images and Chinese calligraphy, demonstrate that the two approaches are feasible and practical.
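As a rough illustration of the landmark idea (LAP) described in the abstract above, the sketch below runs affinity propagation only on a small set of landmark points and then assigns every remaining point to its nearest exemplar. It uses the standard scikit-learn implementation rather than the authors' code; the landmark count and the synthetic data are assumptions chosen purely for illustration.

```python
# Minimal sketch of landmark-based affinity propagation (assumptions:
# 500 random landmarks, synthetic blob data, scikit-learn's standard AP).
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20000, centers=5, random_state=0)

# 1) pick a small set of landmark points (here: uniformly at random)
rng = np.random.default_rng(0)
landmarks = X[rng.choice(len(X), size=500, replace=False)]

# 2) run affinity propagation only on the landmarks, so the dense
#    similarity matrix is 500 x 500 instead of 20000 x 20000
ap = AffinityPropagation(random_state=0).fit(landmarks)

# 3) assign every non-landmark point to the nearest exemplar from step 2
labels = ap.predict(X)
print("clusters found on landmarks:", len(ap.cluster_centers_))
```

Shrinking the matrix that message passing operates on is the source of the speed-up the abstract refers to; the `preference` parameter of `AffinityPropagation` controls how many exemplars emerge and is left at its default here.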
Funding: supported by the National Natural Science Foundation of China (No. 60603098)
Abstract: Local diversity AdaBoost support vector machine (LDAB-SVM) is proposed for large-scale dataset classification problems. The training dataset is first split into several blocks, and a model is built on each block. To obtain better performance, AdaBoost is used when building each model: in the boosting iterations, component learners with higher diversity and accuracy are collected by adjusting the kernel parameters. The local models are then integrated by voting. The experimental study shows that LDAB-SVM can handle large-scale datasets efficiently without reducing classifier performance.
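The sketch below illustrates the block-wise boosting-and-voting scheme described in the abstract above. It is a minimal reconstruction, not the authors' implementation: the number of blocks, the RBF gamma schedule used to vary the kernel parameters, the number of boosting rounds, and the simple weighted vote are all assumptions, and the diversity criterion is reduced to varying the kernel parameter across rounds.

```python
# Minimal sketch of block-wise AdaBoost over SVM component learners with a
# final vote (assumed parameters: 5 blocks, 5 rounds, a fixed gamma schedule).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def boost_svm_on_block(X, y, n_rounds=5, gammas=(0.1, 0.5, 1.0, 2.0, 5.0)):
    """AdaBoost-style loop over SVC learners; the RBF gamma is varied each
    round so that the collected component learners differ from one another."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # sample weights
    learners, alphas = [], []
    for t in range(n_rounds):
        clf = SVC(kernel="rbf", gamma=gammas[t % len(gammas)])
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5 or err == 0.0:        # keep only sufficiently accurate learners
            continue
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(clf)
        alphas.append(alpha)
        # re-weight samples: emphasize the ones this learner got wrong
        w *= np.exp(alpha * np.where(pred != y, 1.0, -1.0))
        w /= w.sum()
    return learners, alphas

def predict_vote(models, X):
    """Weighted vote over all component learners collected from all blocks."""
    votes = np.zeros(len(X))
    for learners, alphas in models:
        for clf, a in zip(learners, alphas):
            votes += a * np.where(clf.predict(X) == 1, 1.0, -1.0)
    return (votes >= 0).astype(int)

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_blocks = 5                                 # split the training set into blocks
blocks = np.array_split(rng.permutation(len(y_tr)), n_blocks)
models = [boost_svm_on_block(X_tr[idx], y_tr[idx]) for idx in blocks]

acc = np.mean(predict_vote(models, X_te) == y_te)
print(f"voted test accuracy: {acc:.3f}")
```

Because each block is much smaller than the full training set, every SVC is trained on a fraction of the data, which is how the block-wise scheme keeps training tractable on large datasets while the vote over many local learners preserves accuracy.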