Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clustered sparsely related data by passing messages between data points. However, we want to cluster ...Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clustered sparsely related data by passing messages between data points. However, we want to cluster large scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages in the subsets of data first and then merges them as the number of initial step of iterations; it can effectively reduce the number of iterations of clustering. LAP passes messages between the landmark data points first and then clusters non-landmark data points; it is a large global approximation method to speed up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two ap-proaches are feasible and practicable.展开更多
A method is presented in this work that integrates both emerging and mature data sources to estimate the operational travel demand in fine spatial and temporal resolutions.By analyzing individuals’mobility patterns r...A method is presented in this work that integrates both emerging and mature data sources to estimate the operational travel demand in fine spatial and temporal resolutions.By analyzing individuals’mobility patterns revealed from their mobile phones,researchers and practitioners are now equipped to derive the largest trip samples for a region.Because of its ubiquitous use,extensive coverage of telecommunication services and high penetration rates,travel demand can be studied continuously in fine spatial and temporal resolutions.The derived sample or seed trip matrices are coupled with surveyed commute flow data and prevalent travel demand modeling techniques to provide estimates of the total regional travel demand in the form of origindestination(OD)matrices.The methodology is evaluated in a series of real world transportation planning studies and proved its potentials in application areas such as dynamic traffic assignment modeling,integrated corridor management and online traffic simulations.展开更多
Local diversity AdaBoost support vector machine(LDAB-SVM) is proposed for large scale dataset classification problems.The training dataset is split into several blocks firstly, and some models based on these dataset...Local diversity AdaBoost support vector machine(LDAB-SVM) is proposed for large scale dataset classification problems.The training dataset is split into several blocks firstly, and some models based on these dataset blocks are built.In order to obtain a better performance, AdaBoost is used in each model building.In the boosting iteration step, the component learners which have higher diversity and accuracy are collected via the kernel parameters adjusting.Then the local models via voting method are integrated.The experimental study shows that LDAB-SVM can deal with large scale dataset efficiently without reducing the performance of the classifier.展开更多
基金the National Natural Science Foundation of China (Nos. 60533090 and 60603096)the National Hi-Tech Research and Development Program (863) of China (No. 2006AA010107)+2 种基金the Key Technology R&D Program of China (No. 2006BAH02A13-4)the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652)the Cultivation Fund of the Key Scientific and Technical Innovation Project of MOE, China (No. 706033)
文摘Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clustered sparsely related data by passing messages between data points. However, we want to cluster large scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages in the subsets of data first and then merges them as the number of initial step of iterations; it can effectively reduce the number of iterations of clustering. LAP passes messages between the landmark data points first and then clusters non-landmark data points; it is a large global approximation method to speed up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two ap-proaches are feasible and practicable.
文摘A method is presented in this work that integrates both emerging and mature data sources to estimate the operational travel demand in fine spatial and temporal resolutions.By analyzing individuals’mobility patterns revealed from their mobile phones,researchers and practitioners are now equipped to derive the largest trip samples for a region.Because of its ubiquitous use,extensive coverage of telecommunication services and high penetration rates,travel demand can be studied continuously in fine spatial and temporal resolutions.The derived sample or seed trip matrices are coupled with surveyed commute flow data and prevalent travel demand modeling techniques to provide estimates of the total regional travel demand in the form of origindestination(OD)matrices.The methodology is evaluated in a series of real world transportation planning studies and proved its potentials in application areas such as dynamic traffic assignment modeling,integrated corridor management and online traffic simulations.
基金supported by the National Natural Science Foundation of China (60603098)
文摘Local diversity AdaBoost support vector machine(LDAB-SVM) is proposed for large scale dataset classification problems.The training dataset is split into several blocks firstly, and some models based on these dataset blocks are built.In order to obtain a better performance, AdaBoost is used in each model building.In the boosting iteration step, the component learners which have higher diversity and accuracy are collected via the kernel parameters adjusting.Then the local models via voting method are integrated.The experimental study shows that LDAB-SVM can deal with large scale dataset efficiently without reducing the performance of the classifier.