Two new concepts, fuzzy mutuality and average fuzzy entropy, are presented. Based on these concepts, a new algorithm, RSMA (representative subset mining algorithm), is proposed, which can extract a representative subset from massive data. To speed up the production of the representative subset, an improved algorithm, ARSMA (accelerated representative subset mining algorithm), is put forward, which combines forward and backward strategies. In this way, the performance of the algorithm is improved. Finally, experiments are carried out on real datasets and the representative subset is evaluated. The experiments show that the ARSMA algorithm outperforms the RandomPick algorithm in both effectiveness and efficiency.
A new incremental clustering method is presented, which partitions dynamic data sets by mapping data points from a high-dimensional space into a low-dimensional space based on (fuzzy) cross-entropy (CE). The algorithm is divided into two parts: an initial clustering process and an incremental clustering process. The former calculates the fuzzy cross-entropy or cross-entropy of one point relative to the others, and a hierarchical method based on cross-entropy is used to cluster static data sets. Moreover, it has lower time complexity. The latter assigns new points to the suitable cluster by calculating the membership of each data point to the existing centers based on the cross-entropy measure. Experimental comparisons show that the proposed method has lower time complexity than common methods for large-scale data or dynamic work environments.
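The incremental step described above, assigning a new point to a cluster by its cross-entropy-based membership to the existing centers, can be sketched as follows. The abstract does not give the formulas, so this is only an illustration under stated assumptions: the divergence is the symmetric fuzzy cross-entropy commonly used in the fuzzy-set literature, the membership rule (a normalized exponential of the negative divergence) is a stand-in chosen for the example, and the names `fuzzy_cross_entropy`, `memberships`, and `assign` are hypothetical.

```python
import math

EPS = 1e-12  # assumed clamp so that logarithms stay finite at 0 and 1


def fuzzy_cross_entropy(a, b):
    """Symmetric fuzzy cross-entropy between two membership vectors in [0, 1].

    Assumed form: D(a, b) + D(b, a), where
    D(a, b) = sum(a*ln(a/b) + (1-a)*ln((1-a)/(1-b))).
    """
    total = 0.0
    for x, y in zip(a, b):
        x = min(max(x, EPS), 1 - EPS)
        y = min(max(y, EPS), 1 - EPS)
        total += x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))
        total += y * math.log(y / x) + (1 - y) * math.log((1 - y) / (1 - x))
    return total


def memberships(point, centers):
    """Map divergences to memberships that sum to 1 (assumed rule, not from the source)."""
    weights = [math.exp(-fuzzy_cross_entropy(point, c)) for c in centers]
    norm = sum(weights)
    return [w / norm for w in weights]


def assign(point, centers):
    """Return the index of the center with the highest membership."""
    m = memberships(point, centers)
    return max(range(len(centers)), key=m.__getitem__)
```

With two hypothetical centers such as `[[0.1, 0.2], [0.8, 0.9]]`, calling `assign([0.15, 0.25], centers)` selects the first center, since its divergence from the point is smaller. How the centers themselves are updated after an assignment is not stated in the abstract and is left out here.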
Funding: Supported by the National High Technology Research and Development Program of China (2001AA113182)