Funding: Project (No. 2004AA4Z3010) supported by the National Hi-Tech Research and Development Program (863) of China.
Abstract: Compression is an intuitive way to boost the performance of a database system. However, compared with other physical database design techniques, compression consumes a large amount of CPU power, so there is a trade-off between the reduction of disk access and the overhead of CPU processing. Automatic design and adaptive administration of database systems are widely demanded, and the automatic selection of compression schemes that balances this trade-off is very important. In this paper, we present a model with novel techniques that integrates a rapidly convergent agent-based evolution framework, the SWAF (SWarm Algorithm Framework), into adaptive attribute compression for relational databases. The model evolutionarily consults statistics of CPU load and I/O bandwidth to select compression schemes that account for both sides of the trade-off. We have implemented a prototype on the Oscar RDBMS, with experiments highlighting the correctness and efficiency of our techniques.
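As a rough, hypothetical illustration of the trade-off such a selector must resolve (not the paper's SWAF mechanism), the sketch below scores candidate compression schemes for one attribute by weighing estimated I/O savings against CPU overhead under the observed load; all names and numbers are invented for the example.

```python
# Illustrative sketch (not the paper's SWAF implementation): score candidate
# compression schemes for an attribute by weighing estimated I/O savings
# against CPU decompression overhead under the current system load.

def score_scheme(io_saved_mb, cpu_cost_ms, io_busy, cpu_busy):
    """Toy net-benefit score; io_busy/cpu_busy are utilisation ratios in [0, 1]."""
    # I/O savings matter more when the disk is the bottleneck;
    # CPU overhead hurts more when the processor is already loaded.
    return io_saved_mb * io_busy - cpu_cost_ms * cpu_busy

def pick_scheme(candidates, io_busy, cpu_busy):
    """candidates: list of (name, io_saved_mb, cpu_cost_ms); 'none' means keep data uncompressed."""
    best = max(candidates, key=lambda c: score_scheme(c[1], c[2], io_busy, cpu_busy))
    return best[0] if score_scheme(best[1], best[2], io_busy, cpu_busy) > 0 else "none"

# Example: under a disk-bound workload, dictionary encoding wins.
candidates = [("dictionary", 120.0, 15.0), ("run-length", 60.0, 5.0), ("lz", 200.0, 80.0)]
print(pick_scheme(candidates, io_busy=0.9, cpu_busy=0.3))
```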
Abstract: For many clustering algorithms, it is very important to determine an appropriate number of clusters, which is known as the cluster validity problem. In this paper, a new clustering validity assessment index is proposed. It is based on a novel method that selects the margin point between two clusters more accurately for inter-cluster similarity, and it provides an improved scatter function for intra-cluster similarity. Simulation results show the effectiveness of the proposed index on the data sets under consideration, regardless of the choice of clustering algorithm.
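To make the idea of a validity index concrete, the sketch below computes a generic within-cluster scatter to between-cluster separation ratio over a range of cluster counts and picks the count that minimises it; this is only an illustration of the general approach, not the index proposed in the paper.

```python
# Generic scatter-vs-separation validity index (illustration only, not the
# paper's index): the number of clusters minimising the ratio is selected.
import numpy as np
from sklearn.cluster import KMeans

def validity(X, labels, centers):
    within = np.mean([np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
                      for k, c in enumerate(centers)])
    between = np.min([np.linalg.norm(ci - cj)
                      for i, ci in enumerate(centers)
                      for j, cj in enumerate(centers) if i < j])
    return within / between          # smaller is better

X = np.random.rand(300, 2)
scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = validity(X, km.labels_, km.cluster_centers_)
print(min(scores, key=scores.get))   # estimated number of clusters
```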
Abstract: A novel binary particle swarm optimization algorithm for frequent itemset mining from high-dimensional datasets (BPSO-HD) was proposed, incorporating two improvements. First, dimensionality reduction of the initial particles was designed to ensure reasonable initial fitness; second, dynamic dimensionality cutting of the dataset was introduced to shrink the search space. On four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and with ordinary BPSO and quantum swarm evolutionary (QSE) algorithms to demonstrate its advantages. The experiments show that the results given by BPSO-HD are reliable and better than those generated by BPSO and QSE.
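For readers unfamiliar with binary PSO, the snippet below shows the standard sigmoid-transfer position update that BPSO variants such as BPSO-HD build on; the itemset encoding, fitness evaluation, and the dimensionality-cutting steps described in the abstract are not reproduced here.

```python
# One step of the standard binary PSO update (sigmoid transfer function),
# the core mechanism BPSO-HD builds on.
import numpy as np

def bpso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """x: binary positions (pop, dim); v: velocities; pbest/gbest: best-known positions."""
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    prob = 1.0 / (1.0 + np.exp(-v))                     # sigmoid transfer function
    x = (np.random.rand(*x.shape) < prob).astype(int)   # each bit is resampled stochastically
    return x, v
```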
Funding: The Overseas Chinese Affairs Office of the State Council (No. 93A114).
Abstract: The effect of carrier phase error in a digital image transmission system is discussed. The code error rate and SNR under carrier phase error in a Gaussian white noise channel are calculated, and the transmission system is simulated on a computer.
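As a point of reference for the degradation being analysed, the snippet below evaluates the standard textbook bit-error-rate expression for coherent BPSK in additive white Gaussian noise with a static carrier phase offset; the exact modulation and channel model used in the paper may differ.

```python
# Textbook relation: for coherent BPSK in AWGN, a static carrier phase error phi
# degrades the bit error rate from Q(sqrt(2 Eb/N0)) to Q(sqrt(2 Eb/N0) * cos(phi)).
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_bpsk(ebn0_db, phase_err_deg=0.0):
    ebn0 = 10 ** (ebn0_db / 10.0)
    return q_func(math.sqrt(2.0 * ebn0) * math.cos(math.radians(phase_err_deg)))

print(ber_bpsk(8.0))        # ideal carrier recovery
print(ber_bpsk(8.0, 20.0))  # a 20-degree phase error raises the error rate
```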
Abstract: Clustering categorical data, an integral part of data mining, has attracted much attention recently. In this paper, the authors formally define the categorical data clustering problem as an optimization problem from the viewpoint of cluster ensembles, and apply a cluster ensemble approach to clustering categorical data. Experimental results on real datasets show that the approach achieves better clustering accuracy than existing categorical data clustering algorithms.
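The sketch below illustrates the general cluster-ensemble idea with a simple co-association consensus: several base partitions vote on how often each pair of objects is grouped together, and the consensus partition is recovered from those votes. It is a generic illustration with assumed toy inputs, not the authors' specific formulation.

```python
# Generic co-association consensus over several base partitions
# (illustration of the cluster-ensemble idea, not the paper's method).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus(partitions, n_clusters):
    """partitions: list of label arrays over the same n objects."""
    n = len(partitions[0])
    co = np.zeros((n, n))
    for labels in partitions:
        co += (labels[:, None] == labels[None, :])   # 1 whenever two objects share a cluster
    co /= len(partitions)
    dist = 1.0 - co                                  # co-association distance
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, n_clusters, criterion="maxclust")

parts = [np.array([0, 0, 1, 1, 2, 2]), np.array([0, 0, 0, 1, 1, 1]), np.array([0, 1, 1, 1, 2, 2])]
print(consensus(parts, 3))
```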
Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented as IF-THEN rules, which are then used for tasks such as prediction, classification, and pattern recognition. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data, and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction in this context is based on the clustering methods K-means and fuzzy C-means. With the assistance of the K-means clustering algorithm, rules are derived from trained neural networks; fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set, and the effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
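A minimal sketch of the flavour of rule extraction described here, assuming the standard scikit-learn Iris loader: K-means clusters the data and the per-cluster feature ranges are turned into IF-THEN rules. The paper's full methodology (rules derived from trained neural networks and the fuzzy C-means based design) is not reproduced.

```python
# Turn K-means cluster membership on Fisher's Iris data into simple IF-THEN
# rules by reporting each cluster's per-feature value range.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(iris.data)

for k in range(3):
    members = iris.data[km.labels_ == k]
    lo, hi = members.min(axis=0), members.max(axis=0)
    conds = [f"{lo[i]:.1f} <= {iris.feature_names[i]} <= {hi[i]:.1f}"
             for i in range(len(iris.feature_names))]
    print(f"IF {' AND '.join(conds)} THEN cluster {k}")
```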
Funding: Supported by the National Science Foundation of China under Grant No. 71101030 and the Program for Innovative Research Team in UIBE under Grant No. CXTD4-01.
Abstract: This paper concerns dimension reduction in regression for large data sets. The authors introduce a new method based on the sliced inverse regression approach, called cluster-based regularized sliced inverse regression. The proposed method not only keeps the merit of considering both the response and the predictors' information, but also enhances the capability of handling highly correlated variables. It is justified under certain linearity conditions. An empirical application on a macroeconomic data set shows that the proposed method outperforms the dynamic factor model and other shrinkage methods.
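For context, the sketch below implements the classical sliced inverse regression step that the proposed cluster-based regularized variant builds on; the predictor clustering and regularization are omitted, and the data and slice counts are invented for the example.

```python
# Bare-bones sliced inverse regression (classical SIR): standardize the
# predictors, slice on the response, and take leading eigenvectors of the
# covariance of the slice means as the estimated directions.
import numpy as np

def sir(X, y, n_slices=10, n_dirs=2):
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(np.maximum(eigval, 1e-12))) @ eigvec.T
    Z = (X - mu) @ inv_sqrt                               # standardized predictors
    slices = np.array_split(np.argsort(y), n_slices)      # slice on the response
    M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
    vals, vecs = np.linalg.eigh(M)
    return inv_sqrt @ vecs[:, ::-1][:, :n_dirs]           # directions on the original scale

X = np.random.randn(500, 6)
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * np.random.randn(500)
print(sir(X, y).shape)   # (6, 2): two estimated directions
```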