A novel binary particle swarm optimization for frequent item sets mining from high-dimensional dataset(BPSO-HD) was proposed, where two improvements were joined. Firstly, the dimensionality reduction of initial partic...A novel binary particle swarm optimization for frequent item sets mining from high-dimensional dataset(BPSO-HD) was proposed, where two improvements were joined. Firstly, the dimensionality reduction of initial particles was designed to ensure the reasonable initial fitness, and then, the dynamically dimensionality cutting of dataset was built to decrease the search space. Based on four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and was compared with the ordinary BPSO and quantum swarm evolutionary(QSE) to prove its advantages. The experiments show that the results given by BPSO-HD is reliable and better than the results generated by BPSO and QSE.展开更多
Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidat...Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidate set generation and test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm FP growth that could generate frequent pattern without candidate set. Based on the analysis of the algorithm FP growth, this paper proposes a concept of equivalent FP tree and proposes an improved algorithm, denoted as FP growth * , which is much faster in speed, and easy to realize. FP growth * adopts a modified structure of FP tree and header table, and only generates a header table in each recursive operation and projects the tree to the original FP tree. The two algorithms get the same frequent pattern set in the same transaction database, but the performance study on computer shows that the speed of the improved algorithm, FP growth * , is at least two times as fast as that of FP growth.展开更多
This paper discusses on the detection of outliers by hybridizing Rough_Outlier Algorithm with Negative Association Rules. An optimization algorithm named Binary Particle Swarm Optimization is used to improve the compu...This paper discusses on the detection of outliers by hybridizing Rough_Outlier Algorithm with Negative Association Rules. An optimization algorithm named Binary Particle Swarm Optimization is used to improve the computation of Non_Reduct in order to detect outliers.By using Binary PSO algorithm, the rules generated from Rough_Outliers algorithm is optimized, giving significant outliers object detected. The detection ofoutliers process is then enhanced by hybridizing it with Negative Association Rules. Frequent and Infrequent item sets from outlier rules are generated. Results show that the hybrid Rough_Negative algorithm is able to uncover meaningful knowledge of outliers from the frequent and infrequent item sets. These knowledge can then be used by experts in their field of domain for better decision making.展开更多
Packet analysis is very important in our digital life. But what protocol analyzers can do is limited because they can only process data in determined format. This paper puts forward a solution to decode raw data in an...Packet analysis is very important in our digital life. But what protocol analyzers can do is limited because they can only process data in determined format. This paper puts forward a solution to decode raw data in an unknown format. It is certain that data can be cut into packets because there are usually characteristic bit sequences in packet headers. The key to solve the problem is how to find out those characteristic sequences. We present an efficient way of bit sequence enumeration. Both Aho-Corasick (AC) algorithm and data mining method are used to reduce the cost of the process.展开更多
文摘A novel binary particle swarm optimization for frequent item sets mining from high-dimensional dataset(BPSO-HD) was proposed, where two improvements were joined. Firstly, the dimensionality reduction of initial particles was designed to ensure the reasonable initial fitness, and then, the dynamically dimensionality cutting of dataset was built to decrease the search space. Based on four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and was compared with the ordinary BPSO and quantum swarm evolutionary(QSE) to prove its advantages. The experiments show that the results given by BPSO-HD is reliable and better than the results generated by BPSO and QSE.
基金theFundoftheNationalManagementBureauofTraditionalChineseMedicine(No .2 0 0 0 J P 5 4 )
文摘Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidate set generation and test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm FP growth that could generate frequent pattern without candidate set. Based on the analysis of the algorithm FP growth, this paper proposes a concept of equivalent FP tree and proposes an improved algorithm, denoted as FP growth * , which is much faster in speed, and easy to realize. FP growth * adopts a modified structure of FP tree and header table, and only generates a header table in each recursive operation and projects the tree to the original FP tree. The two algorithms get the same frequent pattern set in the same transaction database, but the performance study on computer shows that the speed of the improved algorithm, FP growth * , is at least two times as fast as that of FP growth.
文摘This paper discusses on the detection of outliers by hybridizing Rough_Outlier Algorithm with Negative Association Rules. An optimization algorithm named Binary Particle Swarm Optimization is used to improve the computation of Non_Reduct in order to detect outliers.By using Binary PSO algorithm, the rules generated from Rough_Outliers algorithm is optimized, giving significant outliers object detected. The detection ofoutliers process is then enhanced by hybridizing it with Negative Association Rules. Frequent and Infrequent item sets from outlier rules are generated. Results show that the hybrid Rough_Negative algorithm is able to uncover meaningful knowledge of outliers from the frequent and infrequent item sets. These knowledge can then be used by experts in their field of domain for better decision making.
文摘Packet analysis is very important in our digital life. But what protocol analyzers can do is limited because they can only process data in determined format. This paper puts forward a solution to decode raw data in an unknown format. It is certain that data can be cut into packets because there are usually characteristic bit sequences in packet headers. The key to solve the problem is how to find out those characteristic sequences. We present an efficient way of bit sequence enumeration. Both Aho-Corasick (AC) algorithm and data mining method are used to reduce the cost of the process.