Funding: The National Natural Science Foundation of China (No. 60673060); the Natural Science Foundation of Jiangsu Province (No. BK2005047).
Abstract: A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams and exploits the estimated frequency spectra to extract the essential features of the streams. Each stream is represented as a sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters: amplitude, phase, damping rate, and frequency. The ε-lag-correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure when clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method is faster and achieves higher clustering quality than other similar methods.
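The idea of comparing streams that behave alike up to an unknown delay can be illustrated with a much simpler stand-in: a plain Pearson correlation evaluated at several candidate lags. This is only a sketch and not the AR/spectral-component method of the abstract; the function names, the `max_lag` parameter, and the sine-wave test streams are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_lag_correlation(a, b, max_lag):
    """Scan lags 0..max_lag and return the (lag, correlation) pair that
    maximizes the correlation between a[t] and b[t + lag]."""
    best_lag, best_r = 0, -1.0
    for lag in range(max_lag + 1):
        overlap = min(len(a), len(b)) - lag
        if overlap < 2:
            break
        r = pearson(a[:overlap], b[lag:lag + overlap])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Hypothetical data: stream_b is stream_a delayed by 3 samples.
stream_a = [math.sin(0.3 * t) for t in range(200)]
stream_b = [math.sin(0.3 * (t - 3)) for t in range(200)]
lag, r = best_lag_correlation(stream_a, stream_b, max_lag=10)
print(lag, round(r, 6))
```

A clustering step could then treat `1 - r` at the best lag as a distance between two streams; that choice is likewise an assumption, not something the abstract specifies.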
Funding: Supported by FAO of the United Nations under the South-South Cooperation Program in Ethiopia (SSC/SPFS-FAO-ETHIOPIA-CHINA).
Abstract: [Objective] This comparative experiment explored the soil loss control effects of cultivation combinations of different soil and vegetation types, in order to provide a scientific basis for the upcoming pilot project of ecological recovery. [Method] Considering both the basic pattern of water movement as shaped by micro-landscape structures and the different spatial combinations of landscape constituents, combinations of multiple soil types, crop species and site conditions were designed at three experimental sites. [Result] Soil loss estimates in the experiments in South Wello depended significantly on soil type, slope, vegetation and type of conservation structure; grass cover greatly reduced soil loss; legume cultivation performed better than cereal cultivation in soil loss control. [Conclusion] Based on the analysis of the experimental data, a scientific reference is proposed for agricultural planting and protection modes to alleviate water and soil loss in Amhara Region, Ethiopia.
Abstract: The public has shown great interest in the data factor and data transactions, but current attention is overly focused on personal behavioral data and on transactions happening at data exchanges. To deliver a complete picture of data flow and transactions, this paper presents a systematic overview of the flow and transaction of personal, corporate and public data on the basis of data factor classification from various perspectives. Drawing on various sources of information, this paper estimates the volume of data generation and storage and the volume and trend of data market transactions for major economies in the world, with the following findings: (i) Data classification is diverse due to a broad variety of application scenarios, and data transactions and profit distribution are complex due to the heterogeneous entities, ownerships, information density and other attributes of different data types. (ii) Global data transactions are characterized by productization, servitization and a platform-based mode. (iii) For major economies, there is a commonly observed disequilibrium between the scale of data generation and the scale of data storage, which is particularly striking for China. (iv) The global data market is in a nascent stage of rapid development, with a transaction volume of about 100 billion US dollars, and China's data market is even more underdeveloped, accounting for only some 10% of the world total. All sectors of society should be fully aware of the diversity and complexity of data factor classification and data transactions, as well as the arduous and long-term nature of developing and improving the relevant institutional systems. In line with these features, efforts should be made to improve data classification, enhance the development of computing infrastructure, foster professional data transaction and development institutions, and perfect the data governance system.
Funding: Supported by proposal No. OSD/BCUD/392/197, Board of Colleges and University Development, Savitribai Phule Pune University, Pune.
Abstract: Rapid developments in telecommunication, sensor data, financial applications, data stream analysis, and related fields have increased the rate of data arrival, which makes data mining a vital process. The data analysis process consists of different tasks, among which data stream classification faces more challenges than the other commonly used techniques. Because classification of a stream is a continuous process, it requires a design that can adapt the classification model to concept change, that is, to changes in the boundary between the classes. Hence, we design a novel fuzzy classifier, known as THRFuzzy, to classify new incoming data streams. Rough set theory, together with the tangential holoentropy function, is used to design the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets, namely the skin segmentation, localization, and breast cancer datasets, using accuracy and time as the evaluation metrics, and is compared with that of the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier gives better classification results than the existing classifiers, achieving the maximum accuracy while consuming the least time.
Abstract: Logistic regression is a fast classifier and can achieve high accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called the evolutionary logistical regression classifier (ELRClass), to solve the classification of evolving data streams. The algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep the classifier if its performance deteriorates because of a burst of noise, or to construct a new classifier if a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of this algorithm.
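The sliding-window idea can be sketched in a few lines. The following is a simplified illustration, not ELRClass itself: it refits a one-feature logistic regression whenever accuracy on the current window falls below a threshold, and it does not try to tell a noise burst from a genuine concept drift. The window size of 20, the threshold of 0.9, the learning rate, and the synthetic stream are all hypothetical choices.

```python
import math
from collections import deque

def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(samples, epochs=200, lr=0.5):
    """Fit weight w and bias b of a 1-D logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the label."""
    w, b = model
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)

# Hypothetical stream: the concept "y = 1 when x > 0" flips to "y = 1 when x < 0" at t = 40.
cycle = [-2.0, -1.0, 1.0, 2.0]
stream = []
for t in range(80):
    x = cycle[t % 4]
    positive = x > 0 if t < 40 else x < 0
    stream.append((x, 1 if positive else 0))

window = deque(maxlen=20)   # sliding window of the most recent samples
model = None
rebuilds = 0
for sample in stream:
    window.append(sample)
    if len(window) < window.maxlen:
        continue
    # Rebuild when there is no model yet or the current one has degraded on the window.
    if model is None or accuracy(model, window) < 0.9:
        model = fit_logistic(list(window))
        rebuilds += 1

final_accuracy = accuracy(model, window)
print(rebuilds, final_accuracy, model[0] < 0)
```

After the flip, the fitted weight changes sign once the window is dominated by samples of the new concept, which is the behavior a drift-aware classifier is meant to show.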
Funding: This paper was supported by the National Natural Science Foundation of China under Grant No. 61003282; the Fundamental Research Funds for the Central Universities under Grant No. 2011RCI)508; the National Basic Research Program of China under Grant No. 2009CB320505; and the National High Technology Research and Development Program of China under Grant No. 2011AA010704.
Abstract: Traditional packet classification for IPv4 involves examining the standard 5-tuple of a packet header: source address, destination address, source port, destination port and protocol. With the introduction of the IPv6 flow label field, which labels the packets belonging to the same flow, packet classification can be resolved in three dimensions: flow label, source address and destination address. In this paper, we propose a novel approach to 3-tuple packet classification based on the flow label. In addition, by introducing a conversion engine that converts source-destination pairs into compound address prefixes, we put forward an algorithm called Reducing Dimension (RD) with dimension reduction capability, which combines a heuristic tree search with the use of buckets. We also provide an improved version of RD, called Improved RD (IRD), which uses two mechanisms, path compression and priority tags, to optimize performance. To evaluate our algorithms, extensive experiments have been conducted using a number of synthetically generated databases. In terms of memory consumption, the two proposed algorithms consume only around 3% of the memory used by the existing algorithms when the number of filters increases to 10 k. In terms of average search time, the two proposed algorithms are more than four times faster than the others when the number of filters is 10 k. The results show that the proposed algorithms work well and, thanks to their dimension reduction capability, outperform many typical existing algorithms.
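To make the 3-tuple idea concrete, here is a minimal sketch of matching a packet against rules defined on flow label, source prefix and destination prefix. It uses a plain linear scan, not the RD or IRD algorithms, whose internals the abstract does not describe. The `Rule` fields, the "smaller number wins" priority convention, the action names, and the example prefixes are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    flow_label: Optional[int]      # None acts as a wildcard; IPv6 flow labels are 20 bits wide
    src: ipaddress.IPv6Network     # source address prefix
    dst: ipaddress.IPv6Network     # destination address prefix
    priority: int                  # assumed convention: smaller value = higher priority
    action: str

def classify(rules, flow_label, src, dst):
    """Return the action of the highest-priority rule matching the 3-tuple, or 'default'."""
    src_addr = ipaddress.ip_address(src)
    dst_addr = ipaddress.ip_address(dst)
    matches = [
        r for r in rules
        if (r.flow_label is None or r.flow_label == flow_label)
        and src_addr in r.src
        and dst_addr in r.dst
    ]
    if not matches:
        return "default"
    return min(matches, key=lambda r: r.priority).action

# Hypothetical rule set using documentation prefixes.
rules = [
    Rule(0x12345, ipaddress.ip_network("2001:db8:a::/48"),
         ipaddress.ip_network("2001:db8:b::/48"), 1, "expedite"),
    Rule(None, ipaddress.ip_network("2001:db8::/32"),
         ipaddress.ip_network("::/0"), 2, "forward"),
]

print(classify(rules, 0x12345, "2001:db8:a::1", "2001:db8:b::1"))
print(classify(rules, 7, "2001:db8:a::1", "2001:db8:b::1"))
print(classify(rules, 7, "2001:dead::1", "2001:db8:b::1"))
```

A linear scan costs time proportional to the number of rules per packet, which is the cost that tree-based and bucket-based schemes such as those in the abstract aim to reduce.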
Funding: Supported by Intel Corporation (No. 9078).
Abstract: Packet classification (PC) has become the main method for supporting quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is the most popular form. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm makes use of the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate the conflicts between rules and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and a lower space complexity.
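Two notions from this abstract, longest-prefix lookup and conflict between two-dimensional prefix rules, can be sketched in software. The loop below is a sequential emulation of what a TCAM does in parallel, and the conflict test uses a definition that is common in the packet-filter literature (each rule is strictly more specific than the other in one dimension); the abstract itself does not define the term, so that definition is an assumption, as are the bit-string encoding of prefixes and the function names.

```python
def longest_prefix(prefixes, key_bits):
    """Return the longest prefix (as a bit string) that matches key_bits, or None."""
    best = None
    for p in prefixes:
        if key_bits.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best

def is_strict_prefix(short, long):
    """True when `short` is a proper prefix of `long`."""
    return long.startswith(short) and short != long

def conflicts(rule_a, rule_b):
    """Assumed definition: two (src_prefix, dst_prefix) rules conflict when one is
    strictly more specific in the source dimension and the other is strictly more
    specific in the destination dimension, so neither rule is the clear best match
    for packets that fall in their overlap."""
    (src_a, dst_a), (src_b, dst_b) = rule_a, rule_b
    a_then_b = is_strict_prefix(src_a, src_b) and is_strict_prefix(dst_b, dst_a)
    b_then_a = is_strict_prefix(src_b, src_a) and is_strict_prefix(dst_a, dst_b)
    return a_then_b or b_then_a

# Hypothetical prefixes written as bit strings; "" is the match-all prefix.
table = ["", "10", "1011"]
print(longest_prefix(table, "10110000"))
print(conflicts(("10", "1011"), ("1011", "10")))
print(conflicts(("10", "10"), ("1011", "1011")))
```

In the second `conflicts` call the second rule is more specific in both dimensions, so it simply takes precedence inside the overlap and no conflict arises.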