Funding: Project (61103046) supported in part by the National Natural Science Foundation of China; Project (B201312) supported by the DHU Distinguished Young Professor Program, China; Project (LY14F020007) supported by the Zhejiang Provincial Natural Science Funds of China; Project (2014A610072) supported by the Natural Science Foundation of Ningbo City, China.
Abstract: DNS (domain name system) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can be readily used for DNS query log mining. However, the algorithm is inefficient when processing large-scale data. The MR-CLOPE algorithm is proposed as an extension and improvement of CLOPE based on MapReduce. Unlike previous parallel clustering methods, it uses a two-stage MapReduce implementation framework, in which each stage is implemented by one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually iterates many times to merge the small clusters into larger, satisfactory ones. Within this two-stage framework, a novel partition process is designed to randomly spread out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the advantages of the original CLOPE algorithm are kept and its disadvantages are addressed, so that the proposed framework achieves better clustering performance. The experimental results show that, compared with CLOPE, MR-CLOPE is not only faster but also yields better clustering quality on DNS query logs.
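The two-stage idea can be illustrated with a minimal single-machine sketch. It is not the MR-CLOPE implementation: it uses plain Python instead of MapReduce, runs only one CLOPE assignment pass per split, and omits the random partition step. It relies on CLOPE's usual profit criterion, in which a cluster with N transactions, total size S, and W distinct items contributes S·N/W^r. The abstract does not state the merge criteria, so the rule used here (merge two sub-clusters whenever doing so raises that profit) is an assumption, and all names (`Cluster`, `delta_add`, `clope_pass`, `merge_clusters`, `two_stage`) are hypothetical.

```python
from collections import Counter


class Cluster:
    """Summary statistics of one CLOPE cluster."""

    def __init__(self):
        self.n = 0             # N: number of transactions
        self.occ = Counter()   # item -> number of occurrences

    @property
    def size(self):            # S: total item occurrences
        return sum(self.occ.values())

    @property
    def width(self):           # W: number of distinct items
        return len(self.occ)

    def add(self, txn):
        self.n += 1
        self.occ.update(txn)

    def gain(self, r):
        """This cluster's contribution S * N / W**r to the profit."""
        return self.size * self.n / self.width ** r if self.n else 0.0


def delta_add(cluster, txn, r):
    """Change in profit contribution if txn (a set of items) joins cluster."""
    new_size = cluster.size + len(txn)
    new_width = cluster.width + sum(1 for item in txn if item not in cluster.occ)
    return new_size * (cluster.n + 1) / new_width ** r - cluster.gain(r)


def clope_pass(transactions, r=2.0):
    """Stage 1 on one split: a single greedy CLOPE assignment pass."""
    clusters = []
    for txn in transactions:
        candidates = clusters + [Cluster()]   # existing clusters or a new one
        best = max(candidates, key=lambda c: delta_add(c, txn, r))
        if best.n == 0:
            clusters.append(best)
        best.add(txn)
    return clusters


def merged(a, b):
    """Cluster summary obtained by pooling the transactions of a and b."""
    m = Cluster()
    m.n = a.n + b.n
    m.occ = a.occ + b.occ
    return m


def merge_clusters(clusters, r=2.0):
    """Stage 2 (ASSUMED criterion): merge pairs while the profit increases."""
    clusters = list(clusters)
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                m = merged(clusters[i], clusters[j])
                if m.gain(r) > clusters[i].gain(r) + clusters[j].gain(r):
                    clusters[i] = m
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return clusters


def two_stage(transactions, n_splits, r=2.0):
    """Cluster each split independently, then merge the sub-clusters."""
    splits = [transactions[k::n_splits] for k in range(n_splits)]
    sub_clusters = [c for split in splits for c in clope_pass(split, r)]
    return merge_clusters(sub_clusters, r)
```

Here a transaction is a set of items, for instance the set of domain names queried by one client in a time window (an illustrative encoding, not one taken from the paper). Because the total number of transactions is fixed, comparing the summed `gain` values before and after a merge is equivalent to comparing the CLOPE profit.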
Funding: Supported by the National Natural Science Foundation of China (30860084, 60673014, 60263005), the Backbone Young Teachers Foundation of Fujian Normal University (2008100244), and the Department of Education Foundation of Fujian Province (ZA09047).
Abstract: The reduced Q-matrix (Qr matrix) plays an important role in the rule space model (RSM) and the attribute hierarchy method (AHM). Valid and invalid items are defined on the basis of the attribute hierarchy, and a method for judging whether an item is valid is developed from the relation between the reachability matrix and valid items. Valid items are also explained from the perspective of graph theory. An incremental augment algorithm for constructing the Qr matrix is proposed, based on the idea of incremental forward regression, and its validity is examined theoretically. Empirical tests compare the running time of the incremental augment algorithm with that of the Tatsuoka algorithm. The results show that the incremental augment algorithm outperforms the Tatsuoka algorithm, and that both algorithms show linear growth with respect to the number of valid items. Mathematical models with 10 attributes are built for the two algorithms by linear regression analysis.
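The relation between the reachability matrix and valid items can be made concrete with a short sketch. It is one plausible reading, not the paper's algorithm: it assumes an item is valid when its attribute vector is non-empty and closed under the hierarchy's prerequisites, and it builds the Qr columns by incrementally taking Boolean unions of the reachability matrix columns. The convention that `adj[i][j] = 1` means attribute i is a direct prerequisite of attribute j, and the function names, are assumptions made for illustration.

```python
def reachability(adj):
    """Reflexive-transitive closure of the adjacency matrix (Warshall)."""
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach


def is_valid_item(item, reach):
    """ASSUMED definition: non-empty and closed under prerequisites."""
    n = len(item)
    if not any(item):
        return False
    return all(
        item[i]
        for j in range(n) if item[j]
        for i in range(n) if reach[i][j]
    )


def reduced_q_columns(adj):
    """Build the Qr columns by incrementally OR-ing reachability columns."""
    reach = reachability(adj)
    n = len(adj)
    # Column j lists attribute j together with all of its prerequisites.
    columns = [tuple(int(reach[i][j]) for i in range(n)) for j in range(n)]
    items = set()
    for col in columns:
        augmented = {col}
        for item in items:
            augmented.add(tuple(a | b for a, b in zip(item, col)))
        items |= augmented
    return sorted(items)
```

For a linear hierarchy A1 → A2 → A3, `reduced_q_columns([[0, 1, 0], [0, 0, 1], [0, 0, 0]])` returns the three columns `(1, 0, 0)`, `(1, 1, 0)`, and `(1, 1, 1)`. Each returned column satisfies `is_valid_item`, which gives a brute-force cross-check on small hierarchies.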