期刊文献+
共找到24篇文章
< 1 2 >
每页显示 20 50 100
Double-layer Bayesian Classifier Ensembles Based on Frequent Itemsets 被引量:3
1
作者 Wei-Guo Yi Jing Duan Ming-Yu Lu 《International Journal of Automation and computing》 EI 2012年第2期215-220,共6页
Numerous models have been proposed to reduce the classification error of Naive Bayes by weakening its attribute independence assumption and some have demonstrated remarkable error performance. Considering that ensembl... Numerous models have been proposed to reduce the classification error of Naive Bayes by weakening its attribute independence assumption and some have demonstrated remarkable error performance. Considering that ensemble learning is an effective method of reducing the classifmation error of the classifier, this paper proposes a double-layer Bayesian classifier ensembles (DLBCE) algorithm based on frequent itemsets. DLBCE constructs a double-layer Bayesian classifier (DLBC) for each frequent itemset the new instance contained and finally ensembles all the classifiers by assigning different weight to different classifier according to the conditional mutual information. The experimental results show that the proposed algorithm outperforms other outstanding algorithms. 展开更多
关键词 Double-layer Bayesian CLASSIFIER frequent itemsets conditional mutual information support.
下载PDF
Effect of Count Estimation in Finding Frequent Itemsets over Online Transactional Data Streams 被引量:2
2
作者 JoongHyukChang WonSukLee 《Journal of Computer Science & Technology》 SCIE EI CSCD 2005年第1期63-69,共7页
A data stream is a massive unbounded sequence of data elements continuouslygenerated at a rapid rate. Due to this reason, most algorithms for data streams sacrifice thecorrectness of their results for fast processing ... A data stream is a massive unbounded sequence of data elements continuouslygenerated at a rapid rate. Due to this reason, most algorithms for data streams sacrifice thecorrectness of their results for fast processing time. The processing time is greatly influenced bythe amount of information that should be maintained. This issue becomes more serious in findingfrequent itemsets or frequency counting over an online transactional data stream since there can bea large number of itemsets to be monitored. We have proposed a method called the estDec method forfinding frequent itemsets over an online data stream. In order to reduce the number of monitoreditemsets in this method, monitoring the count of an itemset is delayed until its support is largeenough to become a frequent itemset in the near future. For this purpose, the count of an itemsetshould be estimated. Consequently, how to estimate the count of an itemset is a critical issue inminimizing memory usage as well as processing time. In this paper, the effects of various countestimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memoryusage and processing time. 展开更多
关键词 count estimation frequent itemsets transactional data streams
原文传递
Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy 被引量:2
3
作者 Daniel Kunkle 张冬晖 Gene Cooperman 《Journal of Computer Science & Technology》 SCIE EI CSCD 2008年第1期77-102,共26页
This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for a... This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms. 展开更多
关键词 generalized association rules frequent generalized itemsets redundancy avoidance
原文传递
Text Classification Using Sentential Frequent Itemsets
4
作者 刘石竹 胡和平 《Journal of Computer Science & Technology》 SCIE EI CSCD 2007年第2期334-336,F0003,共4页
Text classification techniques mostly rely on single term analysis of the document data set, while more concepts, especially the specific ones, are usually conveyed by set of terms. To achieve more accurate text class... Text classification techniques mostly rely on single term analysis of the document data set, while more concepts, especially the specific ones, are usually conveyed by set of terms. To achieve more accurate text classifier, more informative feature including frequent co-occurring words in the same sentence and their weights are particularly important in such scenarios. In this paper, we propose a novel approach using sentential frequent itemset, a concept comes from association rule mining, for text classification, which views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpus are carried out, which validate the practicability of the proposed system. 展开更多
关键词 text classification sentential frequent itemsets variable precision rough set model
原文传递
Mining Frequent Itemsets in Correlated Uncertain Databases 被引量:1
5
作者 童咏昕 陈雷 余洁莹 《Journal of Computer Science & Technology》 SCIE EI CSCD 2015年第4期696-712,共17页
Recently, with the growing popularity of Internet of Things (IoT) and pervasive computing, a large amount of uncertain data, e.g., RFID data, sensor data, real-time video data, has been collected. As one of the most... Recently, with the growing popularity of Internet of Things (IoT) and pervasive computing, a large amount of uncertain data, e.g., RFID data, sensor data, real-time video data, has been collected. As one of the most fundamental issues of uncertain data mining, uncertain frequent pattern mining has attracted much attention in database and data mining communities. Although there have been some solutions for uncertain frequent pattern mining, most of them assume that the data is independent, which is not true in most real-world scenarios. Therefore, current methods that are based on the independent assumption may generate inaccurate results for correlated uncertain data. In this paper, we focus on the problem of mining frequent itemsets over correlated uncertain data, where correlation can exist in any pair of uncertain data objects (transactions). We propose a novel probabilistic model, called Correlated Frequent Probability model (CFP model) to represent the probability distribution of support in a given correlated uncertain dataset. Based on the distribution of support derived from the CFP model, we observe that some probabilistic frequent itemsets are only frequent in several transactions with high positive correlation. In particular, the itemsets, which are global probabilistic frequent, have more significance in eliminating the influence of the existing noise and correlation in data. In order to reduce redundant frequent itemsets, we further propose a new type of patterns, called global probabilistic frequent itemsets, to identify itemsets that are always frequent in each group of transactions if the whole correlated uncertain database is divided into disjoint groups based on their correlation. To speed up the mining process, we also design a dynamic programming solution, as well as two pruning and bounding techniques. Extensive experiments on both real and synthetic datasets verify the effectiveness and e?ciency of the proposed model and algorithms. 展开更多
关键词 CORRELATION uncertain data probabilistic frequent itemset
原文传递
Backward Support Computation Method for Positive and Negative Frequent Itemset Mining
6
作者 Mrinmoy Biswas Akash Indrani Mandal Md. Selim Al Mamun 《Journal of Data Analysis and Information Processing》 2023年第1期37-48,共12页
Association rules mining is a major data mining field that leads to discovery of associations and correlations among items in today’s big data environment. The conventional association rule mining focuses mainly on p... Association rules mining is a major data mining field that leads to discovery of associations and correlations among items in today’s big data environment. The conventional association rule mining focuses mainly on positive itemsets generated from frequently occurring itemsets (PFIS). However, there has been a significant study focused on infrequent itemsets with utilization of negative association rules to mine interesting frequent itemsets (NFIS) from transactions. In this work, we propose an efficient backward calculating negative frequent itemset algorithm namely EBC-NFIS for computing backward supports that can extract both positive and negative frequent itemsets synchronously from dataset. EBC-NFIS algorithm is based on popular e-NFIS algorithm that computes supports of negative itemsets from the supports of positive itemsets. The proposed algorithm makes use of previously computed supports from memory to minimize the computation time. In addition, association rules, i.e. positive and negative association rules (PNARs) are generated from discovered frequent itemsets using EBC-NFIS algorithm. The efficiency of the proposed algorithm is verified by several experiments and comparing results with e-NFIS algorithm. The experimental results confirm that the proposed algorithm successfully discovers NFIS and PNARs and runs significantly faster than conventional e-NFIS algorithm. 展开更多
关键词 Data Mining Positive Frequent Itemset Negative Frequent Itemset Association Rule Backward Support
下载PDF
A Fast Distributed Algorithm for Association Rule Mining Based on Binary Coding Mapping Relation
7
作者 CHEN Geng NI Wei-wei +1 位作者 ZHU Yu-quan SUN Zhi-hui 《Wuhan University Journal of Natural Sciences》 EI CAS 2006年第1期27-30,共4页
Association rule mining is an important issue in data mining. The paper proposed an binary system based method to generate candidate frequent itemsets and corresponding supporting counts efficiently, which needs only ... Association rule mining is an important issue in data mining. The paper proposed an binary system based method to generate candidate frequent itemsets and corresponding supporting counts efficiently, which needs only some operations such as "and", "or" and "xor". Applying this idea in the existed distributed association rule mining al gorithm FDM, the improved algorithm BFDM is proposed. The theoretical analysis and experiment testify that BFDM is effective and efficient. 展开更多
关键词 frequent itemsets distributed association rule mining relation of itemsets-binary data
下载PDF
CFSBC: Clustering in High-Dimensional Space Based on Closed Frequent Item Set
8
作者 NIWei-wei SUNZhi-hui 《Wuhan University Journal of Natural Sciences》 EI CAS 2004年第5期590-594,共5页
Clustering in high-dimensional space is an important domain in data mining. It is the process of discovering groups in a high-dimensional dataset, in such way, that the similarity between the elements of the same clus... Clustering in high-dimensional space is an important domain in data mining. It is the process of discovering groups in a high-dimensional dataset, in such way, that the similarity between the elements of the same cluster is maximum and between different clusters is minimal. Many clustering algorithms are not applicable to high-dimensional space for its sparseness and decline properties. Dimensionality reduction is an effective method to solve this problem. The paper proposes a novel clustering algorithm CFSBC based on closed frequent itemsets derived from association rule mining, which can get the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes. Third, it is suitable for very large data sets. Experiment shows that the proposed algorithm is effective and efficient. Key words clustering - closed frequent itemsets - association rule - clustering attributes CLC number TP 311 Foundation item: Supported by the National Natural Science Foundation of China (70371015)Biography: NI Wei-wei (1979-), male, Ph. D candidate, research direction: data mining and knowledge discovery. 展开更多
关键词 CLUSTERING closed frequent itemsets association rule clustering attributes
下载PDF
FICW: Frequent Itemset Based Text Clustering with Window Constraint
9
作者 ZHOU Chong LU Yansheng ZOU Lei HU Rong 《Wuhan University Journal of Natural Sciences》 CAS 2006年第5期1345-1351,共7页
Most of the existing text clustering algorithms overlook the fact that one document is a word sequence with semantic information. There is some important semantic information existed in the positions of words in the s... Most of the existing text clustering algorithms overlook the fact that one document is a word sequence with semantic information. There is some important semantic information existed in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) was proposed, which makes use of the semantic information for text clustering with a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the method compared in both clustering accuracy and efficiency. 展开更多
关键词 text clustering frequent itemsets search engine
下载PDF
A User Selection Method in Advertising System
10
作者 Shiyi XIONG Zhiqing LIN Bo XIAO 《International Journal of Communications, Network and System Sciences》 2010年第1期54-58,共5页
It’s important for mobile operators to recommend new services. Traditional method is sending advertising messages to all mobile users. But most of users who are not interested in these services treat the messages as ... It’s important for mobile operators to recommend new services. Traditional method is sending advertising messages to all mobile users. But most of users who are not interested in these services treat the messages as Spam. This paper presents a method to find potential customers who are likely to accept the services. This method searchs the maximum frequent itemsets which indicate potential customers’ features from a large data set of users’ information, then find potential customers from those maximum frequent itemsets by using a bayesian network classifier. Experimental results demonstrate this method can select users with higher accuracy. 展开更多
关键词 USER Selection MAXIMUM Frequent itemsets BAYESIAN Network
下载PDF
Frequent Itemset Mining of User’s Multi-Attribute under Local Differential Privacy 被引量:2
11
作者 Haijiang Liu Lianwei Cui +1 位作者 Xuebin Ma Celimuge Wu 《Computers, Materials & Continua》 SCIE EI 2020年第10期369-385,共17页
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of ... Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of local differential privacy protection models to mine frequent itemsets is a relatively reliable and secure protection method.Local differential privacy means that users first perturb the original data and then send these data to the aggregator,preventing the aggregator from revealing the user’s private information.We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to user’s multi-attribute.The main technique has bitmap encoding for converting the user’s original data into a binary string.It also includes how to choose the best perturbation algorithm for varying user attributes,and uses the frequent pattern tree(FP-tree)algorithm to mine frequent itemsets.Finally,we incorporate the threshold random response(TRR)algorithm in the framework and compare it with the existing algorithms,and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets. 展开更多
关键词 Local differential privacy frequent itemset mining user’s multi-attribute
下载PDF
Mining φ-Frequent Itemset Using FP-Tree
12
作者 李天瑞 《Journal of Modern Transportation》 2001年第1期67-74,共8页
The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large scale databases. And there has been a spurt of... The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large scale databases. And there has been a spurt of research activities around this problem. However, traditional association rule mining may often derive many rules in which people are uninterested. This paper reports a generalization of association rule mining called φ association rule mining. It allows people to have different interests on different itemsets that arethe need of real application. Also, it can help to derive interesting rules and substantially reduce the amount of rules. An algorithm based on FP tree for mining φ frequent itemset is presented. It is shown by experiments that the proposed methodis efficient and scalable over large databases. 展开更多
关键词 data processing DATABASES φ association rule mining φ frequent itemset FP tree data mining
下载PDF
Incrementally Exploiting Sentential Association for Email Classification
13
作者 李曲 何玉 +1 位作者 冯剑琳 冯玉才 《Journal of Southwest Jiaotong University(English Edition)》 2006年第2期129-134,共6页
A novel association-based algorithm EmailinClass is proposed for incremental Email classification. In view of the fact that the basic semantic unit in an Email is actually a sentence, and the words within the same sen... A novel association-based algorithm EmailinClass is proposed for incremental Email classification. In view of the fact that the basic semantic unit in an Email is actually a sentence, and the words within the same sentence are typically more semantically related than the words that just appear in the same Email, EmailInClass views a sentence rather than an Email as a transaction. Extensive experiments conducted on benchmark corpora Enron reveal that the effectiveness of EmallInClass is superior to the non-incremental alternatives such as NalveBayes and SAT-MOD. In addition, the classification rules generated by EroaillnClass are human readable and revisable, 展开更多
关键词 Document Requent itemset Category frequent itemset MODFIT heuristic Category prefix-tree Incremental classification
下载PDF
FPGA-Based Stream Processing for Frequent Itemset Mining with Incremental Multiple Hashes
14
作者 Kasho Yamamoto Masayuki Ikebe +1 位作者 Tetsuya Asai Masato Motomura 《Circuits and Systems》 2016年第10期3299-3309,共11页
With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time... With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real- time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10% - 15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, then we improved its performance using an FPGA. FIsM algorithms are important and are basic data- mining techniques used to discover association rules from transactional databases. We improved on an approximate FIsM algorithm proposed recently so that it would fit onto hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version. 展开更多
关键词 Data Mining Frequent Itemset Mining FPGA Stream Processing
下载PDF
Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy 被引量:1
15
作者 赵丹 赵素云 +3 位作者 陈红 刘睿瑄 李翠平 张晓莹 《Journal of Computer Science & Technology》 SCIE EI CSCD 2023年第6期1403-1422,共20页
Local differential privacy(LDP)approaches to collecting sensitive information for frequent itemset mining(FIM)can reliably guarantee privacy.Most current approaches to FIM under LDP add"padding and sampling"... Local differential privacy(LDP)approaches to collecting sensitive information for frequent itemset mining(FIM)can reliably guarantee privacy.Most current approaches to FIM under LDP add"padding and sampling"steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items.The current state-of-the-art approach,namely set-value itemset mining(SVSM),must balance variance and bias to achieve accurate results.Thus,an unbiased FIM approach with lower variance is highly promising.To narrow this gap,we propose an Item-Level LDP frequency oracle approach,named the Integrated-with-Hadamard-Transform-Based Frequency Oracle(IHFO).For the first time,Hadamard encoding is introduced to a set of values to encode all items into a fixed vector,and perturbation can be subsequently applied to the vector.An FIM approach,called optimized united itemset mining(O-UISM),is pro-posed to combine the padding-and-sampling-based frequency oracle(PSFO)and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies.Finally,we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee. 展开更多
关键词 local differential privacy frequent itemset mining frequency oracle
原文传递
A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop
16
作者 Zaihe Cheng Wei Shen +1 位作者 Wei Fang Jerry Chun-Wei Lin 《Complex System Modeling and Simulation》 2023年第1期47-58,共12页
High-utility itemset mining(HUIM)can consider not only the profit factor but also the profitable factor,which is an essential task in data mining.However,most HUIM algorithms are mainly developed on a single machine,w... High-utility itemset mining(HUIM)can consider not only the profit factor but also the profitable factor,which is an essential task in data mining.However,most HUIM algorithms are mainly developed on a single machine,which is inefficient for big data since limited memory and processing capacities are available.A parallel efficient high-utility itemset mining(P-EFIM)algorithm is proposed based on the Hadoop platform to solve this problem in this paper.In P-EFIM,the transaction-weighted utilization values are calculated and ordered for the itemsets with the MapReduce framework.Then the ordered itemsets are renumbered,and the low-utility itemsets are pruned to improve the dataset utility.In the Map phase,the P-EFIM algorithm divides the task into multiple independent subtasks.It uses the proposed S-style distribution strategy to distribute the subtasks evenly across all nodes to ensure load-balancing.Furthermore,the P-EFIM uses the EFIM algorithm to mine each subtask dataset to enhance the performance in the Reduce phase.Experiments are performed on eight datasets,and the results show that the runtime performance of P-EFIM is significantly higher than that of the PHUI-Growth,which is also HUIM algorithm based on the Hadoop framework. 展开更多
关键词 pattern mining data mining HADOOP PARALLEL high-utility itemset mining big data
原文传递
CONCISE REPRESENTATIONS FOR ASSOCIATION RULES IN MULTI-LEVEL DATASETS
17
作者 Gavin SHAW 《Journal of Systems Science and Systems Engineering》 SCIE EI CSCD 2009年第1期53-70,共18页
Association rule mining plays an important role in knowledge and information discovery. Often for a dataset, a huge number of rules can be extracted, but many of them are redundant, especially in the case of multi-lev... Association rule mining plays an important role in knowledge and information discovery. Often for a dataset, a huge number of rules can be extracted, but many of them are redundant, especially in the case of multi-level datasets. Mining non-redundant rules is a promising approach to solve this problem. However, existing work (Pasquier et al. 2005, Xu & Li 2007) is only focused on single level datasets. In this paper, we firstly present a definition for redundancy and a concise representation called Reliable basis for representing non-redundant association rules, then we propose an extension to the previous work that can remove hierarchically redundant rules from multi-level datasets. We also show that the resulting concise representation of non-redundant association rules is lossless since all association rules can be derived from the representation. Experiments show that our extension can effectively generate multilevel non-redundant rules. 展开更多
关键词 Association rule mining redundant association rules closed itemsets multi-level datasets
原文传递
A Fast Algorithm for Mining Association Rules 被引量:17
18
作者 黄刘生 陈华平 +1 位作者 王洵 陈国良 《Journal of Computer Science & Technology》 SCIE EI CSCD 2000年第6期619-624,共6页
In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally ... In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions. 展开更多
关键词 DATABASE data mining large itemset association rule minimum support minimum confidence
原文传递
CLS-Miner: efficient and effective closed high-utility itemset mining 被引量:10
19
作者 Thu-Lan DAM Kenli LI +1 位作者 Philippe FOURNIER-VIGER Quang-Huy DUONG 《Frontiers of Computer Science》 SCIE EI CSCD 2019年第2期357-381,共25页
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result,... High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-ofthe- art CHUD and CHUI-Miner algorithms, and that CLSMiner scales linearly. 展开更多
关键词 UTILITY MINING high-utility ITEMSET MINING CLOSED ITEMSET MINING CLOSED high-utility ITEMSET MINING
原文传递
Parallel Incremental Frequent Itemset Mining for Large Data 被引量:5
20
作者 Yu-Geng Song Hui-Min Cui Xiao-Bing Feng 《Journal of Computer Science & Technology》 SCIE EI CSCD 2017年第2期368-385,共18页
Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related sea... Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Besides, incremental FIM algorithms are also proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms called IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass, and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass, in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncbuildingPFP preserves the CanTrees built in the original pass, and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP can achieve significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input database, and in other cases IncBuildingPFP can achieve it. 展开更多
关键词 incremental parallel FPGrowth data mining frequent itemset mining MAPREDUCE
原文传递
上一页 1 2 下一页 到第
使用帮助 返回顶部