1,322 articles found
PHUI-GA: GPU-based efficiency evolutionary algorithm for mining high utility itemsets
1
Authors: JIANG Haipeng, WU Guoqing, SUN Mengdan, LI Feng, SUN Yunfei, FANG Wei. 《Journal of Systems Engineering and Electronics》 (SCIE, CSCD), 2024, No. 4, pp. 965-975 (11 pages)
Evolutionary algorithms (EAs) have been used in high utility itemset mining (HUIM) to address the problem of discovering high utility itemsets (HUIs) in the exponential search space. EAs have good running and mining performance, but they still require huge computational resources and may miss many HUIs. Because EAs combine well with graphics processing units (GPUs), we propose a parallel genetic algorithm (GA) based on the GPU platform for HUIM (PHUI-GA). The improved evolution steps are performed on the central processing unit (CPU), and the CPU-intensive steps are sent to the GPU to be evaluated with multi-threaded processors. Experiments show that the mining performance of PHUI-GA outperforms the existing EAs. When mining 90% of the HUIs, PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.
Keywords: high utility itemset mining (HUIM); graphics processing unit (GPU); parallel genetic algorithm (GA); mining performance
New algorithm of mining frequent closed itemsets
2
Authors: 张亮, 任永功, 付玉. 《Journal of Southeast University(English Edition)》 (EI, CAS), 2008, No. 3, pp. 335-338 (4 pages)
A new algorithm based on an FC-tree (frequent closed pattern tree) and max-FCIA (maximal frequent closed itemsets algorithm) is presented for mining frequent closed itemsets while reducing memory and time consumption. The algorithm maps the transaction database into a hash table, obtains the support of all frequent itemsets by operating on the hash table, and forms a lexicographic subset tree that contains the frequent itemsets. Efficient pruning methods then process the lexicographic subset tree to obtain the FC-tree, which contains all the minimum frequent closed itemsets. Finally, frequent closed itemsets are generated from the minimum frequent closed itemsets. The experimental results show that mapping the transaction database reduces time consumption and improves the efficiency of the program. Furthermore, the effective pruning strategy restrains the number of candidates, which saves space. The results show that the algorithm is effective.
Keywords: frequent itemsets; frequent closed itemsets; minimum frequent closed itemsets; maximal frequent closed itemsets; frequent closed pattern tree
Backward Support Computation Method for Positive and Negative Frequent Itemset Mining
3
Authors: Mrinmoy Biswas Akash, Indrani Mandal, Md. Selim Al Mamun. 《Journal of Data Analysis and Information Processing》, 2023, No. 1, pp. 37-48 (12 pages)
Association rule mining is a major data mining field that leads to the discovery of associations and correlations among items in today's big data environment. Conventional association rule mining focuses mainly on positive itemsets generated from frequently occurring itemsets (PFIS). However, there has been significant study of infrequent itemsets, using negative association rules to mine interesting frequent itemsets (NFIS) from transactions. In this work, we propose an efficient backward-calculating negative frequent itemset algorithm, EBC-NFIS, which computes backward supports and can extract both positive and negative frequent itemsets synchronously from a dataset. The EBC-NFIS algorithm is based on the popular e-NFIS algorithm, which computes the supports of negative itemsets from the supports of positive itemsets. The proposed algorithm uses previously computed supports held in memory to minimize computation time. In addition, positive and negative association rules (PNARs) are generated from the frequent itemsets discovered by EBC-NFIS. The efficiency of the proposed algorithm is verified by several experiments and by comparing the results with the e-NFIS algorithm. The experimental results confirm that the proposed algorithm successfully discovers NFIS and PNARs and runs significantly faster than the conventional e-NFIS algorithm.
Keywords: data mining; positive frequent itemset; negative frequent itemset; association rule; backward support
A novel algorithm for frequent itemset mining in data warehouses (cited by: 2)
4
Authors: 徐利军, 谢康林. 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 (SCIE, EI, CAS, CSCD), 2006, No. 2, pp. 216-224 (9 pages)
Current technology for frequent itemset mining mostly applies to data stored in a single transaction database. This paper presents a novel algorithm, MultiClose, for frequent itemset mining in data warehouses. MultiClose computes the results in the individual dimension tables separately and merges them with a very efficient approach. The close itemsets technique is used to improve the performance of the algorithm. The authors propose an efficient implementation for star schemas in which their algorithm outperforms state-of-the-art single-table algorithms.
Keywords: frequent itemset; close itemset; star schema; dimension table; fact table
Double-layer Bayesian Classifier Ensembles Based on Frequent Itemsets (cited by: 3)
5
Authors: Wei-Guo Yi, Jing Duan, Ming-Yu Lu. 《International Journal of Automation and Computing》 (EI), 2012, No. 2, pp. 215-220 (6 pages)
Numerous models have been proposed to reduce the classification error of Naive Bayes by weakening its attribute independence assumption, and some have demonstrated remarkable error performance. Considering that ensemble learning is an effective method of reducing the classification error of a classifier, this paper proposes a double-layer Bayesian classifier ensembles (DLBCE) algorithm based on frequent itemsets. DLBCE constructs a double-layer Bayesian classifier (DLBC) for each frequent itemset contained in the new instance and finally combines all the classifiers into an ensemble, assigning different weights to different classifiers according to the conditional mutual information. The experimental results show that the proposed algorithm outperforms other outstanding algorithms.
Keywords: double-layer Bayesian classifier; frequent itemsets; conditional mutual information; support
Frequent Itemset Mining of User's Multi-Attribute under Local Differential Privacy (cited by: 2)
6
Authors: Haijiang Liu, Lianwei Cui, Xuebin Ma, Celimuge Wu. 《Computers, Materials & Continua》 (SCIE, EI), 2020, No. 10, pp. 369-385 (17 pages)
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users' personal privacy can be leaked in the mining process. In recent years, applying local differential privacy protection models to mine frequent itemsets has been a relatively reliable and secure protection method. Local differential privacy means that users first perturb the original data and then send these data to the aggregator, preventing the aggregator from revealing the user's private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and applies to users' multiple attributes. The main technique uses bitmap encoding to convert the user's original data into a binary string. The framework also covers how to choose the best perturbation algorithm for varying user attributes, and it uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm in the framework, compare it with the existing algorithms, and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.
Keywords: local differential privacy; frequent itemset mining; user's multi-attribute
FICW: Frequent Itemset Based Text Clustering with Window Constraint
7
Authors: ZHOU Chong, LU Yansheng, ZOU Lei, HU Rong. 《Wuhan University Journal of Natural Sciences》 (CAS), 2006, No. 5, pp. 1345-1351 (7 pages)
Most existing text clustering algorithms overlook the fact that a document is a word sequence with semantic information. Important semantic information exists in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which makes use of this semantic information for text clustering with a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
Keywords: text clustering; frequent itemsets; search engine
Mining φ-Frequent Itemset Using FP-Tree
8
Authors: 李天瑞. 《Journal of Modern Transportation》, 2001, No. 1, pp. 67-74 (8 pages)
The problem of association rule mining has gained considerable prominence in the data mining community as an important tool for knowledge discovery from large-scale databases, and there has been a spurt of research activity around it. However, traditional association rule mining often derives many rules in which people are uninterested. This paper reports a generalization of association rule mining called φ association rule mining. It allows people to have different levels of interest in different itemsets, as real applications require. It can also help to derive interesting rules and substantially reduce the number of rules. An algorithm based on the FP tree for mining φ frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.
Keywords: data processing; databases; φ association rule mining; φ frequent itemset; FP tree; data mining
A Depth-first Algorithm of Finding All Association Rules Generated by a Frequent Itemset
9
Authors: 武坤, 姜保庆, 魏庆. 《Journal of Donghua University(English Edition)》 (EI, CAS), 2006, No. 6, pp. 1-4, 9 (5 pages)
The classical algorithm for finding the association rules generated by a frequent itemset has to generate all non-empty subsets of the frequent itemset as the candidate set of consequents. Xiongfei Li addressed this and proposed an improved algorithm. That algorithm finds all consequents layer by layer, so it is breadth-first. In this paper, we propose a new algorithm, Generate Rules by using Set-Enumeration Tree (GRSET), which uses the structure of the set-enumeration tree and a depth-first method to find all consequents of the association rules one by one and obtain all association rules corresponding to the consequents. Experiments show the GRSET algorithm to be practicable and efficient.
Keywords: association rule; frequent itemset; breadth-first; depth-first; consequent
Mining Frequent Closed Itemsets in Large High Dimensional Data
10
Authors: 余光柱, 曾宪辉, 邵世煌. 《Journal of Donghua University(English Edition)》 (EI, CAS), 2008, No. 4, pp. 416-424 (9 pages)
Large high-dimensional data have posed great challenges to existing algorithms for frequent itemset mining. To solve the problem, a hybrid method consisting of a novel row enumeration algorithm and a column enumeration algorithm is proposed. The intention of the hybrid method is to decompose the mining task into two subtasks and then choose an appropriate algorithm to solve each of them. The novel algorithm, Inter-transaction, is based on the characteristic that there are few common items between or among long transactions. In addition, an optimization technique is adopted to improve the performance of the intersection of bit-vectors. Experiments on synthetic data show that our method achieves high performance on large high-dimensional data.
Keywords: frequent closed itemsets; large high-dimensional data; row enumeration; column enumeration; hybrid method
FPGA-Based Stream Processing for Frequent Itemset Mining with Incremental Multiple Hashes
11
Authors: Kasho Yamamoto, Masayuki Ikebe, Tetsuya Asai, Masato Motomura. 《Circuits and Systems》, 2016, No. 10, pp. 3299-3309 (11 pages)
With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, which extracts useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and then improved its performance using an FPGA. FIsM algorithms are important, basic data-mining techniques used to discover association rules from transactional databases. We improved a recently proposed approximate FIsM algorithm so that it would fit onto a hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.
Keywords: data mining; frequent itemset mining; FPGA; stream processing
Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy (cited by: 1)
12
Authors: 赵丹, 赵素云, 陈红, 刘睿瑄, 李翠平, 张晓莹. 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 2023, No. 6, pp. 1403-1422 (20 pages)
Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add "padding and sampling" steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an item-level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
Keywords: local differential privacy; frequent itemset mining; frequency oracle
A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop (cited by: 1)
13
Authors: Zaihe Cheng, Wei Shen, Wei Fang, Jerry Chun-Wei Lin. 《Complex System Modeling and Simulation》, 2023, No. 1, pp. 47-58 (12 pages)
High-utility itemset mining (HUIM), an essential task in data mining, can consider not only the profit factor but also the profitable factor. However, most HUIM algorithms are developed for a single machine, which is inefficient for big data because only limited memory and processing capacity are available. To solve this problem, this paper proposes a parallel efficient high-utility itemset mining (P-EFIM) algorithm based on the Hadoop platform. In P-EFIM, the transaction-weighted utilization values are calculated and ordered for the itemsets with the MapReduce framework. The ordered itemsets are then renumbered, and the low-utility itemsets are pruned to improve the dataset utility. In the Map phase, the P-EFIM algorithm divides the task into multiple independent subtasks. It uses the proposed S-style distribution strategy to distribute the subtasks evenly across all nodes to ensure load balancing. Furthermore, P-EFIM uses the EFIM algorithm to mine each subtask dataset to enhance performance in the Reduce phase. Experiments are performed on eight datasets, and the results show that the runtime performance of P-EFIM is significantly better than that of PHUI-Growth, which is also a HUIM algorithm based on the Hadoop framework.
Keywords: pattern mining; data mining; Hadoop; parallel; high-utility itemset mining; big data
High-Utility Pattern Mining with Negative Items Based on a Sliding Window
14
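Authors: 武妍, 荀亚玲, 马煜. 《计算机工程与设计》 (PKU Core), 2024, No. 3, pp. 845-851 (7 pages)
Traditional high-utility pattern mining does not consider items with negative utility values, nor the timeliness of processing streaming data. To address these problems, a sliding-window-based high-utility mining algorithm, HUPN_SW, is proposed. A newly defined sliding-window positive and negative utility list, PNSWU-List, maintains all the information required to mine the high-utility pattern set of the most recent batches. This enables efficient batch-by-batch mining, avoids repeated database scans, and mines high-utility patterns directly without generating candidate utility pattern sets, so that HUPN_SW adapts effectively to dynamic streaming data. Experimental results show that HUPN_SW performs well in terms of running time and scalability.
Keywords: frequent pattern mining; sliding window; high-utility pattern mining; high-utility itemset; negative utility; data stream; utility list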
A Quarterly High RFM Mining Algorithm for Big Data Management
15
Authors: Cuiwei Peng, Jiahui Chen, Shicheng Wan, Guotao Xu. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 9, pp. 4341-4360 (20 pages)
In today's highly competitive retail industry, offline stores face increasing pressure on profitability. They hope to improve their shelf management with the help of big data technology. For this, on-shelf availability is an essential indicator in shelf data management and closely relates to customer purchase behavior. RFM (recency, frequency, and monetary) pattern mining is a powerful tool for evaluating the value of customer behavior. However, the existing RFM pattern mining algorithms do not consider the quarterly nature of goods, resulting in unreasonable shelf availability and difficulty in making a profit. To solve this problem, we propose a quarterly RFM mining algorithm for on-shelf products named OS-RFM. Our algorithm mines the high recency, high frequency, and high monetary patterns and considers the period of the on-shelf goods in quarterly units. We conducted experiments using two real datasets for numerical and graphical analysis to prove the algorithm's effectiveness. Compared with the state-of-the-art RFM mining algorithm, our algorithm can identify more patterns and performs well in terms of precision, recall, and F1-score, with the recall rate nearing 100%. The novel algorithm also operates with significantly shorter running times and more stable memory usage than existing mining algorithms. Additionally, we analyze the sales trends of products in different quarters and their seasonal variations. The analysis assists businesses in maintaining reasonable on-shelf availability and achieving greater profitability.
Keywords: data mining; recency pattern; high-utility itemset; RFM pattern mining; on-shelf management
CLS-Miner: efficient and effective closed high-utility itemset mining (cited by: 10)
16
Authors: Thu-Lan DAM, Kenli LI, Philippe FOURNIER-VIGER, Quang-Huy DUONG. 《Frontiers of Computer Science》 (SCIE, EI, CSCD), 2019, No. 2, pp. 357-381 (25 pages)
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task, because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets with various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-of-the-art CHUD and CHUI-Miner algorithms, and that CLS-Miner scales linearly.
Keywords: utility mining; high-utility itemset mining; closed itemset mining; closed high-utility itemset mining
Constrained Cross-Level High-Utility Itemset Mining over Data Streams
17
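Authors: 刘淑娟, 韩萌, 高智慧, 穆栋梁, 李昂. 《计算机工程与应用》 (CSCD, PKU Core), 2024, No. 13, pp. 287-300 (14 pages)
Traditional high-utility itemset mining algorithms cannot discover relationships between categories at different levels of abstraction. Researchers have therefore proposed cross-level high-utility itemset mining algorithms. However, current cross-level algorithms can only process static data and cannot control the range of levels that is mined. To address this, a dynamic category list structure, DTUL, is proposed to store and maintain the itemset utility and category information within the window. Based on this structure, sliding-window-based constrained cross-level high-utility itemset mining algorithms are proposed for the first time: CCLHM_DTU, which mines bottom-up, and CCLHM_UTD, which mines top-down. Extensive experiments on datasets containing category information show that the proposed algorithms can process data streams effectively and constrain the level range of itemsets flexibly.
Keywords: high-utility itemset mining; cross-level high-utility itemset; data stream; sliding window; utility list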
Parallel Incremental Frequent Itemset Mining for Large Data (cited by: 5)
18
Authors: Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 2017, No. 2, pp. 368-385 (18 pages)
Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms called IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncBuildingPFP preserves the CanTrees built in the original pass and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP can achieve significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input database, and in other cases IncBuildingPFP can achieve it.
Keywords: incremental parallel FPGrowth; data mining; frequent itemset mining; MapReduce
HHUIM: A New Heuristic High-Utility Itemset Mining Method
19
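Authors: 高智慧, 韩萌, 李昂, 刘淑娟, 穆栋梁. 《计算机应用研究》 (CSCD, PKU Core), 2024, No. 1, pp. 94-101 (8 pages)
Heuristic-based high-utility itemset mining algorithms may lose a large number of itemsets during mining. To address this problem, a new heuristic high-utility itemset mining algorithm, HHUIM, is proposed. HHUIM uses the Harris hawks optimization algorithm to update the population, which effectively reduces itemset loss. A hawk replacement strategy is proposed and designed to handle the large search space and to reduce the number of hawks whose fitness value falls below the minimum utility threshold. In addition, a storage-backtracking strategy is proposed that effectively prevents the algorithm from converging too quickly and becoming trapped in a local optimum. Extensive experiments show that the proposed algorithm outperforms the current state-of-the-art heuristic high-utility itemset mining algorithms.
Keywords: Harris hawks optimization algorithm; high-utility itemset mining; heuristic algorithm; intelligent optimization algorithm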
An Efficient One-Off Weak-Gap Sequential Pattern Mining Algorithm
20
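Authors: 杨鸿茜, 武优西, 耿萌, 刘靖宇, 李艳. 《计算机工程》 (CAS, CSCD, PKU Core), 2024, No. 3, pp. 60-67 (8 pages)
Sequential pattern mining with gap constraints, an important branch of sequential pattern mining, can discover repeated occurrences of patterns in sequences. However, current research mainly targets single-item sequences and treats every item in a sequence as having the same significance. To address this problem, a one-off weak-gap sequential pattern mining (OWP) algorithm is proposed, which consists of three steps: preparation, support calculation, and candidate pattern generation. In the preparation stage, an inverted index is built and infrequent items are pruned. For support calculation, the inverted index structure records occurrence positions, avoiding repeated scans of the original dataset. For candidate pattern generation, a pattern join strategy reduces the generation of redundant candidate patterns. Experimental results on six real datasets, covering itemset sequences and single-item sequences, show that compared with the OWP-p, Ows-OWP, and OWP-e algorithms, OWP improves running time by factors of 2.653, 1.348, and 3.592 and reduces memory consumption by 3.51%, 0.07%, and 5%, respectively, indicating that OWP mines patterns of interest to users more efficiently. In addition, on a dataset six times the size of the D1 dataset, the running time of OWP grew by a factor of 3.763 and its memory consumption by a factor of 2.310 relative to D1. Both growth factors are smaller than the growth in dataset size, indicating that OWP scales well.
Keywords: sequential pattern mining; itemset mining; gap constraint; one-off condition; weak-gap constraint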