频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodese...频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodeset两种结构的并行频繁项集挖掘算法(Parallel Mining algorithm of Frequent Itemset based on N-list and DiffNodeset structure,PFIMND)。首先,根据N-list和DiffNodeset在存储不同数据集上的优势,设计了稀疏度估计函数(Sparsity Estimation,SE),根据数据集稀疏程度灵活选取其中之一压缩数据集,相比采用单一存储结构消耗的内存更少;其次,提出了计算量估计函数(Computation Estimation,CE)来估计频繁1项集F-list中每一项的负载量,并根据计算量进行均匀分组;最后采用集合枚举树作为搜索空间,为避免组合爆炸和冗余搜索问题,设计了超集剪枝策略和基于宽度优先搜索的剪枝策略,生成最终的挖掘结果。实验结果表明,相比同类算法HP-FIMBN,PFIMND算法在Susy数据集上挖掘频繁项集的效果提升了12.3%。展开更多
We study CFG parse tree enumeration in this paper. By dividing the set of all parse trees into infinite hierarchies according to height of parse tree, the hierarchical lexicographic order on the set of parse trees is ...We study CFG parse tree enumeration in this paper. By dividing the set of all parse trees into infinite hierarchies according to height of parse tree, the hierarchical lexicographic order on the set of parse trees is established. Then grammar-based algorithms for counting and enumerating CFG parse trees in this order are presented. To generate a parse tree of height n, the time complexity is O(n). If τ is a lowest parse tree for its yield, then O(n) =O(||τ|| + 1), where ||τ|| is the length of the sentence (yield) generated by τ. The sentence can be obtained as a by-product of the parse tree. To compute sentence from its parse tree (needn't be lowest one), the time complexity is O(node)+O(||τ|| + 1), where node is the number of non-leaf nodes of parse tree τ. To generate both a complete lowest parse tree and its yield at the same time, the time complexity is O(||τ|| + 1).展开更多
模型诊断方法是人工智能领域重要的系统故障自动检测方法,被广泛应用于软件故障检测和硬件诊断.近年来由于电路规模和复杂度不断增大,其诊断难度也不断增大.本文通过对电路模型特征的研究,结合LLBRStree(Last-Level Based on Reverse Se...模型诊断方法是人工智能领域重要的系统故障自动检测方法,被广泛应用于软件故障检测和硬件诊断.近年来由于电路规模和复杂度不断增大,其诊断难度也不断增大.本文通过对电路模型特征的研究,结合LLBRStree(Last-Level Based on Reverse Search-tree)诊断算法提出分组式诊断方法 GD(Grouped Diagnosis):首先结合电路特征确定组件的故障相关性并对电路组件进行分组,可缩减电路中需检测的规模;其次,利用分组后电路并结合非诊断解定理和SAT(SATisfiability)求解特征定位部分非诊断解,从而避免该部分的一致性检测来加速求解.本文算法可应用于电子电路故障诊断领域,并且实验结果表明该算法与LLBRS-tree算法相比求解效率平均提高了1.5倍,最多提高了3倍.展开更多
文摘频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodeset两种结构的并行频繁项集挖掘算法(Parallel Mining algorithm of Frequent Itemset based on N-list and DiffNodeset structure,PFIMND)。首先,根据N-list和DiffNodeset在存储不同数据集上的优势,设计了稀疏度估计函数(Sparsity Estimation,SE),根据数据集稀疏程度灵活选取其中之一压缩数据集,相比采用单一存储结构消耗的内存更少;其次,提出了计算量估计函数(Computation Estimation,CE)来估计频繁1项集F-list中每一项的负载量,并根据计算量进行均匀分组;最后采用集合枚举树作为搜索空间,为避免组合爆炸和冗余搜索问题,设计了超集剪枝策略和基于宽度优先搜索的剪枝策略,生成最终的挖掘结果。实验结果表明,相比同类算法HP-FIMBN,PFIMND算法在Susy数据集上挖掘频繁项集的效果提升了12.3%。
基金Supported by the National Natural Science Foundation of China (Grant Nos. 60273023, 60721061)
文摘We study CFG parse tree enumeration in this paper. By dividing the set of all parse trees into infinite hierarchies according to height of parse tree, the hierarchical lexicographic order on the set of parse trees is established. Then grammar-based algorithms for counting and enumerating CFG parse trees in this order are presented. To generate a parse tree of height n, the time complexity is O(n). If τ is a lowest parse tree for its yield, then O(n) =O(||τ|| + 1), where ||τ|| is the length of the sentence (yield) generated by τ. The sentence can be obtained as a by-product of the parse tree. To compute sentence from its parse tree (needn't be lowest one), the time complexity is O(node)+O(||τ|| + 1), where node is the number of non-leaf nodes of parse tree τ. To generate both a complete lowest parse tree and its yield at the same time, the time complexity is O(||τ|| + 1).
文摘模型诊断方法是人工智能领域重要的系统故障自动检测方法,被广泛应用于软件故障检测和硬件诊断.近年来由于电路规模和复杂度不断增大,其诊断难度也不断增大.本文通过对电路模型特征的研究,结合LLBRStree(Last-Level Based on Reverse Search-tree)诊断算法提出分组式诊断方法 GD(Grouped Diagnosis):首先结合电路特征确定组件的故障相关性并对电路组件进行分组,可缩减电路中需检测的规模;其次,利用分组后电路并结合非诊断解定理和SAT(SATisfiability)求解特征定位部分非诊断解,从而避免该部分的一致性检测来加速求解.本文算法可应用于电子电路故障诊断领域,并且实验结果表明该算法与LLBRS-tree算法相比求解效率平均提高了1.5倍,最多提高了3倍.