Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model ...Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model only finds out the maximal frequent patterns, which can generate all frequent patterns. FP-growth algorithm is one of the most efficient frequent-pattern mining methods published so far. However, because FP-tree and conditional FP-trees must be two-way traversable, a great deal memory is needed in process of mining. This paper proposes an efficient algorithm Unid_FP-Max for mining maximal frequent patterns based on unidirectional FP-tree. Because of generation method of unidirectional FP-tree and conditional unidirectional FP-trees, the algorithm reduces the space consumption to the fullest extent. With the development of two techniques: single path pruning and header table pruning which can cut down many conditional unidirectional FP-trees generated recursively in mining process, Unid_FP-Max further lowers the expense of time and space.展开更多
In the network security system,intrusion detection plays a significant role.The network security system detects the malicious actions in the network and also conforms the availability,integrity and confidentiality of da...In the network security system,intrusion detection plays a significant role.The network security system detects the malicious actions in the network and also conforms the availability,integrity and confidentiality of data informa-tion resources.Intrusion identification system can easily detect the false positive alerts.If large number of false positive alerts are created then it makes intrusion detection system as difficult to differentiate the false positive alerts from genuine attacks.Many research works have been done.The issues in the existing algo-rithms are more memory space and need more time to execute the transactions of records.This paper proposes a novel framework of network security Intrusion Detection System(IDS)using Modified Frequent Pattern(MFP-Tree)via K-means algorithm.The accuracy rate of Modified Frequent Pattern Tree(MFPT)-K means method infinding the various attacks are Normal 94.89%,for DoS based attack 98.34%,for User to Root(U2R)attacks got 96.73%,Remote to Local(R2L)got 95.89%and Probe attack got 92.67%and is optimal when it is compared with other existing algorithms of K-Means and APRIORI.展开更多
A recommender system is an approach performed by e-commerce for increasing smooth users’experience.Sequential pattern mining is a technique of data mining used to identify the co-occurrence relationships by taking in...A recommender system is an approach performed by e-commerce for increasing smooth users’experience.Sequential pattern mining is a technique of data mining used to identify the co-occurrence relationships by taking into account the order of transactions.This work will present the implementation of sequence pattern mining for recommender systems within the domain of e-com-merce.This work will execute the Systolic tree algorithm for mining the frequent patterns to yield feasible rules for the recommender system.The feature selec-tion's objective is to pick a feature subset having the least feature similarity as well as highest relevancy with the target class.This will mitigate the feature vector's dimensionality by eliminating redundant,irrelevant,or noisy data.This work pre-sents a new hybrid recommender system based on optimized feature selection and systolic tree.The features were extracted using Term Frequency-Inverse Docu-ment Frequency(TF-IDF),feature selection with the utilization of River Forma-tion Dynamics(RFD),and the Particle Swarm Optimization(PSO)algorithm.The systolic tree is used for pattern mining,and based on this,the recommendations are given.The proposed methods were evaluated using the MovieLens dataset,and the experimental outcomes confirmed the efficiency of the techniques.It was observed that the RFD feature selection with systolic tree frequent pattern mining with collaborativefiltering,the precision of 0.89 was achieved.展开更多
挖掘最大频繁项目集是多种数据挖掘应用中的关键问题,之前的很多研究都是采用Apriori类的候选项目集生成-检验方法.然而,候选项目集产生的代价是很高的,尤其是在存在大量强模式和/或长模式的时候.提出了一种快速的基于频繁模式树(FP-tr...挖掘最大频繁项目集是多种数据挖掘应用中的关键问题,之前的很多研究都是采用Apriori类的候选项目集生成-检验方法.然而,候选项目集产生的代价是很高的,尤其是在存在大量强模式和/或长模式的时候.提出了一种快速的基于频繁模式树(FP-tree)的最大频繁项目集挖掘DMFIA(discover maximum frequent itemsets algorithm)及其更新算法UMFIA(update maximum frequent itemsets algorithm).算法UMFIA将充分利用以前的挖掘结果来减少在更新的数据库中发现新的最大频繁项目集的费用.展开更多
为了解决最大频繁项目集算法DMFIA(discover maximum frequent itemsets algorithm)在挖掘候选项目集维数较大而最大频繁项目集维数较小的情况下产生大量候选项目集的问题,提出一种改进的基于FP-Tree(frequent pattern tree)的最大频繁...为了解决最大频繁项目集算法DMFIA(discover maximum frequent itemsets algorithm)在挖掘候选项目集维数较大而最大频繁项目集维数较小的情况下产生大量候选项目集的问题,提出一种改进的基于FP-Tree(frequent pattern tree)的最大频繁项目集挖掘的FP-EMFIA算法;该算法在挖掘过程中根据项目头表,采用自上而下和自下而上的双向搜索策略,并通过条件模式基中的频繁项目和较小维数的非频繁项目集对候选项目集进行降维和剪枝,以减少候选项目集的数量,加速对候选集计数的操作。在经典数据集mushroom、chess和connect上的实验结果表明,FP-EMFIA算法在支持度较小时的时间效率优于DMFIA、IDMFIA(improved algorithm of DMFIA)和BDRFI(algorithm for mining frequent itemsets based on decreasing dimensionality reduction of frequent itemsets)算法的,说明FP-EMFIA算法在候选项目集维数较大时有相对优势。展开更多
基金Supported by the National Natural Science Foundation of China ( No.60474022)Henan Innovation Project for University Prominent Research Talents (No.2007KYCX018)
文摘Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model only finds out the maximal frequent patterns, which can generate all frequent patterns. FP-growth algorithm is one of the most efficient frequent-pattern mining methods published so far. However, because FP-tree and conditional FP-trees must be two-way traversable, a great deal memory is needed in process of mining. This paper proposes an efficient algorithm Unid_FP-Max for mining maximal frequent patterns based on unidirectional FP-tree. Because of generation method of unidirectional FP-tree and conditional unidirectional FP-trees, the algorithm reduces the space consumption to the fullest extent. With the development of two techniques: single path pruning and header table pruning which can cut down many conditional unidirectional FP-trees generated recursively in mining process, Unid_FP-Max further lowers the expense of time and space.
文摘In the network security system,intrusion detection plays a significant role.The network security system detects the malicious actions in the network and also conforms the availability,integrity and confidentiality of data informa-tion resources.Intrusion identification system can easily detect the false positive alerts.If large number of false positive alerts are created then it makes intrusion detection system as difficult to differentiate the false positive alerts from genuine attacks.Many research works have been done.The issues in the existing algo-rithms are more memory space and need more time to execute the transactions of records.This paper proposes a novel framework of network security Intrusion Detection System(IDS)using Modified Frequent Pattern(MFP-Tree)via K-means algorithm.The accuracy rate of Modified Frequent Pattern Tree(MFPT)-K means method infinding the various attacks are Normal 94.89%,for DoS based attack 98.34%,for User to Root(U2R)attacks got 96.73%,Remote to Local(R2L)got 95.89%and Probe attack got 92.67%and is optimal when it is compared with other existing algorithms of K-Means and APRIORI.
文摘A recommender system is an approach performed by e-commerce for increasing smooth users’experience.Sequential pattern mining is a technique of data mining used to identify the co-occurrence relationships by taking into account the order of transactions.This work will present the implementation of sequence pattern mining for recommender systems within the domain of e-com-merce.This work will execute the Systolic tree algorithm for mining the frequent patterns to yield feasible rules for the recommender system.The feature selec-tion's objective is to pick a feature subset having the least feature similarity as well as highest relevancy with the target class.This will mitigate the feature vector's dimensionality by eliminating redundant,irrelevant,or noisy data.This work pre-sents a new hybrid recommender system based on optimized feature selection and systolic tree.The features were extracted using Term Frequency-Inverse Docu-ment Frequency(TF-IDF),feature selection with the utilization of River Forma-tion Dynamics(RFD),and the Particle Swarm Optimization(PSO)algorithm.The systolic tree is used for pattern mining,and based on this,the recommendations are given.The proposed methods were evaluated using the MovieLens dataset,and the experimental outcomes confirmed the efficiency of the techniques.It was observed that the RFD feature selection with systolic tree frequent pattern mining with collaborativefiltering,the precision of 0.89 was achieved.
文摘选择性集成通过选择部分基分类器参与集成,从而提高集成分类器的泛化能力,降低预测开销.但已有的选择性集成算法普遍耗时较长,将数据挖掘的技术应用于选择性集成,提出一种基于FP-Tree(frequent pattern tree)的快速选择性集成算法:CPM-EP(coverage based pattern mining for ensemble pruning).该算法将基分类器对校验样本集的分类结果组织成一个事务数据库,从而使选择性集成问题可转化为对事务数据集的处理问题.针对所有可能的集成分类器大小,CPM-EP算法首先得到一个精简的事务数据库,并创建一棵FP-Tree树保存其内容;然后,基于该FP-Tree获得相应大小的集成分类器.在获得的所有集成分类器中,对校验样本集预测精度最高的集成分类器即为算法的输出.实验结果表明,CPM-EP算法以很低的计算开销获得优越的泛化能力,其分类器选择时间约为GASEN的1/19以及Forward-Selection的1/8,其泛化能力显著优于参与比较的其他方法,而且产生的集成分类器具有较少的基分类器.
文摘挖掘最大频繁项目集是多种数据挖掘应用中的关键问题,之前的很多研究都是采用Apriori类的候选项目集生成-检验方法.然而,候选项目集产生的代价是很高的,尤其是在存在大量强模式和/或长模式的时候.提出了一种快速的基于频繁模式树(FP-tree)的最大频繁项目集挖掘DMFIA(discover maximum frequent itemsets algorithm)及其更新算法UMFIA(update maximum frequent itemsets algorithm).算法UMFIA将充分利用以前的挖掘结果来减少在更新的数据库中发现新的最大频繁项目集的费用.
文摘为了解决最大频繁项目集算法DMFIA(discover maximum frequent itemsets algorithm)在挖掘候选项目集维数较大而最大频繁项目集维数较小的情况下产生大量候选项目集的问题,提出一种改进的基于FP-Tree(frequent pattern tree)的最大频繁项目集挖掘的FP-EMFIA算法;该算法在挖掘过程中根据项目头表,采用自上而下和自下而上的双向搜索策略,并通过条件模式基中的频繁项目和较小维数的非频繁项目集对候选项目集进行降维和剪枝,以减少候选项目集的数量,加速对候选集计数的操作。在经典数据集mushroom、chess和connect上的实验结果表明,FP-EMFIA算法在支持度较小时的时间效率优于DMFIA、IDMFIA(improved algorithm of DMFIA)和BDRFI(algorithm for mining frequent itemsets based on decreasing dimensionality reduction of frequent itemsets)算法的,说明FP-EMFIA算法在候选项目集维数较大时有相对优势。