Abstract: Association rule mining has gained considerable prominence in the data mining community as an important tool for knowledge discovery in large-scale databases, and it has spurred a great deal of research activity. Traditional association rule mining is limited to intra-transaction associations. Only recently did H.J. Lu propose the concept of the N-dimensional inter-transaction association rule (NDITAR). Based on an analysis of its limitations, this paper modifies and extends Lu's definition of NDITAR and then introduces the generalized multidimensional association rule (GMDAR), which is more general, flexible and reasonable than NDITAR.
Funding: This work was supported in part by the National '863' High-Tech Programme of China (No. 863-306-ZD06-2).
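The inter-transaction idea behind NDITAR can be made concrete with a minimal one-dimensional sketch. It counts the support of a pattern over sliding windows along a single dimension, such as trading days. This is not the GMDAR definition from the paper. The `(item, offset)` encoding, the `maxspan` window parameter and the function name are assumptions made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: an "extended item" is (item, offset), where the offset
# counts positions along one dimension (e.g. days) from the start of a window.
ExtItem = Tuple[str, int]


def intertransaction_support(
    db: List[Set[str]],           # db[t] = items observed at position t
    pattern: FrozenSet[ExtItem],  # e.g. {("a", 0), ("b", 1)}
    maxspan: int,                 # largest offset a window may cover (assumed)
) -> float:
    """Fraction of sliding windows in which every extended item of `pattern` occurs."""
    if any(off < 0 or off > maxspan for _, off in pattern):
        raise ValueError("offsets must lie in [0, maxspan]")
    n_windows = len(db) - maxspan
    if n_windows <= 0:
        return 0.0
    hits = 0
    for start in range(n_windows):
        # The window covers positions start .. start + maxspan.
        if all(item in db[start + off] for item, off in pattern):
            hits += 1
    return hits / n_windows


if __name__ == "__main__":
    days = [{"a"}, {"b"}, {"a", "c"}, {"b"}, {"c"}]
    pattern = frozenset({("a", 0), ("b", 1)})  # "a on one day, b on the next"
    print(intertransaction_support(days, pattern, maxspan=1))  # 0.5
```

A classic intra-transaction rule can only relate items inside the same `db[t]`. The offsets are what let a pattern span several transactions.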
Abstract: This paper discusses the problem of discovering association rules between items in a large database of sales transactions and proposes a novel algorithm, BitMatrix, which is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that BitMatrix outperforms these algorithms on large databases, and scale-up experiments show that it scales linearly with the number of transactions.
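The abstract does not describe how BitMatrix works internally. As a hedged illustration of the general idea of replacing repeated database scans with bitwise operations, the sketch below builds a generic item-by-transaction bit matrix, storing one Python integer per item. It then counts support with a bitwise AND followed by a population count. This is a common vertical-bitmap technique and not necessarily the paper's algorithm. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_bit_matrix(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to a bit row: bit t is 1 iff transaction t contains the item."""
    rows: Dict[str, int] = {}
    for t, items in enumerate(transactions):
        for item in items:
            rows[item] = rows.get(item, 0) | (1 << t)
    return rows


def support_count(rows: Dict[str, int], itemset: Iterable[str]) -> int:
    """Count transactions containing every item by AND-ing the item rows."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    combined = rows.get(items[0], 0)
    for item in items[1:]:
        combined &= rows.get(item, 0)  # an unseen item has an all-zero row
    return bin(combined).count("1")    # population count of the surviving bits


if __name__ == "__main__":
    db = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}, {"milk"}]
    rows = build_bit_matrix(db)
    print(support_count(rows, {"bread", "milk"}))  # 2
```

After the single pass in `build_bit_matrix`, the support of any itemset can be computed without reading the transactions again.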
Abstract: To make effective use of the large volume of graduate data that colleges and universities accumulate through teaching management, this paper applies data mining techniques to a database of higher vocational college graduates. The raw data are cleaned with a variety of preprocessing methods, and a mining algorithm based on the widely used Apriori association rule algorithm is proposed. An association rule mining system is then designed and implemented according to practical needs. The mining results are useful for guiding teaching management decisions and graduate employment.
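Because the mining system builds on Apriori, a compact version of the classic level-wise algorithm may help. Each level performs a join step, a prune step and one counting pass over the data. This is a textbook Apriori sketch and not the paper's modified algorithm. The `attribute=value` encoding of graduate records in the usage block is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least `min_support`."""
    # Level 1: one pass to count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join: merge pairs of frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical graduate records encoded as attribute=value items.
    graduates = [
        {"major=CS", "cert=yes", "employed"},
        {"major=CS", "employed"},
        {"major=Art", "cert=yes"},
        {"major=CS", "cert=yes", "employed"},
    ]
    for itemset, count in apriori(graduates, 2).items():
        print(sorted(itemset), count)
```

This join compares every pair of frequent itemsets, which keeps the code short. Production implementations usually join only itemsets that share their first k-2 items in sorted order.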
Abstract: This paper proposes HA (hashing array), a new algorithm for mining frequent itemsets in large databases. HA stores the information in the database in a hash-array structure, ItemArray(), and uses this structure in place of the database in later iterations. As a result, the whole database needs to be scanned only twice, which can reduce the computational cost significantly. To overcome the performance bottleneck in mining frequent 2-itemsets, the paper further proposes DHA (direct-addressing hashing and array), a modification of HA that combines it with a direct-addressing hashing technique. The hybrid DHA algorithm not only overcomes this bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithm, and the results show that it is more efficient and reasonable.
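The abstract gives few details of how DHA applies direct addressing. One standard way to remove hashing overhead when counting 2-itemsets is to give every item an integer id. Each pair (i, j) with i < j can then be mapped straight to a slot in a flat triangular array. The sketch below shows that generic technique under this assumption. `pair_index` and `count_2_itemsets` are hypothetical names, and the code does not reproduce HA's ItemArray structure.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pair_index(i: int, j: int, n: int) -> int:
    """Direct address of pair (i, j), i < j, in a flat upper-triangular array of n items."""
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def count_2_itemsets(transactions: List[Set[str]]) -> Dict[Tuple[str, str], int]:
    items = sorted({item for t in transactions for item in t})
    ids = {item: k for k, item in enumerate(items)}
    n = len(items)
    counts = [0] * (n * (n - 1) // 2)  # one counter per possible pair, no collisions
    for t in transactions:
        # Sorting the ids guarantees a < b for every generated pair.
        for a, b in combinations(sorted(ids[x] for x in t), 2):
            counts[pair_index(a, b, n)] += 1
    # Translate non-zero slots back to item pairs.
    return {
        (items[a], items[b]): counts[pair_index(a, b, n)]
        for a, b in combinations(range(n), 2)
        if counts[pair_index(a, b, n)] > 0
    }


if __name__ == "__main__":
    print(count_2_itemsets([{"a", "b", "c"}, {"a", "c"}, {"b", "c"}]))
    # {('a', 'b'): 1, ('a', 'c'): 2, ('b', 'c'): 2}
```

The array needs n(n-1)/2 counters, so this approach pays off when the number of distinct items is moderate. That is typically the case for the frequent 1-items that survive the first pass.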
Abstract: Mining knowledge from databases is regarded as a key research issue in database systems, and researchers in many different fields have shown great interest in data mining. This paper gives a broad introduction to data mining techniques, covering their definition, purpose, characteristics, principal processes and classification. As an example, research on mining association rules is illustrated. Finally, several data mining prototypes are presented and some research trends in data mining are discussed.
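The two standard measures used in association rule mining are given below for reference. They are written for a transaction database D and a rule X ⇒ Y with X ∩ Y = ∅. The notation is ours and not taken from the paper.

```latex
\mathrm{supp}(X) = \frac{\lvert \{ T \in D : X \subseteq T \} \rvert}{\lvert D \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```

A rule is reported when supp(X ∪ Y) ≥ minsup and conf(X ⇒ Y) ≥ minconf. For example, suppose 2 of 4 transactions contain both bread and milk and 3 contain bread. Then supp({bread, milk}) = 0.5 and conf(bread ⇒ milk) = 2/3.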
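Abstract: As one of the main techniques of data mining, association rule analysis plays an important role in discovering valuable information hidden in massive transaction data. To address the inherent shortcomings of the Apriori algorithm, this paper proposes the AWP (Apriori With Prejudging) algorithm. On top of Apriori's join and prune steps, AWP adds a prejudging (pre-screening) step, which uses prior probabilities to shrink the set of candidate frequent k-itemsets. It also introduces a damping factor and a compensation factor to correct the errors introduced by prejudging, which simplifies the process of mining frequent itemsets. Experiments show that AWP effectively reduces the number of database scans and lowers the running time.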
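The abstract does not give AWP's prejudging formulas, so what follows is only a hypothetical illustration of the idea. Before counting a candidate k-itemset, its support is estimated from already-counted subsets under a conditional-independence prior. The estimate is then adjusted with a damping factor and a compensation term, and the database count is skipped when the adjusted estimate falls below the threshold. The estimator, the way `damping` and `compensation` are applied, and their default values are all assumptions, not the paper's definitions.

```python
from typing import Dict, FrozenSet


def prejudge(
    cand: FrozenSet[str],                # candidate k-itemset, k >= 2
    supp: Dict[FrozenSet[str], float],   # relative supports known from earlier levels
    min_support: float,
    damping: float = 0.9,                # hypothetical damping factor
    compensation: float = 0.05,          # hypothetical compensation margin
) -> bool:
    """Return True if `cand` should still be counted against the database."""
    items = sorted(cand)
    left = frozenset(items[:-1])                # two (k-1)-subsets sharing k-2 items
    right = frozenset(items[:-2] + items[-1:])
    shared = left & right
    shared_supp = supp[shared] if shared else 1.0
    # Prior estimate: the two differing items are assumed independent given `shared`.
    estimate = supp[left] * supp[right] / shared_supp
    return damping * estimate + compensation >= min_support


if __name__ == "__main__":
    supp = {frozenset({"a"}): 0.6, frozenset({"b"}): 0.5, frozenset({"c"}): 0.35}
    print(prejudge(frozenset({"a", "b"}), supp, 0.3))  # True: still counted
    print(prejudge(frozenset({"a", "c"}), supp, 0.3))  # False: skipped
```

The estimate can be wrong, so a candidate rejected here may in fact be frequent. The abstract states that correcting this kind of error is the role of AWP's damping and compensation factors.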