Abstract: Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modify Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check the count against a minimum support threshold. Because the search is restricted to a subspace, the derived algorithm increases the rate of correct solutions. Furthermore, our algorithm scales well and optimizes the number of qubits required by the design, which directly improves performance. The proposed design can accommodate more transactions and items and still perform well with a small number of qubits.
Abstract: In the data warehouse association rule mining process, the conventional complete association rule set is replaced by the least association rule set. The least association rule set must meet two requirements: 1) it must be the minimal and simplest association rule set; 2) its predictive power must be no weaker than that of the complete association rule set, so that the precision of the association rule analysis is guaranteed. Adopting the least association rule set allows weak rules to be pruned effectively, which greatly reduces the number of frequent itemsets and therefore improves mining efficiency. Finally, building on the classical Apriori algorithm, the upward closure property of weak rules is used to develop a corresponding efficient algorithm.
Abstract: Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is one of the most common data mining tasks. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge that is stored as quantitative values in transactions. To this end, it uses a different support value at each level and a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies, and a multiple-level taxonomy, our method finds fuzzy association rules in transaction data sets. The approach derives large itemsets in a top-down, progressively deepening manner and uses fuzzy boundaries instead of sharp interval boundaries. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that it can mine association rules at different levels according to the user's preference.
Abstract: Hotspots (active fires) indicate the spatial distribution of fires. Determining the factors that influence hotspot occurrence is essential so that fire events can be predicted from the characteristics of an area. This study identifies possible influence factors for fire events using the Apriori association rule algorithm in the study area of Rokan Hilir, Riau Province, Indonesia. The Apriori algorithm was applied to a forest fire dataset that contained data on the physical environment (land cover, river, road, and city center), socio-economic conditions (income source, population, and number of schools), weather (precipitation, wind speed, and screen temperature), and peatlands. The experiments revealed 324 multidimensional association rules indicating relationships between hotspot occurrence and other factors. The associations between hotspot occurrence and other geographical objects were discovered with a minimum support of 10% and a minimum confidence of 80%. The results show strong relations between hotspot occurrence and influence factors at a support of about 12.42%, a confidence of 1, and a lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), number of schools per 1 km² less than or equal to 0.1, and distance from each hotspot to the nearest road less than or equal to 2.5 km.
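The support, confidence, and lift figures quoted in the abstract above follow the standard definitions for a rule A → B. As a minimal sketch (the function names and the toy transactions are illustrative assumptions, not data or code from the study), they can be computed as:

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_a = support(transactions, a)
    supp_c = support(transactions, c)
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    supp_rule = support(transactions, a | c)
    confidence = supp_rule / supp_a   # estimate of P(consequent | antecedent)
    lift = confidence / supp_c        # > 1 suggests a positive association
    return supp_rule, confidence, lift


# Hypothetical toy data: each transaction describes one grid cell.
toy = [
    frozenset({"near_road", "hotspot"}),
    frozenset({"near_road", "hotspot"}),
    frozenset({"near_road"}),
    frozenset({"peatland"}),
]
supp, conf, lift = rule_metrics(toy, {"near_road"}, {"hotspot"})
```

A lift above 1, as in the reported value of 2.26, means the consequent occurs more often together with the antecedent than it would if the two were independent.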
Funding: Under the auspices of the Special Fund of the Ministry of Land and Resources of China in the Public Interest (No. 201511001)
Abstract: Association rule mining methods, an important set of data mining tools, can be used to mine spatial association rules from spatial data. However, their application is limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to eliminate redundant rules effectively. GFARM is applied in a case study that discovers association rules between building land distribution and potential driving factors in Wuhan, China, from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter out 50%–70% of redundant rules. GFARM also succeeds in discovering spatial association patterns between building land distribution and driving factors.
Abstract: The traditional generalization-based knowledge discovery method is introduced, and a new multilevel spatial association rule mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are generalized well at multiple levels, which allows strong spatial association rules to be discovered. Combining the cloud-model-based method with Apriori algorithms to mine association rules from a spatial database proves effective and flexible.
Abstract: HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray(), to store the information in the database and then uses that structure instead of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost is reduced significantly. To overcome the performance bottleneck of mining frequent 2-itemsets, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithm, and the results show that it is more efficient and reasonable.
Abstract: The Apriori algorithm is a classical method for mining association rules. Based on an analysis of this algorithm, the paper presents an improved Apriori algorithm. The proposed algorithm combines a hash table technique with a reduction of candidate itemsets to improve the efficiency of resource usage as well as the individualized service of the data library.
Funding: Supported by the National Natural Science Foundation of China (70371015) and the Science Foundation of Jiangsu University (04KJD001)
Abstract: Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules consider the same items but also consider negated items (i.e., items absent from transactions). Negative association rules are useful in market-basket analysis for identifying products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, that is, classifiers that build their classification model from association rules. However, mining such rules requires examining an exponentially large search space. Despite their usefulness, very few algorithms for mining them have been proposed to date. In this paper, an algorithm based on the FP-tree is presented to discover negative association rules.
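To make the notion of a negated item concrete, the sketch below scores a rule of the form A → ¬b by counting directly over the transactions. This is a plain counting illustration under assumed names and toy data; it is not the FP-tree-based algorithm the abstract describes.

```python
from typing import FrozenSet, Iterable, List, Tuple


def negative_rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    absent_item: str,
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> NOT absent_item."""
    a = frozenset(antecedent)
    n = len(transactions)
    # Transactions that contain the whole antecedent.
    count_a = sum(1 for t in transactions if a <= t)
    # Of those, the ones where the negated item is indeed absent.
    count_a_not_b = sum(1 for t in transactions if a <= t and absent_item not in t)
    if count_a == 0:
        raise ValueError("antecedent never occurs")
    return count_a_not_b / n, count_a_not_b / count_a


# Hypothetical market baskets.
baskets = [
    frozenset({"tea"}),
    frozenset({"tea"}),
    frozenset({"tea", "coffee"}),
    frozenset({"coffee"}),
]
supp, conf = negative_rule_metrics(baskets, {"tea"}, "coffee")
```

Because count(A and not b) equals count(A) minus count(A and b), the confidence of A → ¬b is one minus the confidence of the positive rule A → b, so a miner that already has positive counts does not need a separate pass for this particular rule form.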
Funding: Supported by the National Natural Science Foundation of China (70371015)
Abstract: Association rule mining is an important issue in data mining. The paper proposes a binary-system-based method that efficiently generates candidate frequent itemsets and their support counts using only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM yields the improved algorithm BFDM. Theoretical analysis and experiments show that BFDM is effective and efficient.
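One way to picture a binary encoding of itemsets is to give each item one bit of an integer, so that containment, candidate joining, and difference checks reduce to "and", "or", and "xor". The sketch below is an illustrative assumption about such an encoding (the helper names and the item index are hypothetical); it is not the BFDM algorithm itself.

```python
from typing import Dict, Iterable, List, Optional


def encode(items: Iterable[str], index: Dict[str, int]) -> int:
    """Pack an itemset into an integer with one bit per item."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support_count(transaction_masks: List[int], candidate: int) -> int:
    """Count transactions containing the candidate: (t AND c) == c."""
    return sum(1 for t in transaction_masks if (t & candidate) == candidate)


def join(a: int, b: int) -> Optional[int]:
    """Join two k-itemsets of equal size into a (k+1)-itemset.

    If both masks have the same number of set bits, their XOR has exactly
    two set bits precisely when they differ in one item each; the OR is
    then the joined candidate.
    """
    if bin(a ^ b).count("1") == 2:
        return a | b
    return None


# Hypothetical item index and transactions.
index = {"a": 0, "b": 1, "c": 2}
masks = [encode(t, index) for t in ({"a", "b"}, {"a", "c"}, {"a", "b", "c"})]
ab, ac = encode({"a", "b"}, index), encode({"a", "c"}, index)
abc = join(ab, ac)
```

With this layout each containment test is a single machine-word operation for small item universes, which is the kind of saving a bitwise formulation aims for.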
Abstract: Traditional libraries cannot provide personalized recommendation services for users. This paper uses Clementine to address the problem. First, a K-means clustering model analyzes the initial data and removes redundant data, which avoids repeated database scans and the generation of a large number of false rules. Second, the clustering results are used for association rule mining, which yields valuable information and enables an intelligent recommendation service.
Abstract: Data mining techniques offer great opportunities for developing ethics lines, whose main aim is to ensure improvements in and compliance with the values, conduct, and commitments that make up the code of ethics. The aim of this study is to suggest a process for exploiting the data generated and collected by an ethics line by extracting association rules with the Apriori algorithm. This makes it possible to identify anomalies and behaviour patterns that require action to review, correct, promote, or expand them, as appropriate.
Funding: The work is supported by the Natural Science Foundation of Chongqing (No. CSTC 2005BB2065)
Abstract: Data mining in transaction databases has focused on the relationships between attributes and has not taken the relationships between tuples into account. In a spatial database there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap, and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction data mining, and the relationships between tuples, called horizontal-direction data mining. This paper analyzes the storage models of spatial data; draws on data mining technologies for transaction databases; defines the spatial association rule, including the vertical-direction, horizontal-direction, and two-direction association rules; discusses how to measure the interestingness of spatial association rules; and puts forward a workflow for spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets: spatial analysis converts the spatial relations into non-spatial associations, from which the non-spatial itemsets are derived. Based on these itemsets, the Apriori algorithm or other algorithms can be used to obtain the frequent itemsets and then the spatial association rules. Spatial association rules were mined from a spatial database to validate the algorithm, and the test results show that the algorithm is efficient and can mine interesting spatial rules.
Funding: Supported in part by the National Natural Science Foundation of China (No. 60073012) and the Natural Science Foundation of Jiangsu (BK2001004)
Abstract: In this letter, a support function for updating the Frequent Pattern (FP) tree is introduced, and on that basis an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm handles not only the addition of new data to the database but also the removal of old data from it. Furthermore, it simplifies five cases into three. The proposed algorithm avoids generating large numbers of candidate items and is highly efficient.
Funding: Support from Taif University Researchers Supporting Project Number (TURSP-2020/215), Taif University, Taif, Saudi Arabia.
Abstract: Lung cancer is one of the leading cancers in both genders worldwide, and its occurrence has increased markedly since the early 19th century. In this manuscript, we discuss various data mining techniques that have been employed for cancer diagnosis. Exposure to air pollution has been related to various adverse health effects. This work analyzes various air pollutants and their associated health hazards and aims to evaluate the impact of air pollution on lung cancer. We apply data mining to lung cancer and air pollution data, and our approach comprises preprocessing, data mining, testing and evaluation, and knowledge discovery. First, we remove noise and irrelevant data, and then we merge the multiple information sources into a common source. From that source, we select the information relevant to our investigation to be retrieved from the collection. We then transform the selected data into a form suitable for mining. Patterns are extracted using a relational suggestion rule mining process. The patterns reveal information, which is categorized with the help of an Auto Associative Neural Network (AANN) classification method. The proposed method is compared with the existing method on several criteria. In conclusion, the proposed Auto Associative Neural Network and relational suggestion rule mining methods achieve high accuracy.
Funding: The National Natural Science Foundation of China (No. 49678049)
Abstract: Data mining, i.e., mining knowledge from large amounts of data, is a demanding field because huge amounts of data have been collected in various applications. The collected data far exceed people's ability to analyze them, so new and efficient methods are needed to discover knowledge in large databases. Association rule discovery is an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper, we describe and summarize recent work on association rule discovery, offer a new method for association rule mining, and point out that association rule discovery can be applied to spatial data mining. It is useful for discovering knowledge in remote sensing and geographical information systems.
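The two-phase task described above (first find the frequent itemsets, then form implication rules among them) can be sketched with a compact level-wise search in the style of the classical Apriori algorithm. This is a minimal illustration with assumed function names and toy data, not the new method the abstract refers to.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Phase 1: level-wise search for all itemsets with support >= min_support."""
    n = len(transactions)

    def supp(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = supp(s)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k))
            and supp(c) >= min_support
        }
        k += 1
    return frequent


def rules(frequent: Dict[FrozenSet[str], float], min_confidence: float) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Phase 2: split each frequent itemset into antecedent -> consequent."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                conf = s / frequent[a]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    out.append((a, itemset - a, conf))
    return out


# Hypothetical toy transactions.
data = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"a", "b", "c"})]
found = apriori(data, min_support=0.6)
strong = rules(found, min_confidence=0.9)
```

The pruning step is what keeps the candidate set manageable: an itemset is only counted if all of its smaller subsets were already found frequent.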
Abstract: As data mining is applied ever more widely in computer systems, quality assurance testing of data mining software is receiving more and more attention. However, because of the "oracle" problem, traditional testing methods are not well suited to application programs in the data mining field. In this paper, a software testing method for data mining based on metamorphic testing is proposed. An association rule algorithm is taken as the specific case, and metamorphic relations are constructed for the algorithm. Experiments show that the method achieves the testing target and can feasibly be applied to other domains.
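Metamorphic testing sidesteps the oracle problem by checking relations between the outputs of related inputs rather than checking one output against a known answer. The sketch below tests a brute-force frequent itemset miner against two relations chosen here for illustration (transaction order must not matter; duplicating every transaction must not change relative supports). The miner, the relations, and all names are assumptions, not the ones constructed in the paper.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Program under test: brute-force miner returning itemset -> support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t) / n
            if s >= min_support:
                result[frozenset(cand)] = s
    return result


def check_metamorphic_relations(
    transactions: List[Set[str]], min_support: float, seed: int = 0
) -> Tuple[bool, bool]:
    """Return (permutation relation holds, duplication relation holds)."""
    source = frequent_itemsets(transactions, min_support)

    # MR1: shuffling the transaction order must leave the output unchanged.
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    mr_permutation = frequent_itemsets(shuffled, min_support) == source

    # MR2: repeating the whole dataset doubles every count and the total,
    # so relative supports, and hence the output, must stay the same.
    mr_duplication = frequent_itemsets(transactions * 2, min_support) == source

    return mr_permutation, mr_duplication


# Hypothetical toy transactions.
sample = [{"a", "b"}, {"a"}, {"a", "b", "c"}]
outcome = check_metamorphic_relations(sample, min_support=0.6)
```

Neither relation requires knowing the correct set of frequent itemsets in advance, which is what makes the approach usable when no test oracle exists.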