Association rule learning (ARL) is a widely used technique for discovering relationships within datasets. However, it often generates an excessive number of irrelevant or ambiguous rules. Post-processing is therefore crucial, not only for removing irrelevant or redundant rules but also for uncovering hidden associations that impact other factors. Recently, several post-processing methods have been proposed, each with its own strengths and weaknesses. In this paper, we propose THAPE (Tunable Hybrid Associative Predictive Engine), which combines descriptive and predictive techniques. By leveraging both techniques, we aim to enhance the quality of the analysis of generated rules. This includes removing irrelevant or redundant rules, uncovering interesting and useful rules, exploring hidden association rules that may affect other factors, and providing backtracking ability for a given product. The proposed approach offers a tailored method that suits retailers' specific goals, enabling them to gain a better understanding of customer behavior based on factual transactions in the target market. We applied THAPE to a real dataset as a case study to demonstrate its effectiveness. Through this application, we mined a concise set of highly interesting and useful association rules. Out of the 11,265 rules generated, we identified 125 rules that are particularly relevant to the business context. These rules significantly improve the interpretability and usefulness of association rules for decision-making purposes.
BACKGROUND It is increasingly common to find patients affected by both type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), and studies have correlated the two conditions using the available biological and clinical evidence. The aim of the current study was to apply association rule mining (ARM) to discover whether there are consistent patterns of clinical features relevant to these diseases. ARM leverages clinical and laboratory data to find meaningful patterns for diabetic CAD, using data-driven algorithms to optimise decision-making in patient care. AIM To reinforce the evidence of the T2DM-CAD interplay and demonstrate the ability of ARM to provide new insights into multivariate pattern discovery. METHODS This cross-sectional study was conducted at the Department of Biochemistry in a specialized tertiary care centre in Delhi, involving a total of 300 consenting subjects categorized into three groups: CAD with diabetes, CAD without diabetes, and healthy controls, with 100 subjects in each group. The participants were enrolled from the Cardiology IPD & OPD for sample collection. The study employed the ARM technique to extract meaningful patterns and relationships from the clinical data in their original values. RESULTS The clinical dataset comprised 35 attributes from the enrolled subjects. The analysis produced rules with a maximum branching factor of 4 and a rule length of 5, requiring a 1% probability increase for enhancement. Prominent patterns emerged, highlighting strong links between health indicators and diabetes likelihood, particularly elevated HbA1C and random blood sugar levels. The ARM technique identified that individuals with a random blood sugar level > 175 and HbA1C > 6.6 are likely to be in the "CAD-with-diabetes" group, offering valuable insights into health indicators and the factors influencing disease outcomes. CONCLUSION The application of this method holds promise for healthcare practitioners, offering valuable insights for enhancing patient treatment targeting specific subtypes of CAD with diabetes. By applying artificial intelligence techniques to medical data, we have shown the potential for personalized healthcare and for the development of user-friendly applications aimed at improving cardiovascular health outcomes for this high-risk population and optimising decision-making in patient care.
Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We presented a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check them against a minimum support threshold. The proposed algorithm increases the rate of correct solutions since the search is restricted to a subspace. Furthermore, our algorithm significantly scales and optimizes the number of qubits required by the design, which directly improves performance. Our proposed design can accommodate more transactions and items and still perform well with a small number of qubits.
An association rule mining method based on semantic relativity is proposed to address the large number of candidate item sets and the high time complexity of traditional association rule mining. In the method, the semantic relativity of ontology concepts is used to describe complicated relationships within domains. Candidate item sets with low semantic relativity are filtered out to reduce the number of candidate item sets in association rule mining. In the semantic relativity computation, an ontology hierarchy relationship is regarded as a directed acyclic graph rather than a hierarchy tree. Not only direct hierarchy relationships but also indirect hierarchy relationships and other typical semantic relationships are taken into account. Experimental results show that the proposed method can effectively reduce the number of candidate item sets and improve the efficiency of association rule mining.
In this paper, we propose an efficient algorithm, called FFP-Growth (short for fast FP-Growth), to mine frequent itemsets. Similar to FP-Growth, FFP-Growth searches the FP-tree in bottom-up order, but it need not construct conditional pattern bases and sub-FP-trees, thus saving a substantial amount of time and space. The FP-tree it creates is also much smaller than that created by TD-FP-Growth, which improves efficiency. At the same time, FFP-Growth can be easily extended to reduce the search space, as in TD-FP-Growth (M) and TD-FP-Growth (C). Experimental results show that the proposed algorithm is effective and efficient.
Association rules and C4.5 rules can overcome the shortcomings of traditional land evaluation methods and improve the intelligibility and efficiency of land evaluation knowledge. To compare these two kinds of classification rules in application, two fuzzy classifiers were built by combining each with a fuzzy decision algorithm, based on the Second General Soil Survey of Guangdong Province. The experimental results demonstrated that the fuzzy classifier based on association rules obtains a higher accuracy rate, but with a more complex calculation process and more computational overhead; the fuzzy classifier based on C4.5 rules obtains a slightly lower accuracy, but with faster and simpler computation.
Association rules are useful for determining correlations between items. Applying association rules to an intrusion detection system (IDS) can improve the detection rate, but the false positive rate also increases. In this paper, weighted association rules are used to mine intrusion models, which can increase the detection rate and decrease the false positive rate to some extent. Based on this, the structure of a host-based IDS using weighted association rules is proposed.
The conventional complete association rule set was replaced by the least association rule set in the data warehouse association rule mining process. The least association rule set should comply with two requirements: 1) it should be the minimal and simplest association rule set; 2) its predictive power should in no way be weaker than that of the complete association rule set, so that the precision of the association rule set analysis can be guaranteed. By adopting the least association rule set, weak rules can be pruned effectively, which greatly reduces the number of frequent itemsets and therefore improves mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is utilized to develop a corresponding efficient algorithm.
To discover the main causes of elevator group accidents in an edge computing environment, a multi-dimensional data model of elevator accident data is established using data cube technology. A method combining the classical Apriori algorithm with this model is proposed and implemented; it mines the frequent items of the elevator accident data to explore the main reasons why elevator accidents occur. In addition, a collaborative edge model of elevator accidents is set up to achieve data sharing, making it possible to check the details of each cause and so confirm the causes of elevator accidents. Lastly, the association rules are applied to find the regularities of elevator accidents.
OBJECTIVE To identify compound combinations as candidate multi-component drugs for type 2 diabetes from natural product information. METHODS Chemical composition information of herbs in natural medicine was acquired by integrating conventional databases: the Traditional Chinese Medicine Information Database (TCM-ID) and the Traditional Chinese Medicine Integrated Database (TCMID). The therapeutic effect of each herb on type 2 diabetes was examined by analyzing annotated function information with a text-mining method. The Apriori algorithm, a classical method for extracting associations between objects in large-scale databases, was employed to infer association rules between compound combinations and therapeutic effect on the target disease. The chemical composition and therapeutic information of each herb was used as a transaction, consisting of the chemical compound combination as the antecedent item set and the therapeutic effect as the consequent item. The association rules with high support and confidence values were suggested as candidate multi-component drugs for type 2 diabetes. RESULTS In total, 40 941 association rules were inferred with a support lower bound of 0.05% and a maximum rule length of 4. With respect to support and confidence, the top-ranked compound combination was puerarin and daidzin (support = 0.15%, confidence = 100%). In addition, the top 16 compound combinations were composed of 11 individual chemical compounds: puerarin, daidzin, abscisic acid, batatisine, dopamine, cholesterol, daidzein, gamma-aminobutyric acid, stigmasterol, campesteryl ferulate, and campesterol. To validate the therapeutic effect of the proposed compound combinations, the literature evidence for each individual compound was investigated. Among the 11 individual compounds, six were reported to be effective for the treatment of diabetes mellitus. CONCLUSION By analyzing natural product information with association rule mining, 16 compound combinations are suggested as candidate multi-component drugs for type 2 diabetes. These compound combinations are recommended for further investigation in the context of drug development.
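The support and confidence figures quoted in the abstract above follow from simple counts over the transactions. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `support_confidence`, the `(compounds, has_effect)` transaction layout, and the four toy herbs are all assumptions made purely for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical transaction layout: (compounds found in one herb, herb has the effect?)
Transaction = Tuple[FrozenSet[str], bool]


def support_confidence(transactions: List[Transaction],
                       antecedent: Iterable[str]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> effect`.

    support    = fraction of all transactions containing the antecedent AND the effect
    confidence = fraction of antecedent-containing transactions that also have the effect
    """
    antecedent = frozenset(antecedent)
    n_total = len(transactions)
    n_antecedent = sum(1 for compounds, _ in transactions if antecedent <= compounds)
    n_both = sum(1 for compounds, effect in transactions
                 if antecedent <= compounds and effect)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up toy data, not taken from TCM-ID or TCMID.
herbs: List[Transaction] = [
    (frozenset({"puerarin", "daidzin", "daidzein"}), True),
    (frozenset({"puerarin", "daidzin"}), True),
    (frozenset({"puerarin", "cholesterol"}), False),
    (frozenset({"dopamine"}), False),
]

print(support_confidence(herbs, {"puerarin", "daidzin"}))  # (0.5, 1.0)
```

On this toy data the pair occurs in two of four herbs, both with the effect, so its support is 0.5 and its confidence is 1.0. A confidence of 100% combined with a support as low as the 0.15% reported in the abstract means the rule holds in every herb containing the pair, but that such herbs are rare in the database.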
Although association rule mining is an important pattern recognition and data analysis technique, extracting and finding significant rules from a large collection has always been challenging. The ability of information visualization to help users understand high-dimensional and large-scale data can play a major role in the exploration, identification, and interpretation of association rules. In this paper, we propose a method that provides multiple views of the association rules, linked together through a filtering mechanism. A visual inspection of the entire association rule set is enabled within a matrix view. Items of interest can be selected, resulting in their corresponding association rules being shown in a graph view. At any time, individual rules can be selected in either view, resulting in their information being shown in the detail view. The fundamental premise of this work is that by providing such a visual and interactive representation of the association rules, users will be able to find important rules quickly and easily, even as the number of rules that must be inspected becomes large. A user evaluation was conducted which validates this premise.
The typical model, which involves the measures support, confidence, and interest, is often adopted for mining association rules. In the model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and uses regression to design a variety of equations relating the number of rules to the parameters. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coefficients to test the fitting effects of the equations and uses a significance test to verify whether the coefficients of the parameters are significantly different from zero. The regression equation with the larger multiple correlation coefficient is chosen as the optimally fitted equation. With the selected optimal equation, we can predict the number of rules under given parameters, further optimize the choice of the three parameters, and determine their ranges of values.
Based on rough set theory, which is a powerful tool for dealing with vagueness and uncertainty, an algorithm to mine association rules in incomplete information systems was presented, and support and confidence were redefined. The algorithm can mine association rules with decision attributes directly, without processing missing values. Using the incomplete dataset Mushroom from the UCI machine learning repository, the new algorithm was compared with the classical Apriori-based association rule mining algorithm in terms of the number of rules extracted, testing accuracy, and execution time. The experimental results show that the new algorithm has the advantages of short execution time and high accuracy.
Data mining, i.e., mining knowledge from large amounts of data, is a demanding field since huge amounts of data have been collected in various applications. The collected data far exceed people's ability to analyze them. Thus, new and efficient methods are needed to discover knowledge from large databases. Association rule discovery is an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent item sets and then forming conditional implication rules among them. In this paper, we describe and summarize recent work on association rule discovery, offer a new method for association rule mining, and point out that association rule discovery can be applied in spatial data mining. It is useful for discovering knowledge from remote sensing and geographical information systems.
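The two-step task described in the abstract above (identify the frequent item sets, then form implication rules among them) can be sketched as follows. This is a minimal Apriori-style illustration under stated assumptions, not the new method that the abstract mentions: the function names `frequent_itemsets` and `rules`, the thresholds, and the toy baskets are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise (Apriori-style) search for itemsets with support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    if n == 0:
        return {}
    current = {frozenset([item]) for t in tx for item in t}  # 1-item candidates
    result = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            sup = sum(1 for t in tx if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        result.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return result


def rules(freq, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent rules."""
    out = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = sup / freq[ante]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, sup, conf))
    return out


# Made-up toy data for illustration only.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = frequent_itemsets(baskets, min_support=0.5)
for ante, cons, sup, conf in rules(freq, min_confidence=0.9):
    print(sorted(ante), "->", sorted(cons), sup, conf)  # ['butter'] -> ['bread'] 0.5 1.0
```

With these thresholds the toy data yields five frequent itemsets and a single rule, butter -> bread. The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, which is also why the confidence lookup `freq[ante]` always succeeds.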
Association rule mining methods, as a set of important data mining tools, can be used for mining spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter out 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
The traditional generalization-based knowledge discovery method is introduced. A new kind of multilevel spatial association rule mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing the discovery of strong spatial association rules. Combining the cloud-model-based method with Apriori algorithms for mining association rules from a spatial database proves effective and flexible.
Customer requirements analysis is the key step in product variety design for mass customization (MC). Quality function deployment (QFD) is a widely used management technique for understanding the voice of the customer (VOC); however, QFD depends heavily on subjective human judgment when extracting customer requirements and determining their importance weights. The QFD process and its related problems are so complicated that it is not easy to use. In this paper, based on a general data structure of the product family, the generic bill of material (GBOM), association rule analysis was introduced to construct the classification mechanism between customer requirements and product architecture. The new method can map customer requirements to the corresponding items of the product family architecture, accomplish the mapping from the customer domain to the physical domain directly, reduce the interaction needed between customer and designer, improve product design quality, and thus satisfy customer needs as fully as possible. Finally, an example of customer requirements mapping for an elevator cabin was used to illustrate the proposed method.
As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving more and more attention. However, because of the 'oracle' problem, traditional test methods are not easily applied to application programs in the data mining field. In this paper, a software testing method based on metamorphic testing is proposed for the data mining field; an association rule algorithm is taken as the specific case, and metamorphic relations are constructed for the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
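Metamorphic testing, as described in the abstract above, sidesteps the oracle problem by checking relations between the outputs of related runs instead of checking one output against a known-correct answer. The sketch below is a hedged illustration only. The two relations (reordering the transactions, and duplicating the whole database) are assumptions chosen for this example and are not necessarily the relations constructed in the paper, and the brute-force `mine` function is a hypothetical stand-in for the program under test.

```python
import random
from itertools import combinations


def mine(transactions, min_support):
    """Stand-in for the program under test: brute-force frequent itemset mining."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    if n == 0:
        return {}
    items = sorted({item for t in tx for item in t})
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            cand = frozenset(combo)
            sup = sum(1 for t in tx if cand <= t) / n
            if sup >= min_support:
                found[cand] = sup
    return found


def check_permutation_relation(transactions, min_support, seed=0):
    """Assumed relation 1: reordering the transactions must not change the output."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return mine(transactions, min_support) == mine(shuffled, min_support)


def check_duplication_relation(transactions, min_support):
    """Assumed relation 2: repeating the whole database leaves relative supports unchanged."""
    doubled = list(transactions) + list(transactions)
    return mine(transactions, min_support) == mine(doubled, min_support)


# Made-up toy data for illustration only.
data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(check_permutation_relation(data, 0.5))  # True
print(check_duplication_relation(data, 0.5))  # True
```

Neither check needs to know the correct set of frequent itemsets for `data`. A violation of either relation would indicate a fault in the miner, while passing both does not prove that the miner is correct.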
Discovering cyclic generalized association rules from transaction databases can reveal the relationships between different levels of the taxonomies and display cyclic variations over time. Information about such variations is of great use for better identifying trends in associations and for forecasting. Because cyclic rules are quite sensitive to even a little noise, this paper uses the noise ratio as the criterion for identifying cyclic itemsets to deal with the problem, and utilizes the cycle-pruning technique to reduce the computing time of the data mining process by exploiting the relationship between the cycle and generalized frequent itemsets. The paper gives an algorithm for mining cyclic generalized itemsets (CGI). Experiments show that the CGI algorithm can efficiently yield results.
This paper aims to develop an algorithm for extracting association rules, called the Context-Based Association Rule Mining algorithm (CARM), which can be regarded as an extension of the Context-Based Positive and Negative Association Rule Mining algorithm (CBPNARM). CBPNARM was developed to extract positive and negative association rules from spatiotemporal (space-time) data only, while the proposed algorithm can be applied to both spatial and non-spatial data. The proposed algorithm is applied to an energy dataset to classify a country's energy development by uncovering the interdependencies among the set of variables to obtain positive and negative associations. The proposed algorithm extracts many association rules related to sustainable energy development, which need to be pruned by some pruning technique. In this paper, the context serves as a pruning measure to extract pertinent association rules from non-spatial data. The Conditional Probability Increment Ratio (CPIR), which was not used in CBPNARM, is also added to the proposed algorithm. The inclusion of the context variable and CPIR resulted in fewer rules and improved robustness and ease of use. Also, the extraction of a common negative frequent itemset in CARM differs from that in CBPNARM. The rules created by the proposed algorithm are more meaningful, significant, relevant, and insightful. The accuracy of the proposed algorithm is compared with that of the Apriori, PNARM, and CBPNARM algorithms. The results demonstrated enhanced accuracy, relevance, and timeliness.
Funding (semantic-relativity association rule mining study): the National Natural Science Foundation of China (No. 50674086); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Science and Technology Fund of China University of Mining and Technology (No. 2007B016).
Funding: Supported by the Science and Technology Plan Project of Guangdong Province (2009B010900026, 2009CD058, 2009CD078, 2009CD079, 2009CD080); Special Funds for Support Program of Development of Modern Information Service Industry of Guangdong Province (06120840B0370124); Funded Fund Project of South China Agricultural University (2007K017)
Abstract: Association rules and C4.5 rules can overcome the shortcomings of traditional land evaluation methods and improve the intelligibility and efficiency of land evaluation knowledge. To compare these two kinds of classification rules in application, two fuzzy classifiers were established by combining them with a fuzzy decision algorithm, based on the Second General Soil Survey of Guangdong Province. The experimental results demonstrated that the fuzzy classifier based on association rules obtained a higher accuracy rate but required a more complex calculation process and more computational overhead; the fuzzy classifier based on C4.5 rules obtained slightly lower accuracy but with faster and simpler computation.
Abstract: Association rules are useful for determining correlations between items. Applying association rules to an intrusion detection system (IDS) can improve the detection rate, but the false positive rate also increases. In this paper, weighted association rules are used to mine intrusion models, which can increase the detection rate and decrease the false positive rate to some extent. Based on this, the structure of a host-based IDS using weighted association rules is proposed.
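The abstract above does not say how the weights enter the rule measures. As a minimal sketch, assuming one common convention in which the weighted support of an itemset is its ordinary support scaled by the mean weight of its items, the idea can be illustrated as follows. The event names, the weights, and the function `weighted_support` are hypothetical and are not taken from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Assumed convention: mean item weight times ordinary support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


# Hypothetical host audit records, one set of events per session.
transactions = [
    frozenset({"login_fail", "su_root"}),
    frozenset({"login_fail", "su_root", "passwd_edit"}),
    frozenset({"login_ok", "ls"}),
    frozenset({"login_ok", "ls"}),
]
# Hypothetical weights: security-relevant events count for more.
weights = {"login_fail": 0.8, "su_root": 1.0, "passwd_edit": 1.0,
           "login_ok": 0.1, "ls": 0.1}

suspicious = weighted_support({"login_fail", "su_root"}, transactions, weights)
benign = weighted_support({"login_ok", "ls"}, transactions, weights)
print(suspicious, benign)
```

Both itemsets have the same ordinary support here, but the weighting ranks the security-relevant pattern far higher, which is the kind of effect that could help suppress false positives from frequent but harmless activity.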
Abstract: In the data warehouse association rule mining process, the conventional complete association rule set is replaced by the least association rule set. The least association rule set should comply with two requirements: 1) it should be the minimal and simplest association rule set; 2) its predictive power should in no way be weaker than that of the complete association rule set, so that the precision of the association rule set analysis can be guaranteed. By adopting the least association rule set, weak rules can be pruned effectively, which greatly reduces the number of frequent itemsets and therefore improves mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is utilized to develop a corresponding efficient algorithm.
Abstract: To discover the main causes of elevator group accidents in an edge computing environment, a multi-dimensional data model of elevator accident data is established using data cube technology. A method combining the classical Apriori algorithm with this model is proposed and implemented to mine frequent items from the elevator accident data and thereby explore the main reasons for elevator accidents. In addition, a collaborative edge model of elevator accidents is set up to achieve data sharing, making it possible to check the details of each cause and so confirm the causes of elevator accidents. Finally, association rules are applied to find the patterns underlying elevator accidents.
Funding: This project was supported by the Bio-Synergy Research Project (NRF-2012M3A9C4048758) of the Ministry of Science, ICT and Future Planning through the National Research Foundation
Abstract: OBJECTIVE To identify compound combinations as candidate multi-component drugs for type 2 diabetes from natural product information. METHODS Chemical composition information of herbs in natural medicine was acquired by integrating two conventional databases: the Traditional Chinese Medicine Information Database (TCM-ID) and the Traditional Chinese Medicine Integrated Database (TCMID). The therapeutic effect of each herb on type 2 diabetes was examined by analyzing annotated function information with a text-mining method. The Apriori algorithm, a classical method for extracting associations between objects in large-scale databases, was employed to infer association rules between compound combinations and therapeutic effect on the target disease. The chemical composition and therapeutic information of each herb was used as a transaction, consisting of the chemical compound combination as the antecedent item set and the therapeutic effect as the consequent item. Association rules with high support and confidence values were suggested as candidate multi-component drugs for type 2 diabetes. RESULTS A total of 40 941 association rules were inferred with a support lower bound of 0.05% and a maximum rule length of 4. With respect to support and confidence, the top-ranked compound combination was puerarin and daidzin (support=0.15%, confidence=100%). In addition, the top 16 compound combinations were composed of 11 individual chemical compounds: puerarin, daidzin, abscisic acid, batatisine, dopamine, cholesterol, daidzein, gamma-aminobutyric acid, stigmasterol, campesteryl ferulate, and campesterol. To validate the therapeutic effect of the proposed compound combinations, literature evidence for each individual compound was investigated. Among the 11 individual compounds, six were reported to be effective for the treatment of diabetes mellitus. CONCLUSION By analyzing natural product information with association rule mining, 16 compound combinations are suggested as candidate multi-component drugs for type 2 diabetes. These compound combinations are recommended for further investigation in the context of drug development.
Abstract: Although association rule mining is an important pattern recognition and data analysis technique, extracting and finding significant rules from a large collection has always been challenging. The ability of information visualization to enable users to gain an understanding of high-dimensional and large-scale data can play a major role in the exploration, identification, and interpretation of association rules. In this paper, we propose a method that provides multiple views of the association rules, linked together through a filtering mechanism. A visual inspection of the entire association rule set is enabled within a matrix view. Items of interest can be selected, resulting in their corresponding association rules being shown in a graph view. At any time, individual rules can be selected in either view, resulting in their information being shown in the detail view. The fundamental premise of this work is that by providing such a visual and interactive representation of the association rules, users will be able to find important rules quickly and easily, even as the number of rules that must be inspected becomes large. A user evaluation was conducted, and it validates this premise.
Funding: Supported by the National Natural Science Foundation of China (No. J07240003, No. 60773084, No. 60603023); National Research Fund for the Doctoral Program of Higher Education of China (No. 20070151009)
Abstract: The typical model, which involves the measures support, confidence, and interest, is often adopted for mining association rules. In this model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and uses a regression method to design a variety of equations relating the number of rules to the parameters. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coefficients to test the fitting effects of the equations and uses a significance test to verify whether the coefficients of the parameters differ significantly from zero. The regression equation with the larger multiple correlation coefficient is chosen as the optimally fitted equation. With the selected optimal equation, we can predict the number of rules under given parameters, further optimize the choice of the three parameters, and determine their ranges of values.
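The three measures named in the abstract above can be computed directly from transaction counts. The sketch below assumes that "interest" means the ratio of the rule's confidence to the consequent's support (often called lift); the abstract does not define it, so that choice, the toy transactions, and the function name `rule_measures` are illustrative assumptions.

```python
def rule_measures(antecedent, consequent, transactions):
    """Return (support, confidence, interest) for antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)         # contain antecedent
    n_c = sum(c <= t for t in transactions)         # contain consequent
    n_ac = sum((a | c) <= t for t in transactions)  # contain both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Assumed definition: interest = confidence / P(consequent).
    interest = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, interest


# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup, conf, intr = rule_measures({"bread"}, {"milk"}, transactions)
print(sup, conf, intr)
```

Raising any of the three thresholds can only shrink the set of accepted rules, which is why the number of rules can reasonably be modeled as a function of the thresholds.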
Funding: Projects (10871031, 60474070) supported by the National Natural Science Foundation of China; Project (07A001) supported by the Scientific Research Fund of Hunan Provincial Education Department, China
Abstract: Based on rough set theory, a powerful tool for dealing with vagueness and uncertainty, an algorithm for mining association rules in incomplete information systems is presented, and support and confidence are redefined. The algorithm can mine association rules with decision attributes directly, without preprocessing missing values. Using the incomplete Mushroom dataset from the UCI machine learning repository, the new algorithm was compared with the classical Apriori-based association rule mining algorithm in terms of the number of rules extracted, testing accuracy, and execution time. The experimental results show that the new algorithm has the advantages of short execution time and high accuracy.
Funding: The National Natural Science Foundation of China (No. 49678049)
Abstract: Data mining, i.e., mining knowledge from large amounts of data, is a field in high demand, since huge amounts of data have been collected in various applications. The collected data far exceed people's ability to analyze them. Thus, new and efficient methods are needed to discover knowledge from large databases. Association rule discovery is an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent item sets and then forming conditional implication rules among them. In this paper, we describe and summarize recent work on association rule discovery, offer a new method for association rule mining, and point out that association rule discovery can be applied in spatial data mining. It is useful for discovering knowledge from remote sensing and geographical information systems.
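The two-step task described above, first finding frequent item sets and then forming implication rules among them, can be sketched with a small level-wise (Apriori-style) miner. This is a generic illustration, not the new method the abstract refers to; the function names and the toy data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search; returns {itemset: support}."""
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while level:
        survivors = {}
        for cand in level:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                survivors[cand] = sup
        frequent.update(survivors)
        k += 1
        # Join frequent (k-1)-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules


transactions = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc")]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = generate_rules(freq, min_confidence=0.7)
for ante, cons, sup, conf in rules:
    print(sorted(ante), "->", sorted(cons), round(sup, 2), round(conf, 2))
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent, which is also what makes the lookup `frequent[ante]` safe in the second step.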
Funding: Under the auspices of the Special Fund of the Ministry of Land and Resources of China in Public Interest (No. 201511001)
Abstract: Association rule mining methods, as a set of important data mining tools, can be used to mine spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter out 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
Abstract: The traditional generalization-based knowledge discovery method is introduced. A new multilevel spatial association rule mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing the discovery of strong spatial association rules. Combining the cloud-model-based method with the Apriori algorithm for mining association rules from a spatial database proves to be effective and flexible.
Funding: The National Natural Science Foundation of China (No. 70471022); the NSFC / Hong Kong Research Grant Council (No. 70418013)
Abstract: Customer requirements analysis is the key step in product variety design for mass customization (MC). Quality function deployment (QFD) is a widely used management technique for understanding the voice of the customer (VOC); however, QFD depends heavily on subjective human judgment when extracting customer requirements and determining their importance weights. The QFD process and its related problems are so complicated that it is not easy to use. In this paper, based on a general data structure for product families, the generic bill of material (GBOM), association rule analysis is introduced to construct the classification mechanism between customer requirements and product architecture. The new method can map customer requirements to the respective items of the product family architecture, accomplish the mapping from the customer domain to the physical domain directly, reduce the back-and-forth between customer and designer, improve product design quality, and thus satisfy customer needs to the greatest extent. Finally, an example of customer requirements mapping for an elevator cabin is used to illustrate the proposed method.
Abstract: As data mining is more and more widely applied in computer systems, quality assurance testing of data mining software is receiving more and more attention. However, because of the 'oracle' problem, traditional testing methods are not well suited to application programs in the data mining field. In this paper, a software testing method for the data mining field is proposed based on metamorphic testing; an association rule algorithm is taken as a specific case, and metamorphic relations are constructed on the algorithm. Experiments show that the method can achieve the testing target and can feasibly be applied to other domains.
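Metamorphic testing sidesteps the oracle problem by checking relations between outputs of related runs instead of checking any single output. The sketch below shows two relations that should hold for any correct frequent itemset miner: reordering the transactions, or duplicating the whole dataset, must not change the result. These relations and the brute-force miner `mine` used as the system under test are illustrative assumptions; the abstract does not list the relations the paper constructs.

```python
import random
from itertools import combinations


def mine(transactions, min_support):
    """Stand-in system under test: brute-force frequent itemset miner."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup = sum(itemset <= t for t in transactions) / n
            if sup >= min_support:
                result[itemset] = sup
    return result


def holds_under_permutation(transactions, min_support, seed=0):
    """MR1: transaction order must not affect the mined itemsets."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return mine(shuffled, min_support) == mine(transactions, min_support)


def holds_under_duplication(transactions, min_support):
    """MR2: repeating every transaction leaves relative supports unchanged."""
    return mine(transactions * 2, min_support) == mine(transactions, min_support)


transactions = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc")]
print(holds_under_permutation(transactions, 0.5))
print(holds_under_duplication(transactions, 0.5))
```

A violation of either relation signals a defect without anyone having to know the correct set of frequent itemsets in advance.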
Abstract: Discovering cyclic generalized association rules from transaction databases can reveal the relationships among different levels of the taxonomies and display cyclic variations over time. Information about such variations is of great use for better identifying trends in associations and for forecasting. Because cyclic rules are quite sensitive to even a little noise, this paper uses the noise ratio as the criterion for identifying cyclic itemsets to deal with the problem, and utilizes the cycle-pruning technique to reduce the computing time of the data mining process by exploiting the relationship between cycles and generalized frequent itemsets. The paper gives an algorithm for mining cyclic generalized itemsets (CGI). Experiments show that the CGI algorithm can efficiently yield results.
Abstract: This paper aims to develop an algorithm for extracting association rules, called the Context-Based Association Rule Mining algorithm (CARM), which can be regarded as an extension of the Context-Based Positive and Negative Association Rule Mining algorithm (CBPNARM). CBPNARM was developed to extract positive and negative association rules from spatiotemporal (space-time) data only, while the proposed algorithm can be applied to both spatial and non-spatial data. The proposed algorithm is applied to an energy dataset to classify a country's energy development by uncovering the interesting interdependencies among the set of variables to obtain positive and negative associations. The proposed algorithm extracts many association rules related to sustainable energy development, which need to be pruned by some pruning technique. The context in this paper serves as a pruning measure to extract pertinent association rules from non-spatial data. The Conditional Probability Increment Ratio (CPIR), which was not used in CBPNARM, is also added to the proposed algorithm. The inclusion of the context variable and CPIR resulted in fewer rules and improved robustness and ease of use. Also, the extraction of a common negative frequent itemset in CARM differs from that in CBPNARM. The rules created by the proposed algorithm are more meaningful, significant, relevant, and insightful. The accuracy of the proposed algorithm is compared with that of the Apriori, PNARM, and CBPNARM algorithms. The results demonstrate enhanced accuracy, relevance, and timeliness.
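The abstract does not give a formula for CPIR. A minimal sketch, assuming the definition commonly used in the positive and negative association rule literature, where the increment of P(Y|X) over P(Y) is normalized by the largest increment possible in that direction, might look like this; the function name `cpir` and the example probabilities are hypothetical.

```python
def cpir(p_y_given_x, p_y):
    """Conditional Probability Increment Ratio of Y given X (assumed form).

    Positive values suggest X raises the chance of Y, negative values
    suggest X lowers it, and zero means the two look independent.
    """
    if not 0.0 < p_y < 1.0:
        raise ValueError("p_y must lie strictly between 0 and 1")
    diff = p_y_given_x - p_y
    if diff >= 0:
        return diff / (1.0 - p_y)  # share of the possible increase
    return diff / p_y              # share of the possible decrease


print(cpir(0.9, 0.5))  # strong positive dependence
print(cpir(0.2, 0.5))  # negative dependence
```

Under this definition the value always lies in [-1, 1], so a single threshold on its absolute value could serve as a pruning criterion for both positive and negative rules.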