Association rule learning (ARL) is a widely used technique for discovering relationships within datasets. However, it often generates an excessive number of irrelevant or ambiguous rules. Therefore, post-processing is crucial not only for removing irrelevant or redundant rules but also for uncovering hidden associations that impact other factors. Recently, several post-processing methods have been proposed, each with its own strengths and weaknesses. In this paper, we propose THAPE (Tunable Hybrid Associative Predictive Engine), which combines descriptive and predictive techniques. By leveraging both techniques, our aim is to enhance the quality of analyzing generated rules. This includes removing irrelevant or redundant rules, uncovering interesting and useful rules, exploring hidden association rules that may affect other factors, and providing backtracking ability for a given product. The proposed approach offers a tailored method that suits specific goals for retailers, enabling them to gain a better understanding of customer behavior based on factual transactions in the target market. We applied THAPE to a real dataset as a case study to demonstrate its effectiveness. Through this application, we mined a concise set of highly interesting and useful association rules. Out of the 11,265 rules generated, we identified 125 rules that are particularly relevant to the business context. These identified rules significantly improve the interpretability and usefulness of association rules for decision-making purposes.
BACKGROUND: It is increasingly common to find patients affected by a combination of type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), and studies are able to correlate their relationships with available biological and clinical evidence. The aim of the current study was to apply association rule mining (ARM) to discover whether there are consistent patterns of clinical features relevant to these diseases. ARM leverages clinical and laboratory data to extract meaningful patterns for diabetic CAD, harnessing data-driven algorithms to optimise decision-making in patient care. AIM: To reinforce the evidence of the T2DM-CAD interplay and demonstrate the ability of ARM to provide new insights into multivariate pattern discovery. METHODS: This cross-sectional study was conducted at the Department of Biochemistry in a specialized tertiary care centre in Delhi, involving a total of 300 consenting subjects categorized into three groups: CAD with diabetes, CAD without diabetes, and healthy controls, with 100 subjects in each group. The participants were enrolled from the Cardiology IPD & OPD for sample collection. The study employed the ARM technique to extract meaningful patterns and relationships from the clinical data with its original values. RESULTS: The clinical dataset comprised 35 attributes from enrolled subjects. The analysis produced rules with a maximum branching factor of 4 and a rule length of 5, necessitating a 1% probability increase for enhancement. Prominent patterns emerged, highlighting strong links between health indicators and diabetes likelihood, particularly elevated HbA1C and random blood sugar levels. The ARM technique identified that individuals with a random blood sugar level > 175 and HbA1C > 6.6 are likely to be in the “CAD-with-diabetes” group, offering valuable insights into health indicators and the factors influencing disease outcomes. CONCLUSION: The application of this method holds promise for healthcare practitioners, offering valuable insights for enhancing patient treatment that targets specific subtypes of CAD with diabetes. By applying artificial intelligence techniques to medical data, we have shown the potential for personalized healthcare and for the development of user-friendly applications aimed at improving cardiovascular health outcomes for this high-risk population and optimising decision-making in patient care.
An association rules mining method based on semantic relativity is proposed to address the large number of candidate item sets and the high time complexity of traditional association rules mining. In the method, the semantic relativity of ontology concepts is used to describe complicated relationships within domains. Candidate item sets with low semantic relativity are filtered out to reduce the number of candidate item sets in association rules mining. In the semantic relativity computation, an ontology hierarchy relationship is regarded as a directed acyclic graph rather than a hierarchy tree. Not only direct hierarchy relationships, but also non-direct hierarchy relationships and other typical semantic relationships are taken into account. Experimental results show that the proposed method can effectively reduce the number of candidate item sets and improve the efficiency of association rules mining.
The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To deal with this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules. In comparison with classification rules, the fuzzy classification rules contain not only words, but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction of an FCR-tree and its structure are different from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in the text, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
Association rules and C4.5 rules can overcome the shortcomings of traditional land evaluation methods and improve the intelligibility and efficiency of land evaluation knowledge. To compare these two kinds of classification rules in application, two fuzzy classifiers were established by combining each with a fuzzy decision algorithm, based on the Second General Soil Survey of Guangdong Province. The experimental results demonstrated that the fuzzy classifier based on association rules obtains a higher accuracy rate, but with a more complex calculation process and more computational overhead; the fuzzy classifier based on C4.5 rules obtains a slightly lower accuracy, but with faster and simpler computation.
Association rules are useful for determining correlations between items. Applying association rules to an intrusion detection system (IDS) can improve the detection rate, but the false positive rate also increases. In this paper, weighted association rules are used to mine intrusion models, which can increase the detection rate and decrease the false positive rate to some extent. Based on this, the structure of a host-based IDS using weighted association rules is proposed.
The advent of the big data era has provided many types of transportation datasets, such as metro smart card data, for studying residents’ mobility and understanding how their mobility has been shaped by and is shaping the urban space. In this paper, we use metro smart card data from two Chinese metropolises, Shanghai and Shenzhen. Five metro mobility indicators are introduced, and association rules are established to explore the mobility patterns. The proportion of people entering and exiting the station is used to measure the jobs-housing balance. It is found that the average travel distance and duration of Shanghai passengers are higher than those of Shenzhen, and the proportion of metro commuters in Shanghai is higher than that of Shenzhen. The jobs-housing spatial relationship in Shenzhen based on metro travel is more balanced than that in Shanghai. The fundamental reason for the differences between the two cities is the difference in urban morphology. Compared with the monocentric structure of Shanghai, the polycentric structure of Shenzhen results in more scattered travel hotspots and more diverse travel routes, which helps Shenzhen to have a better jobs-housing balance. This paper fills a gap in comparative research among Chinese cities based on transportation big data analysis. The results provide support for planning metro routes, adjusting urban structure and land use to form a more reasonable metro network, and balancing the jobs-housing spatial relationship.
Recent advancements in science and technology, coupled with the proliferation of data, have urged laboratory medicine to integrate with the era of artificial intelligence (AI) and machine learning (ML). In current evidence-based medicine, laboratory tests that analyse disease patterns through association rule mining (ARM) have emerged as a modern tool for risk assessment and disease stratification, with the potential to reduce cardiovascular disease (CVD) mortality. CVDs are the well-recognised leading global cause of mortality, with higher fatality rates in the Indian population due to associated factors such as hypertension, diabetes, and lifestyle choices. AI-driven algorithms have offered deep insights in this field while addressing challenges such as healthcare systems grappling with physician shortages. Personalized medicine, driven by big data, necessitates the integration of ML techniques and high-quality electronic health records to produce meaningful outcomes. These technological advancements enhance computational analyses for both research and clinical practice. ARM plays a pivotal role by uncovering meaningful relationships within databases, aiding in patient survival prediction and risk factor identification. The potential of AI in laboratory medicine is vast, and it must be integrated cautiously, with attention to potential ethical, legal, and privacy concerns. Thus, an AI ethics framework is essential to guide its responsible use. Aligning AI algorithms with existing lab practices, promoting education among healthcare professionals, and fostering careful integration into clinical settings are imperative for harnessing the benefits of this transformative technology.
Based on rough set theory, which is a powerful tool for dealing with vagueness and uncertainty, an algorithm for mining association rules in incomplete information systems is presented, and support and confidence are redefined. The algorithm can mine association rules with decision attributes directly, without processing missing values. Using the incomplete Mushroom dataset from the UCI machine learning repository, the new algorithm was compared with the classical Apriori-based association rules mining algorithm in terms of the number of rules extracted, testing accuracy, and execution time. The experimental results show that the new algorithm has the advantages of short execution time and high accuracy.
The traditional generalization-based knowledge discovery method is introduced. A new multilevel spatial association rules mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness in the use of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud-model-based method with Apriori algorithms for mining association rules from a spatial database proves to be effective and flexible.
Association rule mining methods, as a set of important data mining tools, can be used for mining spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter out 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
Objective: To analyze the usage and compatibility of traditional Chinese medicine (TCM) patent compounds for functional dyspepsia using data mining software and methods such as frequent itemsets, association rules, hierarchical clustering, and complex networks. Method: The Chinese patent database was searched for compounds for the treatment of functional dyspepsia; traditional Chinese medicine extracts, single drugs, combined use of Chinese and Western medicines, etc. were excluded; the patented TCM compounds were screened; an Excel data table was established; and data mining software was applied to the data for frequency statistics, association rules, cluster analysis, and complex network analysis. Result: A total of 238 prescriptions for functional dyspepsia were screened. The four qi of the drugs were mainly warm and calm, the five flavors were mainly sweet and spicy, and the spleen and stomach were the main meridians. The top 10 Chinese medicines by frequency are Shanzha, Chenpi, Gancao, Maiya, Jineijin, Fuling, Baizhu, Shenqu, Houpo, and Banxia. Frequent itemsets show that the drugs are mainly compatible with qi and spleen, and qi and digestion. The association rules analysis shows that the common drug pairs used in the treatment of functional dyspepsia include Chenpi-Shanzha, Maiya-Shanzha, Jineijin-Shanzha, etc. Cluster analysis found 4 types of drugs for functional dyspepsia: drugs for regulating qi-flowing for harmonizing the stomach, drugs for soothing the liver and promoting qi, drugs for eliminating food and resolving accumulation, and drugs for benefiting qi and strengthening the spleen. For the 22 Chinese medicines in the core drug network, the core compatibility is mainly to eliminate stagnation and strengthen the spleen. Conclusion: Data mining research provides a reference for the clinical treatment of functional dyspepsia and the development of TCM formulas. Clinical treatment of functional dyspepsia should follow the basic principles of strengthening vital energy and eliminating pathogenic factors; benefiting qi, strengthening the spleen, and eliminating food is the basic treatment method, taking into account the methods of regulating qi-flowing for harmonizing the stomach, soothing the liver and relieving depression, and relieving dampness, and combining the specific conditions of patients with syndrome differentiation and treatment.
The typical model, which involves the measures support, confidence, and interest, is often adopted for mining association rules. In this model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, the meaningful rules cannot be extracted effectively. This paper analyzes the meanings of the parameters and uses regression to design a variety of equations relating the number of rules to the parameters. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coefficients to test the fitting effects of the equations and uses significance tests to verify whether the coefficients of the parameters differ significantly from zero. The regression equation with the larger multiple correlation coefficient is chosen as the optimally fitted equation. With the selected equation, we can predict the number of rules under given parameters, further optimize the choice of the three parameters, and determine their ranges of values.
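The three measures named in this abstract can be illustrated with a minimal sketch. It assumes that "interest" means lift, i.e. supp(A ∪ B) / (supp(A) · supp(B)), which is one common definition but is not confirmed by the abstract; the function names and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, interest) for the rule antecedent -> consequent.

    Assumption: "interest" is computed as lift; the paper may define it differently.
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    s_ac = support(transactions, a | c)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    confidence = s_ac / s_a if s_a else 0.0
    interest = s_ac / (s_a * s_c) if s_a and s_c else 0.0
    return s_ac, confidence, interest


# Hypothetical toy data, not taken from the paper.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
supp, conf, interest = rule_measures(transactions, {"bread"}, {"milk"})
```

For this toy data the rule bread → milk has support 0.5, confidence 2/3, and lift 8/9; a lift below 1 suggests the two items co-occur slightly less often than independence would predict.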
At present, the associated flow rule of traditional plastic theory is adopted in the slip line field theory and the upper bound method for geotechnical materials, so the stress characteristic line coincides with the velocity line. It has been proved, however, that geotechnical materials do not obey the associated flow rule, so the stress characteristic line cannot coincide with the velocity line. Generalized plastic mechanics has theoretically proved that the plastic potential surface intersects the Mohr-Coulomb yield surface at an angle, so the velocity line must be studied using the non-associated flow rule. According to limit analysis theory, the theory of the slip line field is put forward in this paper, and the ultimate bearing capacity of a strip footing is then obtained based on the associated flow rule and the non-associated flow rule individually. These two results are identical, since the ultimate bearing capacity is independent of the flow rule. In contrast, the velocity fields of the associated and non-associated flow rules are different, which shows that the velocity field based on the associated flow rule is incorrect.
As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the 'oracle' problem, traditional test methods are not well suited to application programs in the field of data mining. In this paper, a software testing method based on metamorphic testing is proposed for the field of data mining; an association rules algorithm is taken as the specific case, and metamorphic relations are constructed on the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
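As an illustration of how metamorphic testing sidesteps the oracle problem, the sketch below checks two metamorphic relations on a brute-force frequent-itemset miner. The relations (permuting the transactions, and duplicating the whole dataset, should both leave the frequent itemsets and their relative supports unchanged) are assumptions chosen for illustration and are not necessarily those constructed in the paper; all function names are hypothetical.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map each frequent itemset to its relative support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            count = sum(set(candidate) <= t for t in transactions)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
    return result


def mr_permutation(transactions, min_support, seed=0):
    """Relation 1: shuffling the transaction order must not change the output."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return frequent_itemsets(shuffled, min_support) == frequent_itemsets(
        transactions, min_support
    )


def mr_duplication(transactions, min_support):
    """Relation 2: doubling every transaction keeps relative supports the same."""
    doubled = list(transactions) + list(transactions)
    return frequent_itemsets(doubled, min_support) == frequent_itemsets(
        transactions, min_support
    )


# Hypothetical toy data; no expected output is needed, only the relations.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
checks = [mr_permutation(data, 0.4), mr_duplication(data, 0.4)]
```

Neither check requires knowing the correct set of frequent itemsets in advance; a violation of either relation would reveal a defect in the miner under test.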
Extracting objects from legacy systems is a basic step in a system's object orientation to improve the maintainability and understandability of the system. A new object extraction model using association rules and dependence analysis is proposed. In this model, data are classified by association rules and the corresponding operations are partitioned by dependence analysis.
Association rules learning is a machine learning method used to find underlying associations in large datasets. Whether intentionally or unintentionally present, noise in training instances causes overfitting while building the classifier and negatively impacts classification accuracy. This paper applies instance reduction techniques to the datasets before mining the association rules and building the classifier. Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning. This paper utilizes them to remove noise from the dataset before training the association rules classifier. Extensive experiments were conducted to assess the accuracy of association rules with different instance reduction techniques, namely Decremental Reduction Optimization Procedure (DROP) 3, DROP5, ALL K-Nearest Neighbors (ALLKNN), Edited Nearest Neighbor (ENN), and Repeated Edited Nearest Neighbor (RENN), at different noise ratios. Experiments show that instance reduction techniques substantially improved the average classification accuracy at three different noise levels: 0%, 5%, and 10%. The RENN algorithm achieved the highest levels of accuracy, with a significant improvement on seven of the eight datasets used from the University of California Irvine (UCI) machine learning repository. The improvements were more apparent in the 5% and 10% noise cases. When RENN was applied, the average classification accuracy for the eight datasets in the zero-noise test increased from 70.47% to 76.65% compared to the original test. The average accuracy improved from 66.08% to 77.47% in the 5%-noise case and from 59.89% to 77.59% in the 10%-noise case. Higher confidence was also reported in building the association rules when RENN was used. The above results indicate that RENN is a good solution for removing noise and avoiding overfitting during the construction of the association rules classifier, especially in noisy domains.
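The ENN and RENN filters named in this abstract can be sketched as follows. This is a minimal illustration using the textbook definitions (ENN drops every instance whose label disagrees with the majority of its k nearest neighbours; RENN repeats ENN until nothing more is removed), with Euclidean distance and k = 3 assumed; the paper's actual settings are not stated in the abstract, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def edited_nearest_neighbor(X, y, k=3):
    """Return indices of instances whose label matches their k-NN majority label."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances to every other instance, nearest first.
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        if not neighbors:  # a lone instance has no neighbours to disagree with
            keep.append(i)
            continue
        votes = Counter(y[j] for _, j in neighbors)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return keep


def repeated_enn(X, y, k=3):
    """Apply ENN repeatedly until a pass removes nothing; return original indices."""
    idx = list(range(len(X)))
    while True:
        kept = edited_nearest_neighbor([X[i] for i in idx], [y[i] for i in idx], k)
        if len(kept) == len(idx):
            return idx
        idx = [idx[j] for j in kept]


# Hypothetical toy data: two clean clusters plus one mislabelled point (index 8).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
clean_idx = repeated_enn(X, y)
```

On this toy data the mislabelled point at (0.5, 0.5) is the only instance removed; the filtered set would then be passed to the rule miner.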
At present, most association rules algorithms are based on Boolean attributes and single-level association rules mining. Real-world data, however, come in various types, and multi-level and quantitative attributes are receiving more and more attention. The most important step is mining frequent sets. In this paper, we propose an algorithm called fuzzy multiple-level association (FMA) rules to mine frequent sets. It is based on an improved Eclat algorithm, unlike the algorithms proposed by many researchers, which use the Apriori algorithm. We analyze the frequent sets of quantitative data by using fuzzy theory, dividing the concept hierarchy, and softening the boundaries of attribute values and frequency. We use the vertical data format and the improved Eclat algorithm to describe the proposed method, and we apply this algorithm to analyze Beijing logistics route data. Experiments show that the algorithm performs well, with better effectiveness and high efficiency.
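The vertical data format mentioned in this abstract can be illustrated with a minimal sketch of plain Eclat, in which each item maps to the set of transaction ids that contain it and larger itemsets are counted by intersecting those sets. This is the basic crisp algorithm only; the fuzzy, multiple-level improvements described in the paper are not reproduced, and the function name and toy data are hypothetical.

```python
def eclat(transactions, min_count):
    """Mine frequent itemsets by depth-first tidset intersection (plain Eclat)."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue  # infrequent itemsets cannot have frequent supersets
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of the remaining items to go one level deeper.
            suffix = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend((), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent


# Hypothetical toy data, not the logistics data used in the paper.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = eclat(transactions, min_count=2)
```

Because support counting reduces to set intersection, the database is scanned only once to build the vertical layout, which is the usual motivation for choosing Eclat over Apriori.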
Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules consider the same items, but in addition consider negated items (i.e., items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, which build their classification model based on association rules. However, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on the FP-tree is presented to discover negative association rules.
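The measures of a negative rule A → ¬B can be derived from positive supports alone, since supp(A ∪ ¬B) = supp(A) − supp(A ∪ B) and conf(A → ¬B) = 1 − conf(A → B). The sketch below shows only this arithmetic on hypothetical toy data; it is not the FP-tree-based algorithm the abstract refers to, and the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def negative_rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence) of the negative rule antecedent -> NOT consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_ac = support(transactions, a | c)
    s_neg = s_a - s_ac  # transactions containing A but not all of B
    confidence = s_neg / s_a if s_a else 0.0
    return s_neg, confidence


# Hypothetical toy data: coffee and tea never appear together.
transactions = [
    frozenset({"coffee", "sugar"}),
    frozenset({"tea", "sugar"}),
    frozenset({"coffee"}),
    frozenset({"tea", "milk"}),
    frozenset({"coffee", "milk"}),
]
neg_supp, neg_conf = negative_rule_measures(transactions, {"coffee"}, {"tea"})
```

Here coffee → ¬tea has support 0.6 and confidence 1.0, the kind of pattern that would flag two products as conflicting in a market-basket analysis.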
We discuss the basic intrusion detection techniques and focus on how to apply association rules to intrusion detection. Beginning with an analysis of some close relations between users’ behaviors, we discuss the mining algorithm for association rules and apply it to anomaly detection in an IDS. Moreover, according to the characteristics of intrusion detection, we optimize the mining algorithm for association rules and use fuzzy logic to improve the system performance.
Funding (semantic-relativity association rules mining study): The National Natural Science Foundation of China (No. 50674086); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Science and Technology Fund of China University of Mining and Technology (No. 2007B016).
Funding (FCR-tree text categorization study): The National Natural Science Foundation of China (No. 60473045); the Technology Research Project of Hebei Province (No. 05213573); the Research Plan of the Education Office of Hebei Province (No. 2004406).
Abstract: The conventional fuzzy class-association method repeatedly scans the classifier to classify new texts, which is inefficient. To deal with this problem, a new approach based on the FCR-tree (fuzzy classification rules tree) for text categorization is proposed. The compactness of the FCR-tree saves significant space in storing a large set of rules when there are many repeated words in the rules. In comparison with classification rules, the fuzzy classification rules contain not only words, but also the fuzzy sets corresponding to the frequencies of words appearing in texts. Therefore, the construction of an FCR-tree and its structure are different from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, it is not necessary to search the paths of the sub-trees led by words that do not appear in this text, thus reducing the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
Funding: Supported by Science and Technology Plan Project of Guangdong Province (2009B010900026, 2009CD058, 2009CD078, 2009CD079, 2009CD080); Special Funds for Support Program of Development of Modern Information Service Industry of Guangdong Province (06120840B0370124); Funded Fund Project of South China Agricultural University (2007K017)
Abstract: Association rules and C4.5 rules can overcome the shortcomings of traditional land evaluation methods and improve the intelligibility and efficiency of land evaluation knowledge. To compare these two kinds of classification rules in application, two fuzzy classifiers were established by combining them with a fuzzy decision algorithm, based on the Second General Soil Survey of Guangdong Province. The experimental results demonstrated that the fuzzy classifier based on association rules obtains a higher accuracy rate, but with a more complex calculation process and more computational overhead; the fuzzy classifier based on C4.5 rules obtains a slightly lower accuracy, but with faster and simpler computation.
Abstract: Association rules are useful for determining correlations between items. Applying association rules to an intrusion detection system (IDS) can improve the detection rate, but the false positive rate also increases. Weighted association rules are used in this paper to mine intrusion models, which can increase the detection rate and decrease the false positive rate to some extent. Based on this, the structure of a host-based IDS using weighted association rules is proposed.
Funding: National Key R&D Program of China (No. 2019YFB2103102); Hong Kong Polytechnic University (No. CD06, P0042540).
Abstract: The advent of the big data era has provided many types of transportation datasets, such as metro smart card data, for studying residents' mobility and understanding how their mobility has been shaped by and is shaping the urban space. In this paper, we use metro smart card data from two Chinese metropolises, Shanghai and Shenzhen. Five metro mobility indicators are introduced, and association rules are established to explore the mobility patterns. The proportion of people entering and exiting the station is used to measure the jobs-housing balance. It is found that the average travel distance and duration of Shanghai passengers are higher than those of Shenzhen passengers, and the proportion of metro commuters in Shanghai is higher than that in Shenzhen. The jobs-housing spatial relationship in Shenzhen based on metro travel is more balanced than that in Shanghai. The fundamental reason for the differences between the two cities is the difference in urban morphology. Compared with the monocentric structure of Shanghai, the polycentric structure of Shenzhen results in more scattered travel hotspots and more diverse travel routes, which helps Shenzhen to have a better jobs-housing balance. This paper fills a gap in comparative research among Chinese cities based on transportation big data analysis. The results provide support for planning metro routes, adjusting urban structure and land use to form a more reasonable metro network, and balancing the jobs-housing spatial relationship.
Abstract: Recent advancements in science and technology, coupled with the proliferation of data, have urged laboratory medicine to integrate with the era of artificial intelligence (AI) and machine learning (ML). In current evidence-based medicine practice, analysing disease patterns in laboratory tests through association rule mining (ARM) has emerged as a modern tool for risk assessment and disease stratification, with the potential to reduce cardiovascular disease (CVD) mortality. CVDs are the well-recognised leading global cause of mortality, with higher fatality rates in the Indian population due to associated factors like hypertension, diabetes, and lifestyle choices. AI-driven algorithms have offered deep insights in this field while addressing various challenges, such as healthcare systems grappling with physician shortages. Personalized medicine, driven by big data, necessitates the integration of ML techniques and high-quality electronic health records to produce meaningful outcomes. These technological advancements enhance computational analyses for both research and clinical practice. ARM plays a pivotal role by uncovering meaningful relationships within databases, aiding in patient survival prediction and risk factor identification. The potential of AI in laboratory medicine is vast, but it must be cautiously integrated while considering potential ethical, legal, and privacy concerns. Thus, an AI ethics framework is essential to guide its responsible use. Aligning AI algorithms with existing lab practices, promoting education among healthcare professionals, and fostering careful integration into clinical settings are imperative for harnessing the benefits of this transformative technology.
Funding: Projects (10871031, 60474070) supported by the National Natural Science Foundation of China; Project (07A001) supported by the Scientific Research Fund of Hunan Provincial Education Department, China
Abstract: Based on rough set theory, which is a powerful tool for dealing with vagueness and uncertainty, an algorithm to mine association rules in incomplete information systems was presented, and support and confidence were redefined. The algorithm can mine association rules with decision attributes directly, without processing missing values. Using the incomplete Mushroom dataset from the UCI machine learning repository, the new algorithm was compared with the classical Apriori-based association rules mining algorithm in terms of the number of rules extracted, testing accuracy, and execution time. The experimental results show that the new algorithm has the advantages of short execution time and high accuracy.
Abstract: The traditional generalization-based knowledge discovery method is introduced. A new multilevel spatial association rules mining method based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud-model-based method with Apriori algorithms for mining association rules from a spatial database proves effective and flexible.
Funding: Under the auspices of Special Fund of Ministry of Land and Resources of China in Public Interest (No. 201511001)
Abstract: Association rule mining methods, as a set of important data mining tools, can be used for mining spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids with different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
Funding: Capital project for application and promotion of clinical researches (No. Z171100001017123); Capital specialized scientific research project of health development for young excellent talents (No. 2018-4-4078).
Abstract: Objective: To analyze the usage and compatibility of patented traditional Chinese medicine (TCM) compounds for functional dyspepsia using data mining software and methods including frequent itemsets, association rules, hierarchical clustering, and complex networks. Method: The Chinese patent database was searched for compounds for the treatment of functional dyspepsia; traditional Chinese medicine extracts, single drugs, combined use of Chinese and Western medicines, etc. were excluded; the patented TCM compounds were screened; an Excel data table was established; and data mining software was applied to subject the data to frequency statistics, association rules, cluster analysis, and complex network analysis. Result: A total of 238 prescriptions for functional dyspepsia were screened. The four qi of the drugs were mainly warm and calm, the five flavors were mainly sweet and spicy, and the spleen and stomach were the main meridians. The top 10 Chinese medicines by frequency are Shanzha, Chenpi, Gancao, Maiya, Jineijin, Fuling, Baizhu, Shenqu, Houpo, and Banxia. Frequent itemsets show that the drugs are mainly compatible with qi and spleen, and qi and digestion. Association rule analysis shows that the common drug pairs used in the treatment of functional dyspepsia include Chenpi-Shanzha, Maiya-Shanzha, Jineijin-Shanzha, etc. Cluster analysis found that there are 4 types of drugs for functional dyspepsia, mainly including drugs for regulating qi-flowing for harmonizing stomach, drugs for soothing liver and promoting qi, drugs for eliminating food and resolving accumulation, and drugs for benefiting qi and strengthening spleen. In the core drug network of 22 Chinese medicines, the core compatibility is mainly to eliminate stagnation and strengthen the spleen. Conclusion: Data mining research provides a reference for the clinical treatment of functional dyspepsia and the development of TCM formulas. Clinical treatment of functional dyspepsia should grasp the basic principles of strengthening vital energy and eliminating pathogenic factors; benefiting qi, strengthening the spleen, and eliminating food is the basic treatment method, while also taking into account the methods of regulating qi-flowing for harmonizing stomach, soothing the liver and relieving depression, and relieving dampness, combined with the specific conditions of patients through syndrome differentiation and treatment.
Funding: Supported by the National Natural Science Foundation of China (No. J07240003, No. 60773084, No. 60603023); National Research Fund for the Doctoral Program of Higher Education of China (No. 20070151009)
Abstract: The typical model, which involves the measures support, confidence, and interest, is often adopted for mining association rules. In the model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and designs a variety of equations relating the number of rules to the parameters by using a regression method. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coefficients to test the fitting effects of the equations and uses a significance test to verify whether the coefficients of the parameters differ significantly from zero. The regression equation that has a larger multiple correlation coefficient is chosen as the optimally fitted equation. With the selected optimal equation, we can predict the number of rules under the given parameters, further optimize the choice of the three parameters, and determine their ranges of values.
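As a minimal illustration of the three measures named in the abstract above, the Python sketch below computes support, confidence, and an interest measure on toy data, then counts how many single-item rules pass given thresholds. The transactions, the function names, and the use of lift as the "interest" measure are assumptions made for illustration; the paper's own definition of interest and its regression equations are not reproduced here.

```python
from itertools import permutations

# Hypothetical toy transactions, not data from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Assumed stand-in for 'interest': confidence / support(consequent)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

def count_rules(transactions, min_support, min_confidence):
    """Count rules {a} -> {b} over single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    count = 0
    for a, b in permutations(items, 2):
        if (support({a, b}, transactions) >= min_support
                and confidence({a}, {b}, transactions) >= min_confidence):
            count += 1
    return count

# Tightening the confidence threshold shrinks the rule set.
print(count_rules(transactions, 0.5, 0.6))
print(count_rules(transactions, 0.5, 0.9))
```

Tabulating `count_rules` over a grid of thresholds would give the kind of (parameters, number of rules) observations to which a regression equation could then be fitted.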
Abstract: At present, the associated flow rule of traditional plastic theory is adopted in the slip line field theory and the upper bound method for geotechnical materials, so the stress characteristic line conforms to the velocity line. It is proved that geotechnical materials do not abide by the associated flow rule, and it is impossible for the stress characteristic line to conform to the velocity line. Generalized plastic mechanics theoretically proved that the plastic potential surface intersects the Mohr-Coulomb yield surface at an angle, so the velocity line must be studied with a non-associated flow rule. According to limit analysis theory, the theory of the slip line field is put forward in this paper, and the ultimate bearing capacity of a strip footing is then obtained based on the associated flow rule and the non-associated flow rule individually. These two results are identical, since the ultimate bearing capacity is independent of the flow rule. By contrast, the velocity fields of the associated and non-associated flow rules are different, which shows that the velocity field based on the associated flow rule is incorrect.
Abstract: As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving more and more attention. However, because of the 'oracle' problem, traditional test methods are not well suited to application programs in the field of data mining. In this paper, a software testing method based on metamorphic testing is proposed for the field of data mining; it takes an association rules algorithm as a specific case and constructs metamorphic relations on the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
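To make the idea of a metamorphic relation concrete, the sketch below checks two relations on a brute-force frequent itemset miner in Python. The miner, the toy data, and both relations (invariance under reordering of transactions, and invariance of relative support when the whole dataset is duplicated) are assumptions chosen for illustration; the abstract does not state which relations the paper constructs.

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force miner under test: returns {frozenset: relative support}."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(set(combo) <= t for t in transactions) / n
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

def permutation_relation_holds(transactions, min_support, seed=0):
    """MR1: shuffling the transaction order must not change the output."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return (frequent_itemsets(transactions, min_support)
            == frequent_itemsets(shuffled, min_support))

def duplication_relation_holds(transactions, min_support):
    """MR2: repeating the whole dataset must not change relative supports."""
    doubled = list(transactions) * 2
    return (frequent_itemsets(transactions, min_support)
            == frequent_itemsets(doubled, min_support))

# Hypothetical source test case.
source_data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(permutation_relation_holds(source_data, 0.5))
print(duplication_relation_holds(source_data, 0.5))
```

Neither check needs to know the correct set of frequent itemsets in advance, which is how this style of testing sidesteps the oracle problem: only the relationship between two runs is asserted.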
Funding: Supported in part by the National Natural Science Foundation of China (60073012)
Abstract: Extracting objects from legacy systems is a basic step in making a system object-oriented, which improves its maintainability and understandability. A new object extraction model using association rules and dependence analysis is proposed. In this model, data are classified by association rules and the corresponding operations are partitioned by dependence analysis.
Funding: The APC was funded by the Deanship of Scientific Research, Saudi Electronic University.
Abstract: Association rule learning is a machine learning method used to find underlying associations in large datasets. Whether intentionally or unintentionally present, noise in training instances causes overfitting while building the classifier and negatively impacts classification accuracy. This paper applies instance reduction techniques to the datasets before mining the association rules and building the classifier. Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning. This paper utilizes them to remove noise from the dataset before training the association rules classifier. Extensive experiments were conducted to assess the accuracy of association rules with different instance reduction techniques, namely Decremental Reduction Optimization Procedure (DROP) 3, DROP5, ALL K-Nearest Neighbors (ALLKNN), Edited Nearest Neighbor (ENN), and Repeated Edited Nearest Neighbor (RENN), at different noise ratios. Experiments show that instance reduction techniques substantially improved the average classification accuracy at three different noise levels: 0%, 5%, and 10%. The RENN algorithm achieved the highest levels of accuracy, with a significant improvement on seven out of the eight datasets used from the University of California Irvine (UCI) machine learning repository. The improvements were more apparent in the 5% and 10% noise cases. When RENN was applied, the average classification accuracy for the eight datasets in the zero-noise test improved from 70.47% to 76.65% compared to the original test. The average accuracy improved from 66.08% to 77.47% in the 5%-noise case and from 59.89% to 77.59% in the 10%-noise case. Higher confidence was also reported in building the association rules when RENN was used. These results indicate that RENN is a good solution for removing noise and avoiding overfitting during the construction of the association rules classifier, especially in noisy domains.
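For readers unfamiliar with the instance reduction step, the following Python sketch shows one common formulation of ENN and RENN: an instance is dropped when its label disagrees with the majority label of its k nearest neighbours, and RENN repeats this until nothing more is removed. The squared Euclidean distance, k = 3, and the toy points are assumptions for illustration and are not taken from the paper's experimental setup.

```python
from collections import Counter

def edited_nearest_neighbor(points, labels, k=3):
    """Return indices of instances whose label matches their k-NN majority."""
    keep = []
    for i, p in enumerate(points):
        # Squared Euclidean distance to every other instance.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        if not neighbor_labels:
            keep.append(i)  # a lone instance has no neighbours to contradict it
            continue
        majority = Counter(neighbor_labels).most_common(1)[0][0]
        if majority == labels[i]:
            keep.append(i)
    return keep

def repeated_enn(points, labels, k=3):
    """Apply ENN repeatedly until no further instance is removed."""
    idx = list(range(len(points)))
    while True:
        kept = edited_nearest_neighbor(
            [points[i] for i in idx], [labels[i] for i in idx], k
        )
        if len(kept) == len(idx):
            return idx
        idx = [idx[j] for j in kept]

# Hypothetical 1-D data: two clean clusters plus one mislabeled point at 1.5.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (1.5,), (10.0,), (11.0,), (12.0,), (13.0,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(repeated_enn(points, labels, k=3))
```

In this toy case only the mislabeled point is removed; the cleaned subset would then be passed to the rule miner and classifier.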
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grants No. ZYGX2014J051 and No. ZYGX2014J066; Science and Technology Projects in Sichuan Province under Grants No. 2015JY0178, No. 2016FZ0002, No. 2014GZ0109, No. 2015KZ002 and No. 2015JY0030; China Postdoctoral Science Foundation under Grant No. 2015M572464
Abstract: At present, most association rules algorithms are based on Boolean attributes and single-level association rules mining. However, real-world data has various types, and multi-level and quantitative attributes are receiving more and more attention. The most important step is to mine frequent sets. In this paper, we propose an algorithm called fuzzy multiple-level association (FMA) rules to mine frequent sets. It is based on an improved Eclat algorithm, unlike many algorithms proposed by other researchers, which use the Apriori algorithm. We analyze the frequent sets of quantitative data by using fuzzy theory, dividing the concept hierarchy and softening the boundaries of attribute values and frequency. We use vertical-format data and the improved Eclat algorithm to describe the proposed method, and we use this algorithm to analyze Beijing logistics route data. Experiments show that the algorithm performs well, with better effectiveness and high efficiency.
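The vertical data format mentioned above can be sketched as follows. This Python example implements the basic Eclat idea only (each item maps to the set of transaction ids containing it, and longer itemsets are found by intersecting those sets); it is not the paper's improved or fuzzy multiple-level variant, and the function names and toy data are assumptions.

```python
def to_vertical(transactions):
    """Map each item to its tidset: the ids of transactions containing it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def eclat(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    vertical = to_vertical(transactions)
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs; each tidset is already restricted
        # to the transactions that contain every item in prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support via tidset intersection
                if len(shared) >= min_count:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent

# Hypothetical toy data.
data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
for itemset, count in sorted(eclat(data, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Because supports come from set intersections rather than repeated scans of the transaction list, this layout avoids the candidate-counting passes of Apriori; a fuzzy extension would presumably replace the integer counts with membership-weighted sums.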
Funding: Supported by the National Natural Science Foundation of China (70371015) and the Science Foundation of Jiangsu University (04KJD001)
Abstract: Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e., items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, which build their classification model based on association rules. However, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on the FP-tree is presented to discover negative association rules.
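A small Python sketch may help show what a negated item means in terms of counts. It evaluates a rule of the form A => NOT B using the identity count(A and not B) = count(A) - count(A and B), so only positive itemset counts are needed. The direct counting used here, the function names, and the toy baskets are assumptions for illustration; the FP-tree-based algorithm of the paper is not reproduced.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def negative_rule(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent => NOT consequent.

    Uses count(A and not B) = count(A) - count(A and B), so the absence of
    the consequent never has to be counted directly.
    """
    n = len(transactions)
    count_a = count_containing(antecedent, transactions)
    count_ab = count_containing(set(antecedent) | set(consequent), transactions)
    count_a_not_b = count_a - count_ab
    return count_a_not_b / n, count_a_not_b / count_a

# Hypothetical baskets: tea and coffee rarely appear together.
baskets = [
    {"tea"},
    {"tea", "milk"},
    {"coffee"},
    {"coffee", "milk"},
    {"tea", "coffee"},
]
supp, conf = negative_rule({"tea"}, {"coffee"}, baskets)
print(round(supp, 3), round(conf, 3))
```

The same identity implies that the confidence of A => NOT B equals one minus the confidence of A => B, which is one reason negative rules can be derived from positive itemset counts.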
Abstract: We discuss basic intrusion detection techniques and focus on how to apply association rules to intrusion detection. Beginning with an analysis of some close relations between users' behaviors, we discuss the mining algorithm for association rules and apply it to anomaly detection in IDS. Moreover, according to the characteristics of intrusion detection, we optimize the mining algorithm for association rules and use fuzzy logic to improve the system performance.