Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modify Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check the count against a minimum support threshold. The derived algorithm increases the rate of correct solutions because the search is confined to a subspace. Furthermore, our algorithm scales well and optimizes the number of qubits required by the design, which directly improves performance. The proposed design can accommodate more transactions and items and still perform well with a small number of qubits.
Data mining techniques offer great opportunities for developing ethics lines whose main aim is to ensure improvements in, and compliance with, the values, conduct and commitments making up the code of ethics. The aim of this study is to suggest a process for exploiting the data generated and collected from an ethics line by extracting association rules with the Apriori algorithm. This makes it possible to identify anomalies and behaviour patterns requiring action to review, correct, promote or expand them, as appropriate.
Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules consider the same items, but in addition consider negated items (i.e., items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, that is, classifiers that build their classification model from association rules. However, mining such rules requires examining an exponentially large search space, and despite their usefulness, very few algorithms to mine them have been proposed to date. In this paper, an algorithm based on the FP-tree is presented to discover negative association rules.
The Apriori algorithm is a classical method of association rule mining. Based on an analysis of this theory, the paper provides an improved Apriori algorithm. The proposed algorithm combines a hash table technique with reduction of candidate itemsets to improve the efficiency of resource usage as well as the personalized service of the data library.
Association rule mining is an important issue in data mining. The paper proposes a binary-based method that efficiently generates candidate frequent itemsets and their support counts using only operations such as "and", "or" and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
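The bitwise idea in the abstract above can be illustrated with a small sketch. This is not the BFDM algorithm itself, whose details the abstract does not give; it only shows, under the assumption that each item is stored as an integer bitmap over transactions, how a support count can be obtained with "and" operations alone. The function names `item_bitmaps`, `support_count` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def item_bitmaps(transactions):
    """Map each item to an integer whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count transactions containing every item of `itemset` by AND-ing the bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # keep only transactions that contain this item
    return bin(mask).count("1")       # number of set bits = support count


def frequent_itemsets(transactions, min_count, size):
    """Return {itemset: count} for all itemsets of the given size meeting min_count."""
    bitmaps = item_bitmaps(transactions)
    n = len(transactions)
    result = {}
    for combo in combinations(sorted(bitmaps), size):
        count = support_count(combo, bitmaps, n)
        if count >= min_count:
            result[combo] = count
    return result
```

For example, with the four transactions `{a, b, c}`, `{a, b}`, `{a, c}` and `{b, c}`, calling `frequent_itemsets(transactions, 2, 2)` counts each pair without rescanning the transaction list; only the per-item bitmaps are touched.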
As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the 'oracle' problem, traditional test methods are not well suited to application programs in the data mining field. In this paper, a software testing method based on metamorphic testing is proposed for the data mining field; it takes an association rule algorithm as the specific case and constructs metamorphic relations on the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
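A metamorphic relation of the kind the abstract above describes can be sketched as follows. The abstract does not state which relations the paper constructs, so the two relations here (reordering transactions, and duplicating every transaction while doubling the threshold) are illustrative assumptions, and `mine_frequent` is a deliberately simple brute-force stand-in for the algorithm under test.

```python
import random
from itertools import combinations


def mine_frequent(transactions, min_count):
    """Brute-force frequent itemset miner used here as the program under test."""
    items = sorted({i for t in transactions for i in t})
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= set(t))
            if count >= min_count:
                found[combo] = count
    return found


def check_permutation_relation(transactions, min_count, seed=0):
    """Assumed relation 1: reordering the transactions must not change the result."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return mine_frequent(shuffled, min_count) == mine_frequent(transactions, min_count)


def check_duplication_relation(transactions, min_count):
    """Assumed relation 2: duplicating every transaction and doubling the
    threshold must keep the same itemsets, with every count doubled."""
    base = mine_frequent(transactions, min_count)
    doubled = mine_frequent(list(transactions) + list(transactions), 2 * min_count)
    return doubled == {itemset: 2 * count for itemset, count in base.items()}
```

Neither check needs to know the correct output for a given input, which is how metamorphic testing sidesteps the oracle problem: only the relationship between two runs is verified.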
HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray(), to store the information of the database and then uses it instead of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost can be reduced significantly. To overcome the performance bottleneck of mining frequent 2-itemsets, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithm, and the results show that it is more efficient and reasonable.
In this letter, a support function for updating the Frequent Pattern (FP) tree is introduced, and on this basis an Incremental FP (IFP) algorithm for mining association rules is proposed. The IFP algorithm considers not only adding new data to the database but also removing old data from it. Furthermore, it simplifies five cases to three. The proposed algorithm avoids generating large numbers of candidate items and is highly efficient.
Intrusion detection is regarded as a classification problem in the data mining field. However, instead of mining the classification rules directly, class association rules are mined from audit logs and then used to construct a classifier. Some attributes in audit logs are important for detecting intrusions, but their values have a skewed distribution. A relative support concept is proposed to deal with this situation. To mine class association rules effectively, an algorithm based on the FP-tree is used. Experimental results show that this method has better performance.
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
Hotspots (active fires) indicate the spatial distribution of fires. Determining the factors that influence hotspot occurrence is essential so that fire events can be predicted from the characteristics of a given area. This study discovers possible influence factors on the occurrence of fire events using the Apriori association rule algorithm in the study area of Rokan Hilir, Riau Province, Indonesia. The Apriori algorithm was applied to a forest fire dataset containing data on the physical environment (land cover, river, road and city center), socio-economic conditions (income source, population, and number of schools), weather (precipitation, wind speed, and screen temperature), and peatlands. The experiments revealed 324 multidimensional association rules indicating relationships between hotspot occurrence and other factors. The associations between hotspot occurrence and other geographical objects were discovered for a minimum support of 10% and a minimum confidence of 80%. The results show that strong relations between hotspot occurrence and influence factors are found at a support of about 12.42%, a confidence of 1, and a lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), number of schools per km² less than or equal to 0.1, and distance from each hotspot to the nearest road less than or equal to 2.5 km.
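The three measures reported in the abstract above (support, confidence and lift) can be computed as in the following sketch. It assumes the standard definitions, which the abstract does not spell out: support is the fraction of records containing both sides of the rule, confidence is that count divided by the count of records containing the antecedent, and lift is confidence divided by the consequent's support. The function name `rule_metrics` and the toy item labels are hypothetical and are not taken from the study's dataset.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent,
    using the standard definitions (an assumption; see the lead-in)."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))   # records with antecedent
    n_b = sum(1 for t in transactions if consequent <= set(t))   # records with consequent
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


# Toy records with made-up labels, for illustration only.
records = [
    {"dry", "hotspot"},
    {"dry", "hotspot"},
    {"wet"},
    {"wet", "hotspot"},
    {"wet"},
]
support, confidence, lift = rule_metrics(records, {"dry"}, {"hotspot"})
```

Under these definitions, a rule with confidence 1 and lift 2.26 would imply a consequent support of about 1/2.26, roughly 0.44; this is arithmetic on the reported figures, not a value stated in the abstract.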
With the development of smart agriculture, data in the field of pesticide regulation have accumulated to a considerable scale. The pesticide transaction data collected by the Pesticide National Data Center alone amount to more than 10 million records daily. However, owing to outdated technical means, the existing pesticide supervision data lack deep mining and usage. The Apriori algorithm is one of the classic algorithms in association rule mining, but it needs to traverse the transaction database multiple times, which causes an extra IO burden. Spark is an emerging big data parallel computing framework with advantages such as in-memory computing and flexible distributed data sets. Compared with the Hadoop MapReduce computing framework, its IO performance is greatly improved. Therefore, this paper proposes an improved Apriori algorithm based on the Spark framework, ICAMA. The MapReduce process was used to support the candidate set and then to generate the candidate set. In experimental comparison, when the data volume exceeded 250 Mb, the performance of the Spark-based Apriori algorithm was 20% higher than that of the traditional Hadoop-based Apriori algorithm, and the improvement became more pronounced as the data volume increased.
The conventional complete association rule set was replaced by the least association rule set in the data warehouse association rule mining process. The least association rule set should comply with two requirements: 1) it should be the minimal and the simplest association rule set; 2) its predictive power should in no way be weaker than that of the complete association rule set, so that the precision of the association rule set analysis can be guaranteed. By adopting the least association rule set, weak rules can be pruned effectively, which greatly reduces the number of frequent itemsets and therefore improves mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is utilized to develop a corresponding efficient algorithm.
Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is one of the most common tasks in data mining. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge that is stored as quantitative values in transactions. To this end, it uses a different support value at each level as well as a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and multiple-level taxonomy, our method finds fuzzy association rules in transaction data sets. The method adopts a top-down, progressively deepening approach to derive large itemsets and incorporates fuzzy boundaries instead of sharp boundary intervals. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that it can mine association rules at different levels according to the user's preference.
Background: The purpose of this study was to identify the characteristics and principles of acupoints applied for treating chronic hepatitis B infection. Methods: Published clinical studies on acupuncture for the treatment of chronic hepatitis B infection were gathered from various databases, including SinoMed, Chongqing Vip, China National Knowledge Infrastructure, Wanfang, the Cochrane Library, PubMed, Web of Science and Embase. Excel 2019 was used to establish a database of acupuncture prescriptions and to compile statistics on frequency, meridian application, distribution and specific points; SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to conduct association rule analysis and cluster analysis to investigate the characteristics and patterns of acupoint selection. Results: A total of 42 studies containing 47 acupoints were included, with a total frequency of 286 acupoints. The top five acupoints used were Zusanli (ST36), Ganshu (BL18), Yanglingquan (GB34), Sanyinjiao (SP6) and Taichong (LR3), and the most commonly used meridian was the Bladder Meridian of Foot-Taiyang. The majority of acupuncture points are located in the lower limbs, back, and lumbar regions, with a significant percentage of them being Five-Shu acupoints. The strongest acupoint combination identified was Zusanli (ST36)–Ganshu (BL18); in addition, 13 association rules and 4 valid clusters were obtained. Conclusion: Zusanli (ST36)–Ganshu (BL18) could be considered a relatively reasonable prescription for treating chronic hepatitis B infection in clinical practice. However, further high-quality studies are needed.
Objective: To explore, based on data mining, the medication rules of Chinese medicine for the treatment of restless legs syndrome (RLS). Methods: CNKI, WANFANG, and VIP were taken as data sources, with "restless legs syndrome, RLS" as the key words and "Chinese medicine, Chinese materia medica, traditional Chinese medicine (TCM), traditional Chinese and Western medicine" as sub key words. Data were extracted from journals and literature related to the treatment of RLS by TCM from the establishment of each database to 2020, and data mining techniques (frequency analysis, cluster analysis, association rules) were used to analyze the core drugs and drug pair (group) rules. Results: A total of 87 prescriptions met the requirements of this study, involving 142 Chinese herbal medicines. The top 5 Chinese herbal medicines by frequency of use were Radix Paeoniae Alba, Radix Glycyrrhizae, Radix Angelicae Sinensis, Fructus Chaenomelis and Radix Astragali seu Hedysari. The four Qi (气) of the medicines were mainly warm and neutral, and the five flavors were mainly sweet, bitter, and pungent. The main meridians entered were the liver, spleen and heart meridians. The medication categories were mainly tonifying deficiency herbs, blood activating and blood stasis removing herbs, and wind and dampness eliminating herbs. The association rule analysis yielded 24 Chinese medicine combinations with high support, and the hierarchical cluster analysis yielded a total of 5 clusters. Conclusion: TCM treatment of RLS is based on tonifying deficiency herbs, especially to replenish Qi and blood throughout the course of the disease, supplemented by herbs for promoting blood circulation and removing blood stasis and herbs for eliminating wind and dampness, and combined with herbs for relieving superficies and herbs for calming the liver to stop the wind.
In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.
Objective: To explore the medication rules of Traditional Chinese Medicine (TCM) in the treatment of sleep disorder after stroke by using data mining technology. Methods: A computer search of electronic databases was used to retrieve clinical literature on the treatment of sleep disorders after stroke by TCM from January 2000 to January 2021. Excel was used to establish the database, and the prescription information was described and analyzed statistically. Using IBM SPSS Modeler 18.0 software, the Apriori algorithm was used for TCM association analysis, and IBM SPSS 22.0 software was used for systematic cluster analysis of high-frequency TCM. Results: A total of 67 articles were included, covering 131 traditional Chinese medicines. The medicines with a higher frequency of use include Ziziphi Spinosae Semen (Suanzaoren), Angelicae Sinensis Radix (Danggui), Ligusticum (Chuanxiong), liquorice (Gancao), Poria cocos (Fuling), and so on. In terms of effect, deficiency-tonifying medicines, sedative medicines and blood-activating and stasis-removing medicines are commonly used. The medicinal properties are mainly cold, mild and warm. The main flavors are sweet and bitter. The medicines mostly belong to the liver, heart and spleen meridians. Thirty-three association rules were obtained for medicine pairs and medicine groups from the correlation analysis, and the core combinations were "Ziziphi Spinosae Semen (Suanzaoren)-Tuber fleeceflower stem (Yejiaoteng)", "Ziziphi Spinosae Semen (Suanzaoren)-Polygala (Yuanzhi)", "Ziziphi Spinosae Semen (Suanzaoren)-Cortex albiziae (Hehuanpi)" and "Angelicae Sinensis Radix (Danggui)-Radix bupleuri (Chaihu)-Radix Paeoniae Alba (Baishao)", among others. Seven medicine aggregation groups were obtained by medicine cluster analysis. Conclusion: In the treatment of sleep disorder after stroke by TCM, the main method is to calm the heart and mind. Meanwhile, according to different syndrome types, the treatment methods of tonifying the heart and spleen, nourishing the liver and kidney, soothing and softening the liver, clearing heat and resolving phlegm, and nourishing the blood and promoting blood circulation are selected, which provides a reference for clinical treatment.
Funding: Supported by the National Natural Science Foundation of China (70371015) and the Science Foundation of Jiangsu University (04KJD001).
Funding: Supported by the National Natural Science Foundation of China (70371015).
Funding: Supported in part by the National Natural Science Foundation of China (No. 60073012) and the Natural Science Foundation of Jiangsu (BK2001004).
Funding: Supported by the Chinese NSF (Project No. 60073034).
Funding: Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, Project Number RI-44-0444.
文摘Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a 
benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
Abstract: Hotspots (active fires) indicate the spatial distribution of fires. A study on determining influence factors for hotspot occurrence is essential so that fire events can be predicted based on the characteristics of a certain area. This study discovers the possible influence factors on the occurrence of fire events using the Apriori association rule algorithm in the study area of Rokan Hilir, Riau Province, Indonesia. The Apriori algorithm was applied to a forest fire dataset which contained data on the physical environment (land cover, river, road and city center), socio-economic factors (income source, population, and number of schools), weather (precipitation, wind speed, and screen temperature), and peatlands. The experimental results revealed 324 multidimensional association rules indicating relationships between hotspot occurrence and other factors. The association between hotspot occurrence and other geographical objects was discovered for a minimum support of 10% and a minimum confidence of 80%. The results show that strong relations between hotspot occurrence and influence factors are found for a support of about 12.42%, a confidence of 1, and a lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), the number of schools per 1 km² less than or equal to 0.1, and the distance of each hotspot to the nearest road less than or equal to 2.5 km.
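The support, confidence, and lift figures quoted in this abstract follow the standard association rule definitions. The sketch below shows how the three measures could be computed for a single rule; the toy transactions and item labels are invented for illustration and are not the study's data.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_both = support(transactions, a | c)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    confidence = s_both / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_both, confidence, lift


# Hypothetical toy data: each transaction is the set of conditions observed at one location.
transactions = [
    frozenset({"rain>=3mm", "non_peat", "hotspot"}),
    frozenset({"rain>=3mm", "non_peat", "hotspot"}),
    frozenset({"rain>=3mm", "peat"}),
    frozenset({"non_peat"}),
]

sup, conf, lift = rule_metrics(transactions, {"rain>=3mm", "non_peat"}, {"hotspot"})
print(sup, conf, lift)
```

A lift above 1 means the antecedent and consequent co-occur more often than they would if they were independent, which is why the abstract reports lift alongside confidence.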
Funding: Supported by the National Natural Science Foundation of China (No. 61601471).
Abstract: With the development of smart agriculture, the data accumulated in the field of pesticide regulation has reached a considerable scale. The pesticide transaction data collected by the Pesticide National Data Center alone produces more than 10 million records daily. However, due to outdated technical means, the existing pesticide supervision data lack deep mining and usage. The Apriori algorithm is one of the classic algorithms in association rule mining, but it needs to traverse the transaction database multiple times, which causes an extra IO burden. Spark is an emerging big data parallel computing framework with advantages such as in-memory computing and flexible distributed data sets. Compared with the Hadoop MapReduce computing framework, its IO performance is greatly improved. Therefore, this paper proposes an improved Apriori algorithm based on the Spark framework, ICAMA. The MapReduce process was used to support the candidate set and then to generate the candidate set. In the experimental comparison, when the data volume exceeded 250 Mb, the performance of the Spark-based Apriori algorithm was 20% higher than that of the traditional Hadoop-based Apriori algorithm, and the improvement became more obvious as the data volume increased.
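The repeated database traversal mentioned in this abstract comes from the level-wise structure of classic Apriori: each level counts candidate itemsets against every transaction, then joins the surviving itemsets into larger candidates. The sketch below is a plain single-machine illustration of that loop, not the ICAMA algorithm or a Spark job; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates):
    """One full pass over the data: count how many transactions contain each candidate."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return {frequent itemset: count} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = count_candidates(transactions, current)
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets, keeping only
        # candidates whose (k-1)-subsets are all frequent (the Apriori property).
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


result = apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], min_support=0.5)
```

Each call to `count_candidates` corresponds to one scan of the transaction database, which is the step a distributed framework would parallelize across partitions.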
Abstract: The conventional complete association rule set was replaced by the least association rule set in the data warehouse association rule mining process. The least association rule set should comply with two requirements: 1) it should be the minimal and the simplest association rule set; 2) its predictive power should in no way be weaker than that of the complete association rule set, so that the precision of the association rule set analysis can be guaranteed. By adopting the least association rule set, the pruning of weak rules can be carried out effectively so as to greatly reduce the number of frequent itemsets and therefore improve the mining efficiency. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is utilized to develop a corresponding efficient algorithm.
Abstract: Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Extracting multilevel association rules in transaction databases is most commonly used in data mining. This paper proposes a multilevel fuzzy association rule mining model for the extraction of implicit knowledge that is stored as quantitative values in transactions. For this reason, it uses a different support value at each level as well as a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and a multiple-level taxonomy, our method finds fuzzy association rules from transaction data sets. This approach adopts a top-down, progressively deepening strategy to derive large itemsets and also incorporates fuzzy boundaries instead of sharp boundary intervals. Comparing our method with previous ones in simulation shows that the proposed method maintains higher precision, the mined rules are closer to reality, and it gives the ability to mine association rules at different levels based on the user’s tendency as well.
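To make the idea of per-item membership functions concrete, the sketch below computes a fuzzy support value for an itemset over quantitative transactions. It assumes triangular membership functions and the minimum operator for combining degrees, which are common choices in the fuzzy association rule literature; the abstract does not state which functions or operators the paper actually uses, and all names and data here are hypothetical.

```python
from functools import partial


def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_support(transactions, fuzzy_items):
    """Average, over all transactions, of the minimum membership degree of the items.

    `fuzzy_items` is a list of (item_name, membership_function) pairs; a quantity
    missing from a transaction is treated as 0.
    """
    total = 0.0
    for t in transactions:
        total += min(fn(t.get(item, 0)) for item, fn in fuzzy_items)
    return total / len(transactions)


# Hypothetical purchase quantities per transaction.
transactions = [
    {"milk": 4, "bread": 2},
    {"milk": 2, "bread": 2},
    {"milk": 0, "bread": 4},
]

# Each item gets its own membership function, as the abstract describes.
milk_medium = partial(triangular, left=0, peak=4, right=8)
bread_low = partial(triangular, left=0, peak=2, right=4)

fs = fuzzy_support(transactions, [("milk", milk_medium), ("bread", bread_low)])
print(fs)
```

In a multilevel setting, the resulting value would be compared against a minimum support chosen for the taxonomy level being mined, rather than against a single global threshold.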
Funding: Supported by the Chongqing Municipal Health and Family Planning Commission and Chongqing Municipal Science and Technology Commission Jointly Funded Key Research Projects in Traditional Chinese Medicine (ZY201801007).
Abstract: Background: The purpose of this study was to identify the characteristics and principles of acupoints applied for treating chronic hepatitis B infection. Methods: The published clinical studies on acupuncture for the treatment of chronic hepatitis B infection were gathered from various databases, including SinoMed, Chongqing Vip, China National Knowledge Infrastructure, Wanfang, the Cochrane Library, PubMed, Web of Science and Embase. Excel 2019 was utilized to establish a database of acupuncture prescriptions and compile statistics on the frequency, meridian application, distribution and specific points, and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to conduct association rule analysis and cluster analysis to investigate the characteristics and patterns of acupoint selection. Results: A total of 42 studies containing 47 acupoints were included, with a total frequency of 286 acupoints. The top five acupoints used were Zusanli (ST36), Ganshu (BL18), Yanglingquan (GB34), Sanyinjiao (SP6) and Taichong (LR3), and the most commonly used meridian was the Bladder Meridian of Foot-Taiyang. The majority of acupuncture points are located in the lower limbs, back, and lumbar regions, with a significant percentage of them being Five-Shu acupoints. The strongest acupoint combination identified was Zusanli (ST36)–Ganshu (BL18), in addition to which 13 association rules and 4 valid clusters were obtained. Conclusion: Zusanli (ST36)–Ganshu (BL18) could be considered a relatively reasonable prescription for treating chronic hepatitis B infection in clinical practice. However, further high-quality studies are needed.
Abstract: Objective: To explore, based on data mining, the medication rules of Chinese medicine for the treatment of restless legs syndrome (RLS). Methods: CNKI, WANFANG, and VIP were taken as data sources, with "restless legs syndrome, RLS" as the key words and "Chinese medicine, Chinese materia medica, traditional Chinese medicine (TCM), traditional Chinese and Western medicine" as sub key words. Data were extracted from the journals and literature related to the treatment of RLS by TCM from the establishment of each database to 2020, and data mining techniques (frequency analysis, cluster analysis, association rules) were used to analyze the core drugs and drug pair (group) rules. Results: A total of 87 prescriptions met the requirements of this study, involving 142 Chinese herbal medicines. The top 5 Chinese herbal medicines by frequency of use were Radix Paeoniae Alba, Radix Glycyrrhizae, Radix Angelicae Sinensis, Fructus Chaenomelis and Radix Astragali seu Hedysari. The four Qi (气) of the medicines were mainly warm and neutral, and the five flavors were mainly sweet, bitter, and pungent. The main meridians entered were the liver meridian, spleen meridian and heart meridian. The medication categories were mainly tonifying deficiency herbs, blood activating and removing blood stasis herbs, and eliminating wind and dampness herbs. The association rule analysis yielded 24 Chinese medicine combinations with high support, and the hierarchical cluster analysis yielded a total of 5 clusters. Conclusion: TCM treatment of RLS is based on tonifying deficiency herbs, especially to replenish Qi and blood throughout the course of the disease, supplemented by herbs for promoting blood circulation and removing blood stasis and herbs for eliminating wind and dampness, and combined with herbs for relieving superficies and herbs for calming the liver to stop the wind.
Funding: This work was supported in part by the National '863' High-Tech Programme of China (No. 863-306-ZD06-2).
Abstract: In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.
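The abstract does not describe how BitMatrix works, so the following is only a generic illustration of bitmap-based support counting, a technique the algorithm's name suggests but which is an assumption here. Each item is mapped to an integer whose bits mark the transactions containing it, and the support count of an itemset is the number of set bits in the bitwise AND of its items' bitmaps. All function names are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit `row` is set if transaction `row` contains it."""
    bitmaps = {}
    for row, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def support_count(bitmaps, itemset, n_transactions):
    """Count transactions containing every item, via bitwise AND of the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # an unseen item clears every bit
    return bin(mask).count("1")


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(transactions)
count_ab = support_count(bitmaps, {"a", "b"}, len(transactions))
print(count_ab)
```

With this layout the database is scanned once to build the bitmaps, and every later support query is a handful of integer operations instead of another pass over the transactions.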
Funding: Beijing Science and Technology Program (No. Z191100006619065); National Key R&D Program (No. 2017YFC1700101).
Abstract: Objective: To explore the medication rules of Traditional Chinese Medicine (TCM) in the treatment of sleep disorder after stroke by using data mining technology. Methods: A computer search of electronic databases was used to retrieve clinical literature on the treatment of sleep disorders after stroke by TCM from January 2000 to January 2021. Excel was used to establish the database, and the prescription information was described and analyzed statistically. Using IBM SPSS Modeler 18.0 software, the Apriori algorithm was used for TCM association analysis, and IBM SPSS 22.0 software was used for systematic cluster analysis of high-frequency TCM. Results: A total of 67 articles were included, covering 131 traditional Chinese medicines. The medicines with a higher frequency of use include Ziziphi Spinosae Semen (Suanzaoren), Angelicae Sinensis Radix (Danggui), Ligusticum (Chuanxiong), liquorice (Gancao), Poria cocos (Fuling), and so on. In terms of effect, deficiency-tonifying medicines, sedative medicines and blood-activating and stasis-removing medicines are commonly used. The medicinal properties are mainly cold, mild and warm. The main medicine flavors are sweet and bitter. The medicines mostly belong to the liver, heart and spleen meridians. Thirty-three association rules were obtained for medicine pairs and medicine groups from the correlation analysis, and the core combinations were "Ziziphi Spinosae Semen (Suanzaoren)-Tuber fleeceflower stem (Yejiaoteng)", "Ziziphi Spinosae Semen (Suanzaoren)-Polygala (Yuanzhi)", "Ziziphi Spinosae Semen (Suanzaoren)-Cortex albiziae (Hehuanpi)" and "Angelicae Sinensis Radix (Danggui)-Radix bupleuri (Chaihu)-Radix Paeoniae Alba (Baishao)" and so on. Seven medicine aggregation groups were obtained by medicine cluster analysis. Conclusion: In the treatment of sleep disorder after stroke by TCM, the main method is to calm the heart and mind. Meanwhile, according to different syndrome types, the treatment methods of tonifying the heart and spleen, nourishing the liver and kidney, soothing the liver and softening the liver, clearing heat and resolving phlegm, and nourishing the blood and promoting blood circulation are selected, which provide a certain reference for clinical treatment.