Abstract: This paper defines a new kind of rule, the probability functional dependency rule, which describes the degree of functional dependency. Five algorithms, ordered from simple to complex, are presented to mine this kind of rule under different conditions. The related theorems are proved to ensure the efficiency and correctness of these algorithms.
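The abstract does not state how the dependency degree is computed, so the following is only a minimal sketch under an assumed definition: the degree of X → Y is taken to be the fraction of rows that agree with the most frequent Y value within their X group. The function name `dependency_degree` and the sample rows are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def dependency_degree(rows, lhs, rhs):
    """Assumed measure: share of rows matching the majority rhs value of their lhs group."""
    if not rows:
        return 1.0  # an empty relation violates nothing
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # In each group, keep only the rows that carry the most common rhs value.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)


# Hypothetical relation: "dept" almost determines "city".
rows = [
    {"dept": "A", "city": "X"},
    {"dept": "A", "city": "X"},
    {"dept": "A", "city": "Y"},
    {"dept": "B", "city": "Z"},
]
degree = dependency_degree(rows, ["dept"], "city")
print(degree)
```

Under this assumed measure, a value of 1.0 corresponds to an exact functional dependency, and lower values indicate that more rows would have to be removed for the dependency to hold.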
Abstract: Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is one of the most common tasks in data mining. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge stored as quantitative values in transactions. To this end, it uses a different support value at each level and a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies, and a multiple-level taxonomy, the method finds fuzzy association rules in transaction data sets. It adopts a top-down, progressively deepening approach to derive large itemsets and uses fuzzy boundaries instead of sharp boundary intervals. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that rules can be mined at different levels according to the user’s preference.
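As a rough illustration of how quantitative values and per-item membership functions can replace binary item presence, the sketch below computes a fuzzy support value. It assumes triangular membership functions and the common min operator for combining items; the abstract specifies neither, and the names `triangular` and `fuzzy_support` and the sample quantities are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(transactions, itemset, membership):
    """Average, over all transactions, of the weakest membership among the items."""
    total = 0.0
    for quantities in transactions:
        # A missing item counts as quantity 0.
        total += min(membership[item](quantities.get(item, 0)) for item in itemset)
    return total / len(transactions)


# Hypothetical per-item membership functions for a "medium quantity" fuzzy set.
membership = {
    "milk": lambda q: triangular(q, 0, 4, 8),
    "bread": lambda q: triangular(q, 0, 2, 4),
}
transactions = [
    {"milk": 4, "bread": 2},
    {"milk": 2, "bread": 1},
    {"milk": 8},
    {"bread": 2},
]
support = fuzzy_support(transactions, ["milk", "bread"], membership)
print(support)
```

In a multilevel setting, the same function could be applied at each taxonomy level and compared against that level's own minimum support threshold.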
Funding: Supported by the National Key Basic Research Program 973 (2002CB312000), the National Natural Science Funds for Distinguished Young Scholar (60425206), and the Advanced Armament Research Project (51406020105JB8103).
Abstract: Quantitative attributes are partitioned into several fuzzy sets using the fuzzy c-means algorithm. Fuzzy c-means reflects the actual distribution of the data, and fuzzy sets soften the partition boundaries. We then improve the search technique of the Apriori algorithm and present an algorithm for mining fuzzy association rules. As databases grow larger, it becomes preferable to mine fuzzy association rules in parallel. In the parallel mining algorithm, quantitative attributes are partitioned into several fuzzy sets using a parallel fuzzy c-means algorithm. A Boolean parallel algorithm is improved to discover frequent fuzzy attribute sets, and the fuzzy association rules that meet a minimum confidence are generated on all processors. Experiments on distributed, linked PCs and workstations show that the parallel mining algorithm has good scaleup, sizeup, and speedup. Finally, we discuss the application of fuzzy association rules to classification. The example shows that a classification system built on fuzzy association rules is more accurate than two popular classification methods, C4.5 and CBA.
Funding: Support from Taif University Researchers Supporting Project Number (TURSP-2020/215), Taif University, Taif, Saudi Arabia.
Abstract: Lung cancer is one of the leading cancers for both genders worldwide. Its occurrence has increased substantially since the early 19th century. In this manuscript, we discuss various data mining techniques that have been employed for cancer diagnosis. Exposure to air pollution has been related to various adverse health effects. This work analyzes various air pollutants and their associated health hazards and aims to evaluate the impact of air pollution on lung cancer. We apply data mining to the relation between lung cancer and air pollution; our approach comprises preprocessing, data mining, testing and evaluation, and knowledge discovery. First, we remove noise and irrelevant data; then we merge the multiple data sources into a common source. From that source, we select the information relevant to our investigation for retrieval. We then convert the selected data into a form suitable for mining. Patterns are extracted using a relational suggestion rule mining process. The information these patterns reveal is categorized with an Auto Associative Neural Network (AANN) classification method. The proposed method is compared with the existing method on various factors. In conclusion, the proposed AANN and relational suggestion rule mining methods achieve high accuracy.
Abstract: HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray(), to store the information of the database and then uses it in place of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost is reduced significantly. To overcome the performance bottleneck of mining frequent 2-itemsets, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithm, and the results show that it is more efficient and reasonable.
Funding: Supported by the National Natural Science Foundation of China (70371015).
Abstract: Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method that efficiently generates candidate frequent itemsets and their support counts using only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM yields the improved algorithm BFDM. Theoretical analysis and experiments show that BFDM is effective and efficient.
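The abstract does not give BFDM's actual encoding, so the sketch below only illustrates the general idea of counting support with bitwise operations: each transaction and each candidate itemset is encoded as an integer bit mask, and containment is tested with a single "and". The names `encode` and `support_count` and the item-to-bit assignment are assumptions made for illustration.

```python
def encode(items, index):
    """Turn a collection of items into an integer with one bit set per item."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support_count(transaction_masks, itemset_mask):
    """Count transactions whose mask contains every bit of the itemset mask."""
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


# Hypothetical item-to-bit assignment and transactions.
index = {"a": 0, "b": 1, "c": 2}
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
masks = [encode(t, index) for t in transactions]

# Joining two 1-itemsets into a 2-itemset candidate is a bitwise "or".
candidate = encode({"a"}, index) | encode({"b"}, index)
count_ab = support_count(masks, candidate)
print(count_ab)
```

Because each containment test is one machine-level operation on small item universes, this representation tends to be cheaper than set comparisons, which is presumably the kind of saving the binary method targets.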
Abstract: With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for data mining and knowledge discovery techniques to understand user behavior better, improve the services provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. It concludes by listing some problems and challenges for further research.
Abstract: As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the 'oracle' problem, traditional testing methods are not well suited to application programs in the field of data mining. In this paper, a software testing method based on metamorphic testing is proposed for data mining. An association rule algorithm is taken as the specific case, and metamorphic relations are constructed for it. Experiments show that the method achieves the testing target and can feasibly be applied to other domains.
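To make the idea of a metamorphic relation concrete, here is a minimal sketch. The abstract does not list the relations the authors constructed, so the two used here (the result is unchanged when transactions are reordered, and unchanged when the whole data set is duplicated) are assumed examples. The brute-force miner `frequent_itemsets` stands in for the program under test and is hypothetical.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (program under test): maps each frequent itemset to its support."""
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


def check_metamorphic_relations(transactions, min_support, seed=0):
    """Return (permutation relation holds, duplication relation holds)."""
    baseline = frequent_itemsets(transactions, min_support)
    shuffled = transactions[:]
    random.Random(seed).shuffle(shuffled)
    # Relation 1: reordering transactions must not change the output.
    permutation_ok = frequent_itemsets(shuffled, min_support) == baseline
    # Relation 2: duplicating every transaction leaves relative supports unchanged.
    duplication_ok = frequent_itemsets(transactions * 2, min_support) == baseline
    return permutation_ok, duplication_ok


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
relations = check_metamorphic_relations(transactions, 0.5)
print(relations)
```

Neither check needs to know the correct set of frequent itemsets in advance, which is how metamorphic testing sidesteps the oracle problem: a violation of either relation signals a defect even without an expected output.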
Abstract: The amount of data available for decision making has increased tremendously in the age of the digital economy. Decision makers who fail to handle the data proficiently may make incorrect decisions and thereby harm their business. The task of efficiently and effectively extracting and classifying useful information from huge amounts of computational data is therefore of special importance. In this paper, we consider data whose attributes may be both crisp and fuzzy. By examining suitable partial data, segments with different classes are formed; a multithreaded computation is then performed to generate crisp rules where possible; finally, the fuzzy partition technique is employed to handle the fuzzy attributes for classification. The rules generated in classifying the overall data can be used to gain more knowledge from the data collected.
Abstract: The problem of association rule mining has gained considerable prominence in the data mining community as an important tool for knowledge discovery in large-scale databases, and there has been a spurt of research activity around it. However, traditional association rule mining often derives many rules that are of no interest to users. This paper reports a generalization of association rule mining called φ association rule mining. It allows users to assign different levels of interest to different itemsets, as real applications require. It also helps to derive interesting rules and substantially reduces the number of rules. An algorithm based on the FP-tree for mining φ frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.
Funding: This work was supported in part by the National '863' High-Tech Programme of China (No. 863-306-ZD06-2).
Abstract: This paper discusses the problem of discovering association rules between items in a large database of sales transactions and proposes a novel algorithm, BitMatrix. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that it outperforms them on large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.
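Abstract: Fuzzy classification association rules (FCAR) are a special kind of fuzzy association rule, and mining them is essential for building rule-based classification models. When traditional association rule mining algorithms are used to mine FCAR, the result may contain many redundant rules, and when the classes in the data set are imbalanced, the number of rules mined for minority classes drops sharply, even to zero. To address these problems, an FCAR mining algorithm based on feature selection and the Fuzzy Category Support-Fuzzy Lift Framework (FCS-FLF) is proposed: FSFCS Based FCAR-Miner (Feature Selection and Fuzzy Category Support-Fuzzy Lift Framework Based FCAR-Miner), which mines FCAR iteratively from the fuzzy membership matrix. Experimental results on several class-imbalanced data sets show that, compared with other algorithms, FSFCS Based FCAR-Miner avoids generating large numbers of redundant rules and also adapts to class imbalance, without large disparities in the number of rules across classes.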
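The abstract does not define fuzzy category support or fuzzy lift, so the sketch below relies on assumed textbook-style forms: the antecedent's degree in a sample is the minimum of its memberships, category support is measured only within the target class, and lift is the joint support divided by the product of the antecedent and class supports. The function name `fuzzy_rule_metrics` and the data are hypothetical.

```python
def fuzzy_rule_metrics(memberships, labels, antecedent, target):
    """Return (class-relative fuzzy support, fuzzy lift) for antecedent -> target."""
    n = len(labels)
    # Degree to which each sample matches the antecedent (min over its fuzzy items).
    degrees = [min(row[j] for j in antecedent) for row in memberships]
    antecedent_support = sum(degrees) / n
    class_size = sum(1 for y in labels if y == target)
    if class_size == 0 or antecedent_support == 0:
        return 0.0, 0.0
    joint = sum(d for d, y in zip(degrees, labels) if y == target)
    # Assumed definition: support normalized by the size of the target class only.
    category_support = joint / class_size
    lift = (joint / n) / (antecedent_support * (class_size / n))
    return category_support, lift


# Hypothetical membership matrix: rows are samples, columns are fuzzy items.
memberships = [
    [1.0, 0.5],
    [0.5, 0.5],
    [0.0, 1.0],
    [0.0, 0.0],
]
labels = ["minor", "major", "major", "major"]
category_support, lift = fuzzy_rule_metrics(memberships, labels, [0], "minor")
print(category_support, lift)
```

In this toy data the rule covers only a quarter of all samples, so a global support threshold might discard it, while its support within the minority class is high. That is the kind of effect a class-relative support measure is meant to preserve under class imbalance.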