Funding: The National Natural Science Foundation of China (No. 60473045), the Technology Research Project of Hebei Province (No. 05213573), and the Research Plan of Education Office of Hebei Province (No. 2004406).
Abstract: The conventional fuzzy class-association method classifies new texts by scanning the classifier repeatedly, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules that contain many repeated words. Unlike ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which those words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the paths of sub-trees led by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
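A minimal sketch of the FCR-tree idea follows. It is built on stated assumptions:

- Rules are stored as a prefix tree whose edges are (word, fuzzy set) pairs sorted by word.
- A text is a dictionary of raw word counts.
- The fuzzy sets `low`/`high` and their membership functions are hypothetical.
- A rule's strength is the minimum membership along its path times its confidence.

This is not the authors' implementation, and it omits the split into multiple k-FCR-trees. It only shows how shared prefixes save space and how subtrees for absent words are skipped.

```python
from collections import defaultdict

# Hypothetical fuzzy sets over a word's raw count in a text. They are only
# evaluated for words that occur (count >= 1); absent words are pruned anyway.
FUZZY_SETS = {
    "low": lambda c: max(0.0, min(1.0, (3 - c) / 2)),   # 1 at c=1, 0.5 at c=2, 0 from c=3
    "high": lambda c: max(0.0, min(1.0, (c - 1) / 2)),  # 0 at c=1, 0.5 at c=2, 1 from c=3
}


class Node:
    def __init__(self):
        self.children = {}  # (word, fuzzy_label) -> Node
        self.rules = []     # (class_label, confidence) of rules whose antecedent ends here


class FCRTree:
    """Prefix tree of fuzzy classification rules (hypothetical layout)."""

    def __init__(self):
        self.root = Node()

    def insert(self, antecedent, class_label, confidence):
        # Sorting items gives every rule a canonical order, so rules that
        # share leading (word, fuzzy set) items share one path.
        node = self.root
        for item in sorted(antecedent):
            node = node.children.setdefault(item, Node())
        node.rules.append((class_label, confidence))

    def classify(self, word_counts):
        scores = defaultdict(float)

        def visit(node, degree):
            for label, confidence in node.rules:
                scores[label] = max(scores[label], degree * confidence)
            for (word, fuzzy_label), child in node.children.items():
                if word not in word_counts:
                    continue  # every rule below needs this word: skip the subtree
                mu = FUZZY_SETS[fuzzy_label](word_counts[word])
                if mu > 0:
                    # Min t-norm: a rule matches only as well as its weakest item.
                    visit(child, min(degree, mu))

        visit(self.root, 1.0)
        return dict(scores)


tree = FCRTree()
tree.insert([("ball", "high"), ("goal", "low")], "sports", 0.9)
tree.insert([("ball", "high")], "sports", 0.6)  # shares the ("ball", "high") node
tree.insert([("market", "high")], "finance", 0.8)
result = tree.classify({"ball": 3, "goal": 1})
```

Here the two `sports` rules share one node, and the `market` subtree is never entered because the text does not contain that word.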
Funding: The APC was funded by the Deanship of Scientific Research, Saudi Electronic University.
Abstract: Association rule learning is a machine learning method for finding underlying associations in large datasets. Whether present intentionally or unintentionally, noise in training instances causes overfitting while the classifier is built and reduces classification accuracy. This paper applies instance reduction techniques to the datasets before mining the association rules and building the classifier. Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning; here they are used to remove noise from the dataset before the association rules classifier is trained. Extensive experiments were conducted to assess the accuracy of association rules at different noise ratios with five instance reduction techniques: Decremental Reduction Optimization Procedure (DROP) 3, DROP5, All K-Nearest Neighbors (ALLKNN), Edited Nearest Neighbor (ENN), and Repeated Edited Nearest Neighbor (RENN). The experiments show that instance reduction techniques substantially improved the average classification accuracy at three noise levels: 0%, 5%, and 10%. The RENN algorithm achieved the highest accuracy, with a significant improvement on seven of the eight datasets used from the University of California Irvine (UCI) machine learning repository. The improvements were more apparent in the 5% and 10% noise cases. When RENN was applied, the average classification accuracy over the eight datasets in the zero-noise test improved from 70.47% to 76.65% compared with the original test. The average accuracy improved from 66.08% to 77.47% in the 5%-noise case and from 59.89% to 77.59% in the 10%-noise case. Higher confidence was also reported when the association rules were built with RENN. These results indicate that RENN is a good solution for removing noise and avoiding overfitting during the construction of the association rules classifier, especially in noisy domains.
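RENN is simple enough to sketch directly. The version below assumes numeric feature vectors, Euclidean distance and k = 3; all of these are illustrative choices. It applies Wilson's ENN editing rule repeatedly until no instance is removed. The association-rule mining and classifier construction that follow in the paper are not shown.

```python
import math
from collections import Counter


def nearest_labels(X, y, i, active, k):
    # Labels of the k active instances closest to instance i (ties broken by index).
    ranked = sorted((math.dist(X[i], X[j]), j) for j in active if j != i)
    return [y[j] for _, j in ranked[:k]]


def enn(X, y, active, k=3):
    # Wilson's ENN: drop every instance whose label disagrees with the majority
    # label of its k nearest neighbours, all judged against the same `active` set.
    kept = []
    for i in active:
        votes = Counter(nearest_labels(X, y, i, active, k))
        if not votes or votes.most_common(1)[0][0] == y[i]:
            kept.append(i)
    return kept


def renn(X, y, k=3):
    # RENN: reapply ENN until a pass removes nothing; returns the kept indices.
    active = list(range(len(X)))
    while True:
        edited = enn(X, y, active, k)
        if len(edited) == len(active):
            return edited
        active = edited


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]  # index 4 is a mislabelled point
clean = renn(X, y)
```

The mislabelled point at index 4 is removed in the first pass, and the second pass removes nothing, so RENN stops.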
Funding: Supported by the National High Technology Research and Development Program of China (No. 2007AA01Z132), the National Natural Science Foundation of China (Nos. 60775035, 60933004, 60970088, 60903141), the National Basic Research Priorities Programme (No. 2007CB311004), and the National Science and Technology Support Plan (No. 2006BAC08B06).
Abstract: In this paper, we propose an enhanced associative classification method that integrates the dynamic property into the associative classification process. In the proposed method, we employ a support vector machine (SVM)-based method to refine the discovered emerging frequent patterns, which are then used to extend the classification rules for class label prediction. The empirical study shows that our method can classify growing resources efficiently and effectively.
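The abstract does not spell out how emerging patterns are obtained. As background only, the sketch below mines them by brute-force enumeration using the standard growth-rate definition: support in one class divided by support in the other. The SVM-based refinement step described in the paper is not shown. The threshold `min_growth`, the maximum pattern length and the toy data are assumptions.

```python
import math
from itertools import combinations


def support(itemset, transactions):
    # Fraction of transactions that contain every item of `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)


def emerging_patterns(target, background, min_growth=2.0, max_len=2):
    # Growth rate = support in the target class / support in the background;
    # patterns that never occur in the background get an infinite growth rate.
    items = sorted(set().union(*target))
    found = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s_target = support(pattern, target)
            if s_target == 0:
                continue
            s_background = support(pattern, background)
            growth = math.inf if s_background == 0 else s_target / s_background
            if growth >= min_growth:
                found[pattern] = growth
    return found


target = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
background = [{"b", "c"}, {"c"}, {"a", "c"}]
patterns = emerging_patterns(target, background)
```

In this toy data, `{a, b}` never occurs in the background class, so its growth rate is infinite, while `{c}` is more common in the background and is not reported.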
Abstract: The cytology of 130 indeterminate nodules (Thy 3) was retrospectively reviewed according to the British Thyroid Association 2014 classification. Nodules were divided into Thy 3a (atypical features) and Thy 3f (follicular lesion) categories. Histology was available as a reference for 97 nodules. Pre-surgical evaluations comprised biochemical tests, color-Doppler ultrasonography (US), semi-quantitative elastography-US (USE), contrast-enhanced US (CEUS), and mutation analysis from cytological slides. Thyroid malignancy was the final diagnosis for 19% of surgically treated nodules. No statistically significant difference in the risk of malignancy was found between Thy 3a (26%) and Thy 3f (14%) nodules. Histology showed a higher incidence of Hürthle cell adenomas in Thy 3f (29%) than in Thy 3a (3%) nodules (P=0.01). The only pre-surgical difference concerned the BRAF V600E mutation, which was positive in some Thy 3a nodules but in no Thy 3f nodules (P=0.04). Receiver-operating characteristic (ROC) analysis was used to obtain cut-off values from US (score), USE (ELX 2/1 strain index), and CEUS (time-to-peak index and peak index) data. The cut-off values were similar for Thy 3a and Thy 3f nodules. The data showed that malignancy can be suspected if the US score is >2, the ELX 1/2 strain index is >1, the time-to-peak index is >1, and the peak index is <1. In a sub-group of 24 revised nodules (12 Thy 3a and 12 Thy 3f) with histology as a reference, cumulative pre-surgical analysis by US, USE, and CEUS showed high positive and negative predictive values (83% and 100%, respectively) for malignancy in Thy 3a and Thy 3f nodules. In conclusion, in our series of revised Thy 3 nodules, the malignancy rate was low and did not differ significantly between the Thy 3a and Thy 3f categories. Using cut-offs derived with histology as a reference could reduce surgery. Our data support the view that, in mutation-negative Thy 3a and Thy 3f nodules, observation should be the first choice when not all instrumental results are suspicious.
Abstract: Classification and association rule mining support decisions based on relationships between attributes and help decision makers make correct decisions at the right time. Associative classification first generates class-based association rules and then uses the generated rule set to predict the class label of unseen data. Large datasets may contain many null-transactions. A null-transaction is a transaction that does not contain any of the itemsets being examined. It is therefore important to consider the null-invariance property when selecting appropriate interestingness measures for correlation analysis. Real-world datasets often have mixed attributes, which are not easy to analyze. Hence, the proposed work uses the cosine measure to avoid the influence of null-transactions during rule generation. It also employs a mixed-kernel probability density function (PDF) to handle continuous attributes during data analysis. It can handle both nominal and continuous attributes and generates a mixed-attribute rule set. To explore the search space efficiently, it applies Ant Colony Optimization (ACO). Public datasets are used to analyze the performance of the algorithm. The results show that the support-confidence framework combined with a correlation measure generates a more accurate and simpler rule set and discovers more interesting rules.
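The null-invariance argument can be checked numerically. The sketch below uses the standard definitions:

- cosine(A, B) = sup(AB) / sqrt(sup(A) · sup(B))
- lift(A, B) = P(AB) / (P(A) · P(B))

Here sup(AB) counts the transactions that contain both A and B. Padding the data with transactions that contain neither item leaves cosine unchanged but inflates lift. The item names and counts are invented for illustration. The paper's mixed-kernel PDF and ACO steps are not shown.

```python
import math


def counts(transactions, a, b):
    # Transactions containing a, containing b, containing both, and in total.
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    return n_a, n_b, n_ab, len(transactions)


def cosine(transactions, a, b):
    n_a, n_b, n_ab, _ = counts(transactions, a, b)
    return n_ab / math.sqrt(n_a * n_b)  # the total count n cancels out


def lift(transactions, a, b):
    n_a, n_b, n_ab, n = counts(transactions, a, b)
    return n * n_ab / (n_a * n_b)  # grows with n


base = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"bread"}]
padded = base + [{"eggs"}] * 96  # 96 null-transactions with respect to {milk, bread}
```

Both datasets give a cosine of 2/3. Lift rises from 8/9 (about 0.89) to 200/9 (about 22.2) only because of the added null-transactions.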
Funding: The MOE Project of Key Research Institute of Humanities and Social Sciences at Universities (12JJD630001), the National Natural Science Foundation of China (71372044/71110107027), and the Tsinghua University Initiative Scientific Research Program (20101081741).
Abstract: Associative classification has attracted considerable research attention for business analytics in recent years owing to its accuracy and understandability. It is deemed meaningful to construct an associative classifier with a compact set of rules (i.e., compactness), which is easy to understand and use in decision making. This paper presents a novel approach to fuzzy associative classification, Gain-based Fuzzy Rule-Covering classification (GFRC), which is a fuzzy extension of the effective classifier GARC. GFRC introduces two strategies to improve compactness while preserving accuracy. The first is fuzzy partitioning for data discretization, which copes with the 'sharp boundary problem' and incorporates simulated annealing based on the information entropy measure. The second is a data-redundancy resolution coupled with a rule-covering treatment. Data experiments show that GFRC had good accuracy and was significantly better than other classifiers in compactness. Moreover, GFRC is applied to a real-world case of predicting the growth of sellers in an electronic marketplace, illustrating the classification effectiveness of linguistic rules in business decision support.
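The sketch below illustrates the 'sharp boundary problem' and an entropy criterion for fuzzy partitioning. It builds a triangular fuzzy partition from a list of centers and scores it with a membership-weighted fuzzy class entropy. The triangular shape and this particular fuzzy entropy are common choices assumed here; they are not necessarily the ones GFRC uses. In GFRC, simulated annealing would search over partitions to minimize such an entropy. That search is omitted here, and the sketch simply compares two hand-picked partitions.

```python
import math


def triangular_partition(x, centers):
    # Membership of x in each triangular fuzzy set; neighbouring sets overlap so
    # the memberships sum to 1, and the outermost sets are open shoulders.
    mu = []
    for j, c in enumerate(centers):
        left = centers[j - 1] if j > 0 else None
        right = centers[j + 1] if j < len(centers) - 1 else None
        if x == c:
            mu.append(1.0)
        elif x < c:
            mu.append(1.0 if left is None else max(0.0, (x - left) / (c - left)))
        else:
            mu.append(1.0 if right is None else max(0.0, (right - x) / (right - c)))
    return mu


def fuzzy_entropy(values, labels, centers):
    # Class entropy of each fuzzy set, weighted by that set's total membership.
    memberships = [triangular_partition(v, centers) for v in values]
    total = sum(sum(m) for m in memberships)
    entropy = 0.0
    for j in range(len(centers)):
        weight = sum(m[j] for m in memberships)
        if weight == 0:
            continue
        per_class = {}
        for m, label in zip(memberships, labels):
            per_class[label] = per_class.get(label, 0.0) + m[j]
        h = -sum((w / weight) * math.log2(w / weight) for w in per_class.values() if w > 0)
        entropy += (weight / total) * h
    return entropy


values = [1, 2, 3, 17, 18, 19]
labels = ["low", "low", "low", "high", "high", "high"]
coarse = fuzzy_entropy(values, labels, [0, 20])
fitted = fuzzy_entropy(values, labels, [2, 18])
```

With centers [0, 10, 20], the value 4 belongs to the first two sets with degrees 0.6 and 0.4, rather than falling crisply on one side of a cut point. The partition with centers at 2 and 18 separates the two classes better and so has the lower fuzzy entropy.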
Funding: The authors would like to thank Prof. Xin Yao for discussions and advice on this manuscript. This research was supported in part by the NSFC Joint Fund with Guangdong of China under Key Project (U 1201258), the National Natural Science Foundation of China (Grant Nos. 71402083, 61573219, 61502258), and the National Science Foundation of Shandong Province (ZR2014FQ007).
Abstract: In this paper, we introduce polygene-based evolution, a novel framework for evolutionary algorithms (EAs) with distinctive operations in the evolutionary process. In traditional EAs, the primitive evolution unit is a gene, and genes are independent components during evolution. In polygene-based evolutionary algorithms (PGEAs), the evolution unit is a polygene, i.e., a set of co-regulated genes. Discovering and maintaining quality polygenes can play an effective role in evolving quality individuals. Polygenes generalize genes, and PGEAs generalize EAs. Implementing the PGEA framework involves three phases: (I) polygene discovery, (II) polygene planting, and (III) polygene-compatible evolution. For Phase I, we adopt an associative classification-based approach to discover quality polygenes. For Phase II, we perform probabilistic planting to maintain the diversity of individuals. For Phase III, we incorporate polygene-compatible crossover and mutation to produce the next generation of individuals. Extensive experiments on function optimization benchmarks, in comparison with conventional and state-of-the-art EAs, demonstrate the potential of the approach for improving accuracy and efficiency.
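Phases II and III can be sketched as follows. The sketch assumes that individuals are integer lists, that a discovered polygene is a mapping from locus to gene value, and that polygenes are disjoint.

- `plant` copies a polygene into each individual with probability `rate`.
- `polygene_crossover` passes each polygene to the child as one block from a single parent.

The names, the encoding and the planting rate are hypothetical. Polygene discovery (Phase I) and polygene-compatible mutation are not shown.

```python
import random


def plant(population, polygene, rate, rng):
    # Probabilistic planting: copy the polygene's gene values into an
    # individual with probability `rate`, so not every individual is forced
    # to carry it and population diversity is kept.
    for individual in population:
        if rng.random() < rate:
            for locus, value in polygene.items():
                individual[locus] = value


def polygene_crossover(parent1, parent2, polygenes, rng):
    # Each (disjoint) polygene is inherited as a block from one parent;
    # loci outside every polygene use ordinary uniform crossover.
    child = [None] * len(parent1)
    covered = set()
    for loci in polygenes:
        donor = parent1 if rng.random() < 0.5 else parent2
        for locus in loci:
            child[locus] = donor[locus]
        covered.update(loci)
    for locus in range(len(parent1)):
        if locus not in covered:
            child[locus] = parent1[locus] if rng.random() < 0.5 else parent2[locus]
    return child


rng = random.Random(42)
polygene = {0: 1, 1: 1, 2: 0}  # hypothetical polygene found in Phase I
population = [[rng.randint(0, 1) for _ in range(6)] for _ in range(10)]
plant(population, polygene, rate=0.5, rng=rng)
children = [
    polygene_crossover([0] * 6, [1] * 6, [set(polygene)], rng) for _ in range(20)
]
```

Because loci 0 to 2 always come from the same parent, every child carries either all three genes from the first parent or all three from the second. A co-regulated block is never split.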