Funding: Supported by the National Natural Science Foundation of China (No. 52178065).
Abstract: The water vapor permeability of building materials is a crucial parameter for analyzing and optimizing the hygrothermal performance of building envelopes and built environments. Its measurement is accurate but time-consuming, whereas data mining methods have the potential to predict water vapor permeability efficiently. In this study, six data mining methods were compared for predicting the water vapor permeability of cement-based materials: support vector regression (SVR), decision tree regression (DT), random forest regression (RF), K-nearest neighbor (KNN), multi-layer perceptron (MLP), and adaptive boosting regression (AdaBoost). A total of 143 datasets of material properties were collected to build the prediction models, and the water vapor permeability of five materials was determined experimentally for model validation. The results show that RF has excellent generalization, stability, and precision. AdaBoost has great generalization and precision, only slightly inferior to RF, and excellent stability. DT has good precision and acceptable generalization, but poor stability. SVR and KNN have superior stability, but inadequate generalization and precision. MLP lacks generalization, and its stability and precision are unacceptable. In short, RF has the best overall performance, with a prediction deviation of only 26.3% from the experimental results, better than AdaBoost (38.0%) and DT (38.3%) and far better than the remaining methods. Data mining methods were also found to give better predictions when the water vapor permeability of cement-based materials is high.
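The abstract does not say how the 26.3% prediction deviation is defined. A minimal sketch, assuming it is the mean absolute relative deviation between predicted and measured permeability over the validation materials; the function name and all numbers below are hypothetical illustrations, not data from the study.

```python
def mean_relative_deviation(predicted, measured):
    """Mean absolute relative deviation (%) of predictions from measurements.

    Assumed definition (not stated in the abstract):
    100 / n * sum(|predicted_i - measured_i| / |measured_i|)
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical permeabilities (arbitrary units) for five validation materials.
measured = [2.0, 4.0, 5.0, 8.0, 10.0]
predicted = [2.5, 3.0, 5.5, 8.0, 12.0]
print(f"{mean_relative_deviation(predicted, measured):.1f}%")  # 16.0%
```

Relative rather than absolute deviation keeps materials with very different permeabilities on a comparable scale, which matters when comparing models across a heterogeneous validation set.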
Funding: Supported by the Scientific Research Special Project of the TCM Profession (200907001E) and the Science and Technology Major Special Project for "Significant New Drugs Formulation" (2009ZX09301-005-02).
Abstract: Objective: To analyze the compositional rules of anti-influenza Chinese patent medicines and to develop new anti-influenza prescriptions using unsupervised data mining methods. Methods: Anti-influenza Chinese patent medicine recipes were collected and recorded in a database. The correlation coefficients between herbs, the core herb combinations, and new prescriptions were then derived using modified mutual information, complex system entropy clustering, and unsupervised hierarchical clustering, respectively. Results: Based on an analysis of 126 Chinese patent medicine recipes, the frequency of occurrence of each herb, 54 frequently used herb pairs, and 34 core combinations were determined, and 4 new prescriptions for influenza were developed. Conclusion: Unsupervised data mining methods can quickly reveal compositional rules and help develop new prescriptions.
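The modified mutual information used in the study is not specified in the abstract. As a sketch only, the standard (unmodified) mutual information between the presence of two herbs across recipes can be estimated as below; the helper names `herb_mutual_information` and `rank_herb_pairs` and the toy recipes are hypothetical.

```python
import math
from itertools import combinations


def herb_mutual_information(recipes, herb_a, herb_b):
    """Standard mutual information (bits) between the presence of two herbs.

    Presence/absence of each herb is treated as a binary variable whose
    probabilities are estimated from frequencies across the recipes.
    """
    n = len(recipes)
    mi = 0.0
    for a_in in (True, False):
        for b_in in (True, False):
            joint = sum(
                1 for r in recipes if (herb_a in r) == a_in and (herb_b in r) == b_in
            ) / n
            if joint == 0:
                continue  # 0 * log(0) is taken as 0
            p_a = sum(1 for r in recipes if (herb_a in r) == a_in) / n
            p_b = sum(1 for r in recipes if (herb_b in r) == b_in) / n
            mi += joint * math.log2(joint / (p_a * p_b))
    return mi


def rank_herb_pairs(recipes):
    """All herb pairs with their mutual information, highest first."""
    herbs = sorted(set().union(*recipes))
    scored = [((a, b), herb_mutual_information(recipes, a, b))
              for a, b in combinations(herbs, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy recipes; "A" to "D" are placeholder herb names.
recipes = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"D"}]
print(herb_mutual_information(recipes, "A", "B"))  # 1.0
print(herb_mutual_information(recipes, "C", "D"))  # 0.0
```

Plain mutual information is symmetric in presence and absence: in the toy data, herbs A and D, which never appear together, score the same 1.0 bit as A and B, which always do. On its own it therefore measures the strength of association rather than co-prescription.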
Funding: Supported by the Major State Basic Research Development Program of China (973 Program, No. 2007CB512601). © The Chinese Journal of Integrated Traditional and Western Medicine Press and Springer-Verlag Berlin Heidelberg 2012.
Abstract: Objective: To identify the distribution patterns and compatibility laws of the constituent herbs in prescriptions for respiratory disease, so as to help doctors choose appropriate herbs and prescriptions. Methods: Classical prescriptions for treating respiratory disease were selected from authoritative prescription books. Data mining methods (frequent itemsets and association rules) were used to analyze the usage patterns and compatibility laws of the constituent herbs in the selected prescriptions. Results: A total of 562 prescriptions were studied. Radix glycyrrhizae was the most frequently used herb, appearing in 47.2% of the prescriptions; other frequently used herbs were Semen armeniacae amarum, Fructus schisandrae chinensis, Herba ephedrae, and Radix ginseng. Herba ephedrae was frequently paired with Semen armeniacae amarum, with a confidence of 73.3%, and many herbs were accompanied by Radix glycyrrhizae with high confidence. Moreover, apart from Radix glycyrrhizae and Semen armeniacae amarum, Fructus schisandrae chinensis, Herba ephedrae, and Rhizoma pinelliae were the herbs most commonly used to treat cough, dyspnoea, and sputum, respectively. Prescriptions treating dyspnoea often used the two-herb group Herba ephedrae & Radix glycyrrhizae, while prescriptions treating sputum often used the two-herb groups Rhizoma pinelliae & Radix glycyrrhizae and Rhizoma pinelliae & Semen armeniacae amarum, and the three-herb groups Rhizoma pinelliae & Semen armeniacae amarum & Radix glycyrrhizae and Pericarpium citri reticulatae & Rhizoma pinelliae & Radix glycyrrhizae. Conclusions: Prescriptions treating respiratory disease show common compatibility laws in their use of herbs, as well as specific compatibility laws for different respiratory symptoms. The general patterns and specific compatibility laws reported here could help doctors choose appropriate herbs and prescriptions for treating respiratory disease.
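Association rules of the kind reported (for example, the 73.3% confidence for Herba ephedrae and Semen armeniacae amarum) rest on two quantities: support, the fraction of prescriptions containing both herbs, and confidence, the fraction of prescriptions containing the antecedent herb that also contain the consequent. A minimal sketch for single-herb rules follows; the function `pair_rules` and the toy prescriptions are hypothetical, not the study's 562 prescriptions or its software.

```python
from collections import Counter
from itertools import combinations


def pair_rules(prescriptions, min_support=0.0):
    """Support and confidence for every single-herb rule X -> Y.

    support(X, Y)      = (# prescriptions containing X and Y) / (# prescriptions)
    confidence(X -> Y) = (# prescriptions containing X and Y) / (# containing X)
    Returns a list of (X, Y, support, confidence) tuples.
    """
    n = len(prescriptions)
    herb_counts = Counter(h for p in prescriptions for h in set(p))
    pair_counts = Counter(
        pair for p in prescriptions for pair in combinations(sorted(set(p)), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue  # prune infrequent pairs, as in frequent-itemset mining
        # One unordered pair gives two directed rules with different confidences.
        rules.append((a, b, support, both / herb_counts[a]))
        rules.append((b, a, support, both / herb_counts[b]))
    return rules


# Hypothetical toy prescriptions, not the data analysed in the study.
prescriptions = [
    {"Herba ephedrae", "Semen armeniacae amarum", "Radix glycyrrhizae"},
    {"Herba ephedrae", "Semen armeniacae amarum"},
    {"Herba ephedrae", "Radix glycyrrhizae"},
    {"Herba ephedrae", "Semen armeniacae amarum", "Radix glycyrrhizae"},
    {"Rhizoma pinelliae", "Radix glycyrrhizae"},
]
for x, y, s, c in pair_rules(prescriptions, min_support=0.5):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

The two directions of a pair share the same support but can differ in confidence: in the toy data, Herba ephedrae -> Semen armeniacae amarum has confidence 0.75, while the reverse rule has 1.0. Extending this to three-herb groups requires a full frequent-itemset algorithm such as Apriori.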
Abstract: Data mining enables us to build forecasts and models of the future from past data. Any method that helps to discover knowledge in data can be used as a data mining method. Enterprises gain an important competitive advantage through data mining methods, which are used in many different fields. In finance, data mining is used especially in portfolio management, fraud detection, payment prediction, loan risk analysis, mortgage scoring, detecting transaction manipulation, financial risk management, customer profiling, and the foreign exchange market. Gaining knowledge can be costly, risky, and time-consuming for enterprises; thus, enterprises today use data mining as an innovative competitive tool. The aim of this study is to determine the importance of data mining in financial applications.
Abstract: Background: Hepatitis C virus (HCV) has a high prevalence worldwide, and the progression of the disease can cause severe, irreversible liver damage or even death. Developing prediction models using machine learning techniques is therefore beneficial. This study was conducted to classify suspected patients with HCV infection using different classification models. Methods: The study used a dataset derived from the University of California, Irvine (UCI) Machine Learning Repository. Since the HCV dataset was imbalanced, the synthetic minority oversampling technique (SMOTE) was applied to balance it. After cleaning, the dataset was divided into training and test data for developing six classification models: support vector machine (SVM), Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), and K-nearest neighbors (KNN). The classifiers were developed in the Python programming language. Receiver operating characteristic (ROC) curve analysis and other metrics were used to evaluate the performance of the models. Results: Across the evaluation metrics, the RF classifier performed best among the six methods, with an accuracy of 97.29%. The area under the curve (AUC) values for the LR, KNN, DT, SVM, Gaussian NB, and RF models were 0.921, 0.963, 0.953, 0.972, 0.896, and 0.998, respectively, with RF showing the best predictive performance. Conclusion: Various machine learning techniques were used in this study to classify healthy and unhealthy patients. Additionally, the developed models might identify the stage of HCV based on the training data.
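SMOTE balances a dataset by synthesizing new minority-class samples rather than duplicating existing ones. The sketch below shows the core interpolation step in plain Python under simplifying assumptions (numeric features only, Euclidean distance, seed samples chosen uniformly at random); it is not the implementation used in the study, which the abstract does not name, and the function `smote` and the toy data are hypothetical. In practice a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would typically be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic sample is placed at a random point on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([xf + gap * (nf - xf) for xf, nf in zip(x, neighbour)])
    return synthetic


# Hypothetical two-feature minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
print(len(new_points))  # 6
```

Because every synthetic point lies between two real minority samples, oversampling is usually applied to the training split only, so that points derived from test samples do not leak into evaluation.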