Data mining is a powerful technique for discovering customers' behaviors and preferences, and leading companies now use it widely to evaluate their Customer Relationship Management (CRM) systems. This study proposes a new K-means clustering method to evaluate the profitability of customer clusters in the telecommunication industry in Sri Lanka. The RFM model supplies the input variables for K-means clustering, and a distortion curve is used to identify the optimal number of initial clusters. Based on the results, the profitability of telecommunication customers in Sri Lanka falls mainly into three levels.
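To make the RFM input concrete, the following minimal sketch derives recency, frequency and monetary values per customer from a transaction log. The transaction rows, the snapshot date and the name `rfm_table` are hypothetical illustrations, not data or code from the study.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, transaction_date, amount).
transactions = [
    ("A", date(2023, 1, 5), 40.0),
    ("A", date(2023, 3, 20), 60.0),
    ("B", date(2023, 2, 1), 15.0),
    ("C", date(2023, 3, 28), 200.0),
    ("C", date(2023, 3, 30), 150.0),
    ("C", date(2023, 3, 31), 100.0),
]


def rfm_table(rows, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in rows:
        last, count, total = table.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date seen for this customer.
        last = day if last is None or day > last else last
        table[customer] = (last, count + 1, total + amount)
    # Recency is measured in days between the last purchase and the snapshot.
    return {
        customer: ((snapshot - last).days, count, total)
        for customer, (last, count, total) in table.items()
    }


# Assumed snapshot date for the illustration.
rfm = rfm_table(transactions, snapshot=date(2023, 4, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Each resulting triple could then serve as one input point for K-means; in practice the three columns would usually be rescaled first, since monetary values dominate raw distances.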
With the advent of the big data era and the development of smart campuses, the campus is gradually moving towards digitalization, networking and informatization. The campus card is an important part of smart campus construction, and the massive data it generates can indirectly reflect students' living conditions at school. How to obtain the information users need from these massive campus card data sets quickly and accurately has become an urgent problem. This paper proposes a data mining algorithm based on K-Means clustering and time series. It analyzes the consumption data of college students' cards to mine their daily consumption habits in depth and to make accurate judgments about specific consumption behaviors. The proposed algorithm provides a practical reference for the construction of smart campuses in universities and has both theoretical and application value.
Objective: To explore the medication rules of Traditional Chinese Medicine (TCM) in the treatment of sleep disorder after stroke by using data mining technology. Methods: Electronic databases were searched for clinical literature on the TCM treatment of sleep disorders after stroke published from January 2000 to January 2021. Excel was used to establish the database, and the prescription information was described and analyzed statistically. The Apriori algorithm in IBM SPSS Modeler 18.0 was used for TCM association analysis, and IBM SPSS 22.0 was used for systematic cluster analysis of high-frequency medicines. Results: A total of 67 articles were included, covering 131 traditional Chinese medicines. The most frequently used medicines include Ziziphi Spinosae Semen (Suanzaoren), Angelicae Sinensis Radix (Danggui), Ligusticum (Chuanxiong), liquorice (Gancao), Poria cocos (Fuling), and so on. In terms of effect, deficiency-tonifying, sedative, and blood-activating and stasis-removing medicines are commonly used. The medicinal properties are mainly cold, mild and warm. The main flavors are sweet and bitter. The medicines mostly belong to the liver, heart and spleen meridians. Thirty-three association rules were obtained for medicine pairs and medicine groups, and the core combinations were "Ziziphi Spinosae Semen (Suanzaoren)-Tuber fleeceflower stem (Yejiaoteng)", "Ziziphi Spinosae Semen (Suanzaoren)-Polygala (Yuanzhi)", "Ziziphi Spinosae Semen (Suanzaoren)-Cortex albiziae (Hehuanpi)" and "Angelicae Sinensis Radix (Danggui)-Radix bupleuri (Chaihu)-Radix Paeoniae Alba (Baishao)", among others. Seven medicine aggregation groups were obtained by cluster analysis. Conclusion: In the TCM treatment of sleep disorder after stroke, the main method is to calm the heart and mind. Depending on the syndrome type, treatment methods are selected from tonifying the heart and spleen, nourishing the liver and kidney, soothing and softening the liver, clearing heat and resolving phlegm, and nourishing the blood and promoting blood circulation, which provides a reference for clinical treatment.
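Association rules of the kind reported above are ranked by support and confidence, the two measures the Apriori algorithm prunes on. The sketch below computes both for a toy set of prescriptions; the five prescriptions are invented for illustration and are not taken from the 67 included articles, and the code shows only the measures, not the full Apriori candidate search.

```python
# Hypothetical prescriptions (sets of herbs); NOT data from the study.
prescriptions = [
    {"Suanzaoren", "Yejiaoteng", "Yuanzhi"},
    {"Suanzaoren", "Yejiaoteng"},
    {"Suanzaoren", "Danggui"},
    {"Danggui", "Chaihu", "Baishao"},
    {"Suanzaoren", "Yuanzhi", "Danggui"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


pair = {"Suanzaoren", "Yejiaoteng"}
pair_support = support(pair, prescriptions)
forward = confidence({"Yejiaoteng"}, {"Suanzaoren"}, prescriptions)
backward = confidence({"Suanzaoren"}, {"Yejiaoteng"}, prescriptions)
print(pair_support, forward, backward)
```

The two confidence values differ because a rule is directional: in this toy data every prescription containing Yejiaoteng also contains Suanzaoren, but not the reverse.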
Data mining and analytics involve inspecting and modeling large pre-existing datasets to discover decision-making information. Precision agriculture uses data mining to advance agricultural development. Many farmers are not getting the most out of their land because they do not use precision agriculture; they harvest crops without a well-planned recommendation system. Future crop production is calculated by combining environmental conditions and management behavior, yielding numerical and categorical data. Most existing research still needs to address data preprocessing and crop categorization/classification. Furthermore, statistical analysis receives less attention, despite producing more accurate and valid results. The study was conducted on a dataset for Karnataka state, India, taking eight crop parameters into account, namely the minimum amounts of fertilizer required, such as nitrogen, phosphorus and potassium, and pH values. The research also considers rainfall, season, soil type and temperature to provide precise cultivation recommendations for high productivity. The presented algorithm first converts discrete numerals to factors and then reduces levels. Second, the algorithm generates six datasets, two from Case-1 (dataset with many numeric variables), two from Case-2 (dataset with many categorical variables), and one from Case-3 (dataset with reduced factor variables). Finally, the algorithm outputs a class membership allocation based on an extended version of the K-means partitioning method with lambda estimation. The presented work produces mixed-type datasets with precisely categorized crops by organizing data based on environmental conditions, soil nutrients and geo-location. The prepared dataset is then used to solve the classification problem, leading to a model evaluation that selects the best dataset for precise crop prediction.
The volume of data today is enormous because of the extensive use of the World Wide Web, social media and intelligent systems. This data can be very important and useful if it is harnessed carefully and correctly. Useful information can be extracted from it using the data mining process and used to make vital decisions in various industries. Clustering is a very popular data mining method which divides data points into groups such that similar data points fall into the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM) and Clustering Large Applications (CLARA) on securely perturbed data. We identify which method performs better for analyzing data perturbed using Extended NMF, on the basis of the values of indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index and Davies-Bouldin Index.
Background: The purpose of this study was to identify the characteristics and principles of acupoints applied for treating chronic hepatitis B infection. Methods: Published clinical studies on acupuncture for the treatment of chronic hepatitis B infection were gathered from various databases, including SinoMed, Chongqing Vip, China National Knowledge Infrastructure, Wanfang, the Cochrane Library, PubMed, Web of Science and Embase. Excel 2019 was used to establish a database of acupuncture prescriptions and to compile statistics on frequency, meridian application, distribution and specific points. SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to conduct association rule analysis and cluster analysis to investigate the characteristics and patterns of acupoint selection. Results: A total of 42 studies containing 47 acupoints were included, with a total frequency of 286 acupoint uses. The top five acupoints were Zusanli (ST36), Ganshu (BL18), Yanglingquan (GB34), Sanyinjiao (SP6) and Taichong (LR3), and the most commonly used meridian was the Bladder Meridian of Foot-Taiyang. The majority of acupoints are located in the lower limbs, back and lumbar regions, and a significant percentage of them are Five-Shu acupoints. The strongest acupoint combination identified was Zusanli (ST36)–Ganshu (BL18); in addition, 13 association rules and 4 valid clusters were obtained. Conclusion: Zusanli (ST36)–Ganshu (BL18) could be considered a relatively reasonable prescription for treating chronic hepatitis B infection in clinical practice. However, further high-quality studies are needed.
The academic community currently faces challenges in analyzing and evaluating the progress of students' academic performance. In practice, classifying student performance is a scientifically challenging task. Some recent studies apply cluster analysis to students' results and use statistical techniques to partition their scores by performance. This approach, however, is not efficient. In this study, we combine two techniques, namely k-means clustering and the elbow method, to evaluate students' performance. With this combination, the analysis and evaluation of students' progress becomes more accurate. The methodology is implemented on student test scores to define the resulting performance model.
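The combination of k-means with the elbow method can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the test scores are invented, the deterministic initialization and the `min_gain` threshold rule for locating the elbow are choices made for this sketch, and none of it is the paper's implementation.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on 1-D data; returns (centroids, distortion)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        updated = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Distortion: sum of squared distances to the nearest centroid.
    distortion = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, distortion


def elbow(curve, min_gain=0.5):
    """Smallest k whose successor cuts distortion by less than min_gain."""
    for k in range(1, len(curve)):
        previous, current = curve[k - 1], curve[k]
        if previous == 0 or (previous - current) / previous < min_gain:
            return k
    return len(curve)


# Hypothetical test scores with three visible performance groups.
scores = [35, 38, 40, 42, 60, 62, 65, 68, 88, 90, 92, 95]
curve = [kmeans_1d(scores, k)[1] for k in range(1, 6)]  # distortion for k = 1..5
best_k = elbow(curve)
print(curve, best_k)
```

Here the distortion falls steeply up to three clusters and only slowly afterwards, so the threshold rule selects three groups. A fixed threshold is only one way to read the curve; inspecting the plot by eye is the more common practice.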
Objective: To use data mining techniques to explore the rules of Chinese medicine use for airway remodeling. Methods: The literature of the past 20 years on Chinese medicine use for airway remodeling was searched. With the help of WPS Office Excel 11.1, IBM SPSS Statistics 23.0 and SPSS Modeler 18.0, prescriptions were analyzed for the frequency of drug use, the four natures, the five flavors and the channel tropism, followed by cluster analysis and association analysis of high-frequency drugs. Results: A total of 58 Chinese medicine prescriptions for airway remodeling were found, involving 105 Chinese medicines. The most frequent channel tropisms were spleen, stomach, lung, large intestine, liver and gallbladder; the most frequent of the five flavors were sour, sweet and pungent; and the most frequent of the four natures were cold and hot. Cluster analysis yielded eight drug aggregation groups, and association rule analysis yielded five groups of high-frequency drug pairs. Conclusion: The main TCM treatments for airway remodeling are expelling phlegm, relieving cough, calming asthma, expelling blood stasis and tonifying deficiency. The results of this study can provide ideas for compounding and drug selection in subsequent studies.
A complex repairable system is composed of thousands of components. Some maintenance management and decision problems require classifying a set of components into several classes based on data mining. Furthermore, as the complexity of industrial equipment increases, it becomes very important for managers to pay more attention to key components and to carry out lean management. Therefore, the idea of "customer segmentation" from "precise marketing" can be used in the maintenance management of multi-component systems. Following the idea of segmentation, the components of multi-component systems should be subdivided into groups based on specific attributes relevant to maintenance, such as maintenance cost, mean time between failures and failure frequency. For the targeted groups of parts, the optimal maintenance policy, health assessment and maintenance scheduling can then be determined. The proposed analysis framework is presented, and a numerical example illustrates the effectiveness of the method.
Clustering techniques classify raw data in a reasonable manner to create disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository) along with the k-means and threshold-based clustering algorithms. The proposed method results in more segregated datasets with compact clusters, thus achieving higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm, as confirmed by the validity measures and indices computed for all three algorithms.
In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
The term "customer churn" is used in the information and communication technology (ICT) industry to indicate customers who are about to leave for a competitor or end their subscription. Predicting this behavior is very important in real-life markets and competition, and managing it is essential. In this paper, three hybrid models are investigated to develop an accurate and efficient churn prediction model. The three models are based on two phases: a clustering phase and a prediction phase. In the first phase, customer data is filtered. The second phase predicts customer behavior. The first model uses the k-means algorithm for data filtering and a Multilayer Perceptron Artificial Neural Network (MLP-ANN) for prediction. The second model uses hierarchical clustering with MLP-ANN. The third uses self-organizing maps (SOM) with MLP-ANN. The three models are developed on real data, and their accuracy and churn rate values are calculated and compared. The comparison with other models shows that the three hybrid models outperform common single models.
Most clustering algorithms describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to apply different distance functions separately, compare the clustering results and pick out the "swing points"; such points may reveal more information to data analysts.
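The idea of "swing points" can be illustrated with a small sketch that assigns each point to its nearest centroid under several distance functions and reports the points whose assignment changes. The abstract does not name its three distance functions, so Euclidean, Manhattan and Chebyshev are assumed here; the points, centroids and function names are likewise hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


# Assumed set of three metrics; the abstract does not say which it used.
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}


def nearest(point, centroids, metric):
    """Index of the centroid closest to point under the given metric."""
    return min(range(len(centroids)), key=lambda i: metric(point, centroids[i]))


def swing_points(points, centroids):
    """Points whose nearest centroid depends on the choice of metric."""
    return [
        p
        for p in points
        if len({nearest(p, centroids, m) for m in METRICS.values()}) > 1
    ]


# Hypothetical data: two centroids and three points.
centroids = [(3, 3), (0, 5)]
points = [(0, 0), (4, 4), (0, 6)]
swings = swing_points(points, centroids)
print(swings)
```

The origin is closer to the first centroid under the Euclidean and Chebyshev distances but closer to the second under the Manhattan distance, so it is the one point flagged for closer inspection.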
Trauma is the most common cause of death among young people, and many of these deaths are preventable [1]. Predicting the outcome of trauma patients has long been a difficult problem to investigate. In this study, prediction models are built and their ability to predict mortality accurately is assessed. The analysis includes a comparison of data mining techniques using classification, clustering and association algorithms. Data were collected by the Hellenic Trauma and Emergency Surgery Society from 30 Greek hospitals. The dataset contains records of 8544 patients suffering from severe injuries, collected from 2005 to 2006. Factors include patients' demographic elements and several other variables registered from the time and place of the accident until hospital treatment and final outcome. The results obtained are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value, and ROC curves depict the performance of the methods.
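The four comparison measures named above all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not results from the trauma dataset.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
        "ppv": tp / (tp + fp),          # precision of positive predictions
        "npv": tn / (tn + fn),          # precision of negative predictions
    }


# Hypothetical counts (e.g. "positive" = predicted death); not from the study.
metrics = diagnostic_metrics(tp=80, fp=30, tn=870, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, specificity and negative predictive value tend to look high even for weak models, which is one reason the ROC curve is reported alongside them.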
Objective: Following the RFM model theory of customer relationship management, data mining technology was used to group patients with chronic infectious diseases and to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: A total of 170,246 outpatient records were extracted from the hospital management information system (HIS) for January 2016 to July 2016, and 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to classify patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict their treatment situation. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three categories: 1) Cluster 1: Important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2: Major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3: Potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). In the C5.0 decision tree used to predict the treatment situation, the final treatment time (weeks) was an important predictor, and the accuracy rate, verified with the confusion matrix, was 99.94%. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems, establish a chronic infectious disease database and strengthen the blocking of mother-to-child transmission, in order to curb chronic infectious diseases effectively and reduce disease burden and mortality.
Clustering is the task of assigning a set of instances to groups in such a way that the dissimilarity of instances within each group is minimized. Clustering is widely used in areas such as data mining, pattern recognition, machine learning, image processing and computer vision. K-means is a popular clustering algorithm which partitions instances into a fixed number of clusters in an iterative fashion. Although k-means is considered a poor clustering algorithm in terms of result quality, its simplicity, speed in practical applications and iterative nature have led to its selection as one of the top 10 algorithms in data mining [1]. Parallelization of k-means has also been studied over the last two decades. Most of this work concentrates on shared-nothing architectures. With recent advances in GPU technology, implementations of the k-means algorithm on shared memory architectures have started to attract attention. However, to the best of our knowledge, no in-depth analysis of the performance of k-means on shared memory multiprocessors has been done in the literature. In this work, our aim is to fill this gap by providing a theoretical analysis of the performance of the k-means algorithm and presenting extensive tests on a shared memory architecture.
Data analysis and automatic processing is often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented as IF-THEN rules. With the help of these rules, tasks such as prediction, classification and pattern recognition are solved. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods) we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set, and the effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Traditional libraries cannot provide personalized recommendation services for users. This paper uses Clementine to solve this problem. First, a K-means clustering model analyzes the initial data to delete redundant data, which avoids scanning the database repeatedly and producing a large number of false rules. Second, the clustering results are used to perform association rule mining, which yields valuable information and enables an intelligent recommendation service.
Funding: Science and Technology Project of Guizhou Province of China (Grants QKHJC[2019]1403 and QKHJC[2019]1041); Guizhou Province Colleges and Universities Top Technology Talent Support Program (Grant QJHKY[2016]068).
Funding: Beijing Science and Technology Program (No. Z191100006619065); National Key R&D Program (No. 2017YFC1700101).
Funding: This research work was funded by the Institutional Fund Projects under Grant No. IFPIP: 959-611-1443. The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
Abstract: The volume of data today is enormous because of the extensive use of the World Wide Web, social media and intelligent systems. This data can be very important and useful if it is harnessed carefully and correctly. Useful information can be extracted from this massive data using the data mining process, and the extracted information can be used to make vital decisions in various industries. Clustering is a very popular data mining method that divides data points into groups such that similar data points fall in the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM) and Clustering Large Applications (CLARA) on secure perturbed data. The comparison identifies which method performs better for analyzing data perturbed using Extended NMF, on the basis of the values of various indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index and Davies-Bouldin Index.
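As an illustration of how one of the named validity indexes works, the following is a minimal sketch of the Dunn Index in its basic form: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The toy partitions are hypothetical, and the sketch assumes at least two clusters and at least one cluster with two distinct points.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    diameter = max(
        (dist(p, q) for pts in clusters for p, q in combinations(pts, 2)),
        default=0.0,
    )
    separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return separation / diameter

# Two hypothetical partitions of the same four points
compact = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
stretched = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]
print(dunn_index(compact), dunn_index(stretched))
```

Higher values indicate compact, well-separated clusters, so the first partition scores better than the second.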
Funding: Supported by the Chongqing Municipal Health and Family Planning Commission and Chongqing Municipal Science and Technology Commission Jointly Funded Key Research Projects in Traditional Chinese Medicine (ZY201801007).
Abstract: Background: The purpose of this study was to identify the characteristics and principles of acupoints applied for treating chronic hepatitis B infection. Methods: Published clinical studies on acupuncture for the treatment of chronic hepatitis B infection were gathered from various databases, including SinoMed, Chongqing Vip, China National Knowledge Infrastructure, Wanfang, the Cochrane Library, PubMed, Web of Science and Embase. Excel 2019 was used to establish a database of acupuncture prescriptions and to compile statistics on frequency, meridian application, distribution and specific points. SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to conduct association rule analysis and cluster analysis to investigate the characteristics and patterns of acupoint selection. Results: A total of 42 studies containing 47 acupoints were included, with a total frequency of 286 acupoints. The top five acupoints used were Zusanli (ST36), Ganshu (BL18), Yanglingquan (GB34), Sanyinjiao (SP6) and Taichong (LR3), and the most commonly used meridian was the Bladder Meridian of Foot-Taiyang. The majority of acupuncture points are located in the lower limbs, back, and lumbar regions, with a significant percentage of them being Five-Shu acupoints. The strongest acupoint combination identified was Zusanli (ST36)–Ganshu (BL18), in addition to which 13 association rules and 4 valid clusters were obtained. Conclusion: Zusanli (ST36)–Ganshu (BL18) could be considered a relatively reasonable prescription for treating chronic hepatitis B infection in clinical practice. However, further high-quality studies are needed.
Abstract: The academic community currently faces challenges in analyzing and evaluating the progress of a student's academic performance. In the real world, classifying student performance is a scientifically challenging task. Recently, some studies have applied cluster analysis to evaluate students' results and have used statistical techniques to partition their scores according to performance. This approach, however, is not efficient. In this study, we combine two techniques, namely k-means clustering and the elbow method, to evaluate student performance. With this combination, the results are more accurate in analyzing and evaluating the progress of student performance. The methodology has been implemented on student test scores to define the model.
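The combination of k-means with the elbow method can be sketched as follows: run k-means for several values of k, record the within-cluster sum of squares (WCSS) for each, and look for the k after which the curve flattens. This is a minimal one-dimensional sketch, not the study's implementation. The scores, the deterministic initialisation, and the function name are assumptions made for illustration.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, wcss)."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation: spread centroids evenly over the sorted data
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            groups[nearest].append(x)
        # An empty group keeps its previous centroid
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, wcss

# Hypothetical test scores forming three visible groups
scores = [35, 38, 40, 42, 60, 62, 65, 67, 88, 90, 92, 95]
wcss_by_k = {k: kmeans_1d(scores, k)[1] for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(k, round(w, 2))
```

On this toy data the WCSS falls steeply up to k = 3 and only slightly afterwards, which is the bend the elbow method looks for.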
Abstract: Objective: To use data mining techniques to explore the patterns of Chinese medicine use for airway remodeling. Methods: The literature on Chinese medicine use for airway remodeling over the past 20 years was searched. Using WPS Office Excel 11.1, IBM SPSS Statistics 23.0 and SPSS Modeler 18.0, prescriptions were analyzed for frequency of drug use, the four natures, the five flavors and channel tropism, followed by cluster analysis and association analysis of high-frequency drugs. Results: 58 Chinese medicine prescriptions for airway remodeling were found, involving 105 Chinese medicines. The most frequent channel tropisms were spleen, stomach, lung, large intestine, liver and gallbladder; the most frequently used of the five flavors were sour, sweet and pungent; and the most frequent of the four natures were cold and hot. Cluster analysis yielded eight drug aggregation groups, and association rule analysis yielded five groups of high-frequency drug pairs. Conclusion: The main TCM treatments for airway remodeling are expelling phlegm, relieving cough, calming asthma, expelling blood stasis and tonifying deficiency. The results of this study can provide ideas for compounding and drug selection in subsequent studies.
Funding: National Natural Science Foundation of China (No. 71501103); Natural Science Foundation of Inner Mongolia, China (No. 2015BS0705); the Program of Higher-Level Talents of Inner Mongolia University, China (No. 20700-5145131).
Abstract: A complex repairable system is composed of thousands of components. Some maintenance management and decision problems require classifying a set of components into several classes based on data mining. Furthermore, as the complexity of industrial equipment increases, it is very important that managers pay more attention to the key components and carry out lean management. Therefore, the idea of "customer segmentation" from "precise marketing" can be used in the maintenance management of multi-component systems. Following the idea of segmentation, the components of multi-component systems should be subdivided into groups based on specific attributes relevant to maintenance, such as maintenance cost, mean time between failures, and failure frequency. For the targeted groups of parts, the optimal maintenance policy, health assessment and maintenance scheduling can be determined. The proposed analysis framework is presented, and a numerical example is given to illustrate the effectiveness of the method.
Abstract: Raw data are classified using clustering techniques in a reasonable manner to create disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository) along with k-means and threshold-based clustering algorithms. The proposed method results in more segregated datasets with compacted clusters, thus achieving higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm and validates measures and respective indices along with k-means and threshold-based clustering algorithms.
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 50539010) and the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 200801019).
Abstract: In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
Abstract: The term "customer churn" is used in the information and communication technology (ICT) industry to indicate customers who are about to leave for a competitor or end their subscription. Predicting this behavior is very important in a competitive real-life market, and managing it is essential. In this paper, three hybrid models are investigated to develop an accurate and efficient churn prediction model. The three models are based on two phases: the clustering phase and the prediction phase. In the first phase, customer data is filtered. The second phase predicts customer behavior. The first model uses the k-means algorithm for data filtering and Multilayer Perceptron Artificial Neural Networks (MLP-ANN) for prediction. The second model uses hierarchical clustering with MLP-ANN. The third uses self-organizing maps (SOM) with MLP-ANN. The three models are developed on real data, and the accuracy and churn rate values are then calculated and compared. The comparison with other models shows that the three hybrid models outperform single common models.
Abstract: Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results and pick out the "swing points". Such points may reveal more information to data analysts.
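The abstract does not name its three distance functions. As an illustration only, the sketch below assumes three common choices (Euclidean, Manhattan and Chebyshev) and shows how a single point can be assigned to different centres depending on the metric, which is the kind of "swing point" the comparison is meant to surface. The centres and the point are hypothetical.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def nearest(point, centres, distance):
    """Index of the centre closest to point under the given distance."""
    return min(range(len(centres)), key=lambda i: distance(point, centres[i]))

centres = [(0.0, 0.0), (5.0, 2.0)]  # hypothetical cluster centres
point = (3.0, 0.0)                  # hypothetical point between them
labels = {f.__name__: nearest(point, centres, f)
          for f in (euclidean, manhattan, chebyshev)}
print(labels)
```

Here the Manhattan distance assigns the point to the first centre while the other two metrics assign it to the second, so the point changes cluster when the metric changes.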
Abstract: Trauma is the most common cause of death among young people, and many of these deaths are preventable [1]. Predicting the outcome of trauma patients has been a difficult problem to investigate until now. In this study, prediction models are built and their ability to accurately predict mortality is assessed. The analysis includes a comparison of data mining techniques using classification, clustering and association algorithms. Data were collected by the Hellenic Trauma and Emergency Surgery Society from 30 Greek hospitals. The dataset contains records of 8544 patients suffering from severe injuries, collected from 2005 to 2006. Factors include patients' demographic elements and several other variables registered from the time and place of the accident until hospital treatment and final outcome. The results obtained are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value, and ROC curves depict the performance of these methods.
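The four comparison measures named above follow directly from the counts of a binary confusion matrix. A minimal sketch, using hypothetical counts that are not taken from the study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
        "ppv": tp / (tp + fp),          # share of positive predictions that are right
        "npv": tn / (tn + fn),          # share of negative predictions that are right
    }

# Hypothetical counts for a mortality classifier
metrics = diagnostic_metrics(tp=80, fp=30, tn=870, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Sensitivity and specificity describe the classifier itself, whereas the two predictive values also depend on how common the outcome is in the dataset.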
Abstract: Objective: Based on the RFM model theory of customer relationship management, data mining technology was used to group chronic infectious disease patients and to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: 170,246 outpatient records were extracted from the hospital management information system (HIS) for January 2016 to July 2016, and 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to classify patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict the situation of these patients. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three categories: 1) Cluster 1, important patients (4786 people, 11.0%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). The C5.0 decision tree algorithm was used to predict the treatment situation of patients with chronic infectious diseases; the final treatment time (weeks) was an important predictor, and the accuracy rate was 99.94% as verified by the confusion matrix. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission, in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
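Before clustering, each patient has to be reduced to an RFM triple: recency (time since the last visit), frequency (number of visits) and monetary value (total spend). The sketch below shows one way to derive these from visit records. The record layout, the snapshot date and the use of days as the recency unit are assumptions, since the abstract does not state the units behind its R values.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, snapshot):
    """Map each customer to (recency in days, frequency, monetary total)."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
            for c in last_seen}

# Hypothetical outpatient visits: (patient id, visit date, charge)
visits = [
    ("p1", date(2016, 7, 20), 1200.0),
    ("p1", date(2016, 6, 1), 800.0),
    ("p2", date(2016, 2, 10), 150.0),
]
rfm = rfm_table(visits, snapshot=date(2016, 7, 31))
print(rfm)
```

The three columns are on very different scales, so they would normally be standardised before being passed to K-Means.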
Abstract: Clustering is the task of assigning a set of instances to groups in such a way that the dissimilarity of instances within each group is minimized. Clustering is widely used in several areas such as data mining, pattern recognition, machine learning, image processing and computer vision. K-means is a popular clustering algorithm that partitions instances into a fixed number of clusters in an iterative fashion. Although k-means is considered a poor clustering algorithm in terms of result quality, its simplicity, speed in practical applications and iterative nature have made it one of the top 10 algorithms in data mining [1]. Parallelization of k-means has also been studied during the last two decades. Most of this work concentrates on shared-nothing architectures. With recent advances in GPU technology, implementation of the k-means algorithm on shared memory architectures has recently started to attract some attention. However, to the best of our knowledge, no in-depth analysis of the performance of k-means on shared memory multiprocessors has been done in the literature. In this work, our aim is to fill this gap by providing a theoretical analysis of the performance of the k-means algorithm and presenting extensive tests on a shared memory architecture.
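The usual way to parallelise one k-means iteration is to split the points into chunks, let each worker assign its chunk and accumulate per-cluster partial sums, and then reduce the partial sums into new centroids. The sketch below illustrates that decomposition only and is not the paper's implementation. The function names and data are hypothetical, the input is assumed non-empty, and because of Python's global interpreter lock the threads here show the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(chunk, centroids):
    """Assign each point in one chunk to its nearest centroid; accumulate sums."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def parallel_kmeans_step(points, centroids, workers=2):
    """One Lloyd iteration: chunks are processed independently, then reduced."""
    size = (len(points) + workers - 1) // workers
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda ch: partial_sums(ch, centroids), chunks))
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        n = sum(counts[j] for _, counts in results)
        if n == 0:
            new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        else:
            new_centroids.append(
                tuple(sum(sums[j][d] for sums, _ in results) / n for d in range(dim)))
    return new_centroids

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = parallel_kmeans_step(points, [(1.0, 1.0), (9.0, 1.0)])
print(centroids)
```

Only the small per-cluster sums and counts are combined after the parallel phase, so the reduction cost depends on the number of clusters rather than the number of points.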
Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with the help of IF-THEN rules. With the help of these rules the following tasks are solved: prediction, classification, pattern recognition and others. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the assistance of the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in the fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. Clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Abstract: Traditional libraries cannot provide personalized recommendation services for users. This paper uses Clementine to solve this problem. First, a K-means clustering model analyzes the initial data to remove redundant data. This avoids scanning the database repeatedly and producing a large number of false rules. Second, the clustering results are used to perform association rule mining, which yields valuable information and enables an intelligent recommendation service.
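Association rule mining scores a candidate rule "A implies B" by its support (the share of transactions containing both A and B) and its confidence (the share of transactions containing A that also contain B). The sketch below computes both for one rule over a hypothetical set of loan records, such as the records that fall in a single cluster. It illustrates the two measures only and does not reproduce the Clementine workflow.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    total = len(transactions)
    with_a = sum(1 for t in transactions if a <= set(t))
    with_both = sum(1 for t in transactions if (a | c) <= set(t))
    support = with_both / total
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence

# Hypothetical borrowing records, one set of subjects per reader
loans = [
    {"python", "statistics"},
    {"python", "statistics", "databases"},
    {"python", "novels"},
    {"novels", "poetry"},
]
support, confidence = rule_stats(loans, {"python"}, {"statistics"})
print(support, round(confidence, 3))
```

Rules whose support or confidence falls below chosen thresholds are discarded, which is how mining within smaller, cleaner clusters can reduce the number of spurious rules.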