Objective speech quality is difficult to measure without the input reference speech. Mapping methods based on data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and the pre-trained reference model for each class is then calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, which is trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
Trauma is the most common cause of death among young people, and many of these deaths are preventable [1]. Predicting the outcome of trauma patients has long been a difficult problem to investigate. In this study, prediction models are built and their ability to predict mortality accurately is assessed. The analysis includes a comparison of data mining techniques using classification, clustering and association algorithms. Data were collected by the Hellenic Trauma and Emergency Surgery Society from 30 Greek hospitals. The dataset contains records of 8544 patients suffering from severe injuries, collected from 2005 to 2006. Factors include the patients' demographic elements and several other variables registered from the time and place of the accident until hospital treatment and final outcome. The results obtained are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value, and ROC curves depict the performance of the methods.
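The four comparison measures named in the abstract above all follow from a binary confusion matrix. The sketch below shows the standard definitions; the counts are made-up illustration values, not results from the study, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives that were detected
        "specificity": tn / (tn + fp),  # share of actual negatives that were cleared
        "ppv": tp / (tp + fp),          # share of positive predictions that were right
        "npv": tn / (tn + fn),          # share of negative predictions that were right
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=80, fp=30, tn=870, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome such as mortality, the negative predictive value can be high even when the positive predictive value is modest, which is one reason all four values are usually reported together.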
With the rapid development of the economy, the scale of the power grid is expanding. The number of power equipment items that make up the grid has become very large, so the state data of power equipment are growing explosively. These multi-source heterogeneous data differ from one another, which leads to data variation during transmission and storage and thus to incomplete data. Research on data integrity has therefore become an urgent task. This paper builds on the randomness and the spatio-temporal differences of the system. According to the characteristics and data sources of the massive data generated by power equipment, a fuzzy mining model of power equipment data is established, and the data are divided into numerical and non-numerical data. The text data of power equipment defects are taken as the mining material, and an array-based Apriori algorithm is then used for deep mining. The strong association rules in the incomplete power equipment data are obtained and analyzed. Judging from the trends of the NRMSE metric and the classification accuracy, most filling (imputation) methods combined with the two frameworks in this method show a relatively stable filling trend and do not fluctuate greatly as the missing rate grows. The experimental results show that the proposed algorithm model can effectively improve the filling effect of existing filling methods on most data sets, and that the filling effect fluctuates greatly with the missing rate; that is, as the missing rate increases, the improvement the model brings to the existing filling methods is higher than 4.3%. Through the incomplete data clustering technology studied in this paper, a more innovative state assessment of smart grid operating reliability is carried out, which has good research value and reference significance.
Encephalitis is an inflammatory disease of the brain. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, so proper diagnosis at an early stage is crucial. In this paper, we therefore propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid to define the cluster configuration, but this single point does not contain adequate information. To extract accurate inner structural data, a multiple-centroid approach is defined and employed in this paper, which defines the cluster configuration by allocating weights to each state in the cluster. The multiple EEG view fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy Density-Based Clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real-state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is one of the most common tasks in data mining. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge that is stored as quantitative values in transactions. To this end, it uses a different support value at each level as well as a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and multiple-level taxonomy, our method finds fuzzy association rules in transaction data sets. The method adopts a top-down, progressively deepening strategy to derive large itemsets and incorporates fuzzy boundaries instead of sharp boundary intervals. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that it can mine association rules at different levels according to the user's preference.
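To make the idea of fuzzy boundaries on quantitative items concrete, the sketch below computes a fuzzy support value for an itemset. It is not the paper's model: it assumes triangular membership functions and the minimum operator for combining items, which is one common convention, and all names (`triangular`, `fuzzy_support`, the items and their breakpoints) are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support(transactions, fuzzy_items):
    """Average, over all transactions, of the weakest membership among the items."""
    total = 0.0
    for transaction in transactions:
        total += min(mu(transaction.get(item, 0.0)) for item, mu in fuzzy_items)
    return total / len(transactions)

# Hypothetical purchase quantities per transaction.
transactions = [
    {"milk": 4, "bread": 2},
    {"milk": 6, "bread": 1},
    {"milk": 1},
]

# Each item gets its own membership function, as the abstract describes.
milk_medium = lambda q: triangular(q, 2, 5, 8)
bread_low = lambda q: triangular(q, 0, 1, 3)

support = fuzzy_support(transactions, [("milk", milk_medium), ("bread", bread_low)])
print(round(support, 4))
```

An itemset would then count as large at a given taxonomy level when this value reaches that level's minimum support threshold.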
This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data, by which the degree of fuzziness of the resultant classification is decreased compared with that of a conventional algorithm; that is, the classification accuracy is increased. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a global one. Empirical results from a fuzzy classification of suburban land cover in Edinburgh confirmed the improved performance of the new algorithm for fuzzy c-means clustering, in particular when fuzziness is also accommodated in the assumed reference data.
Clustering is an unsupervised learning problem: a procedure that partitions data objects into groups. Many algorithms cannot overcome the problems of morphology, overlapping and a large number of clusters at the same time. Many scientific communities have approached clustering from the perspective of density, which is one of the best methods in clustering. This study proposes a density-based spatial clustering of applications with noise (DBSCAN) algorithm based on high-density areas selected by automatic fuzzy-DBSCAN (AFD), which works with the initialization of two parameters. AFD, using fuzzy and DBSCAN features, is modeled by the selection of high-density areas and automatically generates two parameters for merging and separating. The two generated parameters provide a set of sub-cluster rules in the Cartesian coordinate system for the dataset. The model overcomes clustering problems such as morphology, overlapping and the number of clusters in a dataset simultaneously. In the experiments, all algorithms are run 30 times on eight data sets. Three of these are overlapping real datasets and the rest are morphological and synthetic datasets. The AFD algorithm is shown to outperform other recently developed clustering algorithms.
Data volumes are enormous today because of the extensive use of the World Wide Web, social media and intelligent systems. These data can be very important and useful if harnessed carefully and correctly. Useful information can be extracted from such massive data using the data mining process, and the extracted information can be used to make vital decisions in various industries. Clustering is a very popular data mining method that divides data points into groups such that similar data points fall in the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM) and Clustering Large Applications (CLARA) on securely perturbed data. The comparison identifies which method performs better for analyzing data perturbed using Extended NMF, on the basis of the values of indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index and Davies-Bouldin Index.
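Of the indexes listed above, the Xie-Beni index is the one defined directly on a fuzzy partition: fuzzy within-cluster compactness divided by the number of points times the smallest squared distance between two centers, with lower values indicating a better partition. The sketch below follows that commonly stated definition; the function name, the toy points and the membership matrix are illustrative assumptions, not data from the paper.

```python
import math

def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index; memberships[i][k] is the degree of point k in cluster i."""
    compactness = sum(
        (memberships[i][k] ** m) * math.dist(points[k], centers[i]) ** 2
        for i in range(len(centers))
        for k in range(len(points))
    )
    separation = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)

# Two obvious groups on a line, with crisp memberships for simplicity.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

xb_good = xie_beni(points, [(0.5, 0.0), (9.5, 0.0)], memberships)
xb_bad = xie_beni(points, [(4.0, 0.0), (6.0, 0.0)], memberships)
print(xb_good, xb_bad)
```

Well-placed centers give a much smaller value than centers crowded into the middle, which is the behaviour a comparison across clustering methods relies on.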
The fuzzy c-means (FCM) clustering algorithm is sensitive to noise points and outliers. The possibilistic fuzzy c-means (PFCM) clustering algorithm overcomes this problem well, but it has problems of its own: it is still sensitive to the initial clustering centers, and the clustering results are poor when the tested noisy datasets are very unequal in size. An improved kernel possibilistic fuzzy c-means algorithm based on invasive weed optimization (IWO-KPFCM) is proposed in this paper. The algorithm first uses the invasive weed optimization (IWO) algorithm to seek the optimal solution as the initial clustering centers, and introduces the kernel method to map the input data from the sample space into a high-dimensional feature space. The sample variance is then introduced into the objective function to measure the compactness of the data. Finally, the improved algorithm is used to cluster the data. Simulation results on University of California-Irvine (UCI) data sets and artificial data sets show that the proposed algorithm has stronger resistance to noise, higher clustering accuracy and faster convergence than the PFCM algorithm.
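The kernel step mentioned above can be illustrated with the feature-space distance that kernel-based fuzzy clustering typically uses. The abstract does not say which kernel the authors chose; the sketch assumes a Gaussian kernel, for which the squared feature-space distance reduces to 2(1 - K(x, v)). Function names and the `sigma` parameter are illustrative.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between a sample x and a center v."""
    return math.exp(-math.dist(x, v) ** 2 / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v) = 2 - 2 K(x, v),
    because K(x, x) = K(v, v) = 1 for a Gaussian kernel.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

center = (0.0, 0.0)
for sample in [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0)]:
    print(sample, round(kernel_distance_sq(sample, center), 4))
```

Under this assumption the distance is bounded above by 2, so a far outlier cannot contribute an arbitrarily large term to the objective function, which is one plausible reason kernel variants tend to tolerate noise better.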
Funding: This research is funded by the Graduate University of Science and Technology under grant number GUST.STS.DT2020-TT01.
Abstract: Clustering is a crucial method for deciphering data structure and producing new information. Because of its significance in revealing fundamental connections between the human brain and events, clustering is essential for cognitive research. One of the most intriguing clustering difficulties is dealing with noisy data caused by inaccurate synthesis from several sources or by misleading data production processes. Noisy data can lead to incorrect object recognition and inference. This research proposes a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using the neutral and refusal degrees in the definitions of the Picture Fuzzy Set (PFS) and the Neutrosophic Set (NS). Our contribution is a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering, and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are evaluated and compared with state-of-the-art methods, namely standard Picture fuzzy clustering (FC-PFS) and Confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method outperforms the compared methods on at least 10 of 15 datasets in terms of clustering quality and computational time.
Funding: The Fundamental Research Funds for the Central Universities, China (No. 01750307); the Doctoral Scientific Research Foundation of Liaoning Province, China (No. 201501188).
Abstract: This paper adopts data mining (DM) techniques and fuzzy system theory for robust time series forecasting. By introducing DM techniques, the fuzzy rule extraction algorithm is made more robust to noise and outliers in time series. The constructed fuzzy inference system (FIS) is then optimized with a partition refining strategy to balance the system's accuracy and complexity. The proposed algorithm is compared with the Wang-Mendel (WM) method, a benchmark method for building FIS, in a comprehensive analysis of robustness. In classical Mackey-Glass time series forecasting, the simulation results show that the proposed method predicts time series with random perturbation more accurately. As a practical application, the proposed FIS is applied to predicting the time series of ship maneuvering motion. To obtain actual time series data records, a ship maneuvering motion trial was conducted aboard the ship Yukun of Dalian Maritime University in China. The forecasting results show that the FIS constructed with DM concepts can forecast ship maneuvering motion robustly and effectively.
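For readers unfamiliar with the Mackey-Glass benchmark mentioned above, the series comes from the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t). The sketch below generates it with a simple Euler step of one time unit. The parameter values (beta = 0.2, gamma = 0.1, n = 10, tau = 17) are the ones commonly used in the forecasting literature, not values confirmed by this abstract, and the initial value 1.2 is likewise an assumption.

```python
def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2):
    """Generate a Mackey-Glass series with a unit-step Euler discretization."""
    history = [x0] * (tau + 1)  # constant initial history covering the delay
    series = []
    for _ in range(n_steps):
        x_t = history[-1]          # current value x(t)
        x_tau = history[-1 - tau]  # delayed value x(t - tau)
        x_next = x_t + beta * x_tau / (1.0 + x_tau ** n) - gamma * x_t
        history.append(x_next)
        series.append(x_next)
    return series

series = mackey_glass(500)
print(len(series), round(min(series), 3), round(max(series), 3))
```

A robustness test of the kind described would then add random perturbations to this clean series before rule extraction; that step is omitted here to keep the output deterministic.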
Abstract: Measuring the validity of fuzzy clustering is a key problem. Once a clustering has been formed, a mechanism is needed to verify its validity. To make mining more accountable and comprehensible and to obtain a usable spatial pattern, it is necessary to detect whether the data set has a clustered structure before clustering. This paper discusses a detection method for clustered patterns and a fuzzy clustering algorithm. It studies the validity function of the result produced by fuzzy clustering from two aspects, which reflect the uncertainty of classification during fuzzy partition and the spatial location features of spatial data, and proposes a new validity function of fuzzy clustering for spatial data. The experimental results indicate that the new validity function can accurately measure the validity of fuzzy clustering results. For fuzzy clustering of spatial data in particular, it is robust and its classification result is better than that of other indices.
Abstract: The safety of coal mine production can be further improved by forecasting the quantity of gas emission from the real-time and historical data saved by the gas monitoring system. Making use of the advantages of data warehouse and data mining technology for processing large quantities of redundant data, a method for forecasting mine gas emission quantity based on FDM and its application were studied. The construction of a fuzzy resemblance relation and clustering analysis were proposed, through which the potential relationships within the gas emission data may be found. A mode-finding model and a forecast model were presented, together with a detailed approach for realizing the forecast, and these have been applied to forecast the gas emission quantity efficiently.
Abstract: A new method for fuzzy clustering of Web users based on analysis of user interest characteristics is proposed in this article. The method first defines fuzzy page categories according to the links on the index page of the site, then computes the fuzzy degree of cross pages by aggregating Web log data. Next, using the fuzzy comprehensive evaluation method, it constructs user interest vectors from page viewing times and hit frequencies, and derives the fuzzy similarity matrix for the Web users from the interest vectors. Finally, it obtains the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method. Key words: Web log mining; fuzzy similarity matrix; fuzzy comprehensive evaluation; fuzzy clustering. CLC number: TP18; TP311; TP391. Foundation item: Supported by the Natural Science Foundation of Heilongjiang Province of China (F0304). Biography: ZHAN Li-qiang (1966-), male, Lecturer, Ph.D.; research direction: theory and methods of data mining and database theory.
Abstract: We combine web usage mining and fuzzy clustering to introduce the concept of web fuzzy clustering, and then put forward a web fuzzy clustering processing model, which is discussed in detail. Web fuzzy clustering can be used for clustering web users and web pages. Finally, a case study is given, and the result demonstrates the feasibility of using web fuzzy clustering for web page clustering. Key words: web mining; web usage mining; web fuzzy clustering; WFCM. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (90104005). Biography: LIU Mao-fu (1977-), male, Ph.D. candidate; research direction: artificial intelligence, web mining, image mining.
Abstract: Unsupervised clustering and clustering validity are essential instruments of data analytics. Although clustering is carried out under uncertainty, validity indices do not deliver any quantitative evaluation of the uncertainties in the suggested partitionings. Validity measures may also be biased towards the underlying clustering method, and neglecting a confidence requirement may result in over-partitioning. In the absence of an error estimate or a confidence parameter, probable clustering errors are passed on to the later stages of the system, whereas an uncertainty margin for the projected labeling can be very useful for many applications such as machine learning. Herein, the validity issue is approached through estimation of the uncertainty, and a novel low-complexity index is proposed for fuzzy clustering. It involves only uni-dimensional membership weights, regardless of the data dimension, stipulates no specific distribution, and is independent of the underlying similarity measure. Comprehensive tests and comparisons showed that it can reliably estimate the optimum number of partitions under different data distributions, while being more robust to over-partitioning. In the comparative correlation analysis between true clustering error rates and some known internal validity indices, the suggested index exhibited the strongest correlations. This relationship also proved stable in additional statistical acceptance tests. The provided relative uncertainty measure can therefore also be used as a probable error estimate in clustering. Moreover, it is the only known method that can exclusively identify data points in doubt, and it is adjustable according to the required confidence level.
Abstract: Fuzzy C-means (FCM) is simple and widely used for complex data pattern recognition and image analysis. However, selecting an appropriate fuzzifier (m) is crucial for identifying an optimal number of patterns and achieving higher clustering accuracy, and few studies have investigated this. Building on two existing methods for selecting the fuzzifier, we developed an integrated fuzzifier evaluation and selection algorithm and tested it using real datasets. Our findings indicate that a consistent optimal number of clusters can be learnt by testing different fuzzifiers for each dataset, and that the fuzzifier with the lowest value for this consistency should be selected for clustering. Our evaluation also shows that the fuzzifier affects the clustering accuracy. For longitudinal data with missing values, m = 2 could be an empirical rule for starting fuzzy clustering, and the best clustering accuracy for the tested data was achieved using our multiple-imputation-based fuzzy clustering.
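To make the role of the fuzzifier concrete, the following minimal sketch evaluates the standard FCM membership formula for a single point and several values of m. It is not the selection algorithm described in the abstract; the function name `fcm_memberships` and the toy centers are assumptions made for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster, for fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]

# Hypothetical toy setup: two centers, one point at distances 1 and 3.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
for m in (1.5, 2.0, 3.0):
    print(m, [round(u, 3) for u in fcm_memberships(point, centers, m)])
```

With this toy input, larger values of m pull the memberships closer together (softer partitions), while values near 1 approach a hard assignment.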
Funding: Supported by the National Research Foundation for the Doctoral Program of Higher Education of China (20030487032)
Abstract: This paper combines computational intelligence tools (neural networks, fuzzy logic, and genetic algorithms) to develop a data mining architecture (NFGDM) that discovers patterns and represents them in understandable forms. In the NFGDM, input data are preprocessed by fuzzification; the preprocessed data of the input variables are then used to train a radial basis probabilistic neural network to classify the dataset according to the classes considered. A rule extraction technique is then applied to extract explicit knowledge from the trained neural network and represent it in the form of fuzzy if-then rules. In the final stage, a genetic algorithm is used as a rule-pruning module to eliminate weak rules that remain in the rule base. Compared with some known neural network classifiers, the architecture has a fast learning speed, and it is characterized by the incorporation of possibility information into the consequents of classification rules in human-understandable forms. The experiments show that the NFGDM is more efficient and more robust than the traditional decision tree method.
Funding: Projects (61001188, 1161140319) supported by the National Natural Science Foundation of China; Project (2012ZX03001034) supported by the National Science and Technology Major Project; Project (YETP1202) supported by the Beijing Higher Education Young Elite Teacher Project, China
Abstract: Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); then the consistency measurement between the degraded speech signal and the pre-trained reference model for each class is calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
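The two figures of merit named in this abstract, the correlation coefficient and the root-mean-square MOS error, can be computed as in the sketch below. It assumes the Pearson correlation is meant, which the abstract does not state; the function names and the sample scores are illustrative and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root-mean-square error between predicted and reference scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up scores for illustration only.
subjective_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
predicted_mos = [1.8, 2.1, 2.7, 3.9, 4.4]
print(round(pearson(subjective_mos, predicted_mos), 3))
print(round(rmse(subjective_mos, predicted_mos), 3))
```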
Abstract: Trauma is the most common cause of death among young people, and many of these deaths are preventable [1]. Predicting the outcome of trauma patients has so far been a difficult problem to investigate. In this study, prediction models are built and their ability to accurately predict mortality is assessed. The analysis includes a comparison of data mining techniques using classification, clustering and association algorithms. Data were collected by the Hellenic Trauma and Emergency Surgery Society from 30 Greek hospitals. The dataset contains records of 8544 patients suffering from severe injuries, collected from 2005 to 2006. Factors include patients' demographic elements and several other variables registered from the time and place of the accident until hospital treatment and final outcome. The results obtained are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value, and ROC curves depict the performance of these methods.
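The four comparison measures listed in this abstract follow directly from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and do not come from the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sweeping the decision threshold of a classifier and plotting sensitivity against 1 - specificity at each threshold yields the ROC curve.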
Abstract: With the rapid development of the economy, the scale of the power grid is expanding. The number of power equipment units that make up the grid has become very large, which makes the state data of power equipment grow explosively. These multi-source heterogeneous data differ from one another, which leads to data variation during transmission and storage and thus produces incomplete data. Research on data integrity has therefore become an urgent task. This paper builds on the characteristics of random chance and the spatio-temporal differences of the system. According to the characteristics and data sources of the massive data generated by power equipment, a fuzzy mining model of power equipment data is established, and the data are divided into numerical and non-numerical data, with numerical data as the basis. The text data of power equipment defects are taken as the mining material. An array-based Apriori algorithm is then used for deep mining, and the strong association rules in the incomplete power equipment data are obtained and analyzed. Judging from the trends of the NRMSE metric and the classification accuracy, most of the filling methods combined with the two frameworks in this method show a relatively stable filling trend and do not fluctuate greatly as the missing rate grows. The experimental results show that the proposed algorithm model can effectively improve the filling effect of existing filling methods on most data sets, and that the filling effect fluctuates greatly with the increase of the missing rate; that is, as the missing rate increases, the improvement the model brings to existing filling methods is higher than 4.3%. Using the incomplete-data clustering technology studied in this paper, a more innovative state assessment of reliable smart grid operation is carried out, which has good research value and reference significance.
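As background for the association-rule step, here is a minimal sketch of the textbook Apriori frequent-itemset search. It is the plain set-based formulation, not the array-based variant the abstract refers to, and the defect labels are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    current = {frozenset([i]) for t in sets for i in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent at this level.
        level = {}
        for c in current:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical defect records for illustration only.
records = [
    {"overheating", "oil leak"},
    {"overheating", "oil leak", "noise"},
    {"overheating", "noise"},
    {"oil leak"},
]
freq = apriori(records, min_support=0.5)
pair = frozenset({"overheating", "oil leak"})
# Confidence of the rule overheating -> oil leak.
print(freq[pair] / freq[frozenset({"overheating"})])
```

A rule is usually called strong when both its support and its confidence exceed user-chosen thresholds.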
Funding: Funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project, number PNURSP2022R113, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Encephalitis is an inflammatory disease of the brain. It can lead to seizures, motor disability, or some loss of vision or hearing. Encephalitis can sometimes be life-threatening, so proper diagnosis at an early stage is crucial. Therefore, in this paper we propose a deep learning model for computerized detection of encephalitis from electroencephalogram (EEG) data. We also propose a density-based clustering model to classify the distinctive waves of encephalitis. Customary clustering models usually employ a single computed virtual centroid point to define the cluster configuration, but this single point does not contain adequate information. To extract accurate inner structural data, a multiple-centroid approach is employed and defined in this paper, which defines the cluster configuration by allocating weights to each state in the cluster. The multiple-EEG-view fuzzy learning approach incorporates data from every single view to enhance the model's clustering performance. A fuzzy density-based clustering model with multiple centroids (FDBC) is also presented. This model employs multiple real state centroids to define clusters using the Partitioning Around Centroids algorithm. The experimental results validate the medical importance of the proposed clustering model.
Abstract: Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single concept level. Extracting multilevel association rules from transaction databases is one of the most common tasks in data mining. This paper proposes a multilevel fuzzy association rule mining model for extracting implicit knowledge that is stored as quantitative values in transactions. To this end, it uses a different support value at each level as well as a different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and a multiple-level taxonomy, our method finds fuzzy association rules in transaction data sets. It adopts a top-down, progressively deepening approach to derive large itemsets and incorporates fuzzy boundaries instead of sharp boundary intervals. A comparison with previous methods in simulation shows that the proposed method maintains higher precision, that the mined rules are closer to reality, and that it can mine association rules at different levels according to the user's preference.
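The idea of replacing sharp interval boundaries with fuzzy ones can be illustrated with a small sketch of fuzzy support. It assumes triangular membership functions and the common min operator for combining items; the paper's actual membership functions and taxonomy levels are not given in the abstract, and the item names and quantities below are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support(transactions, fuzzy_items):
    """Mean over transactions of the minimum membership across the fuzzy items.

    fuzzy_items is a list of (item_name, membership_function) pairs.
    """
    total = 0.0
    for t in transactions:
        total += min(mf(t.get(item, 0)) for item, mf in fuzzy_items)
    return total / len(transactions)

# Hypothetical purchase quantities for illustration only.
transactions = [
    {"milk": 4, "bread": 2},
    {"milk": 6, "bread": 6},
    {"milk": 1},
]

def medium(q):
    """Assumed fuzzy region 'medium quantity', peaking at 6 units."""
    return triangular(q, 1, 6, 11)

print(fuzzy_support(transactions, [("milk", medium), ("bread", medium)]))
```

Each transaction contributes a degree between 0 and 1 instead of a binary hit, so a quantity just outside a sharp interval still counts partially.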
Abstract: This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data, with which the degree of fuzziness of the resulting classification is decreased compared with that of a conventional algorithm; that is, the classification accuracy is increased. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a global one. Empirical results from a fuzzy classification of suburban land cover in Edinburgh confirmed the improved performance of the new algorithm for fuzzy c-means clustering, in particular when fuzziness is also accommodated in the assumed reference data.
Abstract: Clustering is an unsupervised learning problem: a procedure that partitions data objects into groups. Many algorithms cannot overcome the problems of morphology, overlapping and a large number of clusters at the same time. Many scientific communities have approached clustering from the perspective of density, which is one of the best approaches to clustering. This study proposes a density-based spatial clustering of applications with noise (DBSCAN) algorithm based on high-density areas selected by automatic fuzzy-DBSCAN (AFD), which works with the initialization of two parameters. Using fuzzy and DBSCAN features, AFD is modeled by the selection of high-density areas and automatically generates two parameters for merging and separating. The two generated parameters provide a state of sub-cluster rules in the Cartesian coordinate system for the dataset. The model simultaneously overcomes clustering problems such as morphology, overlapping, and the number of clusters in a dataset. In the experiments, all algorithms are run 30 times on eight data sets. Three of them are real datasets with overlapping, and the rest are morphological and synthetic datasets. It is demonstrated that the AFD algorithm outperforms other recently developed clustering algorithms.
Abstract: Data volumes are enormous today because of the extensive use of the World Wide Web, social media and intelligent systems. This data can be very important and useful if it is harnessed carefully and correctly. Useful information can be extracted from this massive data using the data mining process, and the extracted information can be used to make vital decisions in various industries. Clustering is a very popular data mining method that divides the data points into different groups such that all similar data points form part of the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM) and Clustering Large Application (CLARA) on securely perturbed data. We compare the methods and identify the one that performs better for analyzing data perturbed using Extended NMF, on the basis of the values of various indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index and Davies-Bouldin Index.
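As an example of one of the indexes named here, the sketch below computes the Dunn index in its classic form: the smallest distance between points of different clusters divided by the largest cluster diameter. Other variants of the index exist, and the abstract does not say which one was used. The sketch assumes Euclidean distance, at least two clusters, and at least one cluster with two distinct points; the toy clusters are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Minimum inter-cluster distance divided by maximum cluster diameter."""
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

# Hypothetical toy clusters for illustration only.
compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
spread = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]
print(dunn_index(compact), dunn_index(spread))
```

Higher values indicate compact, well-separated clusters, which is why the index can be used to rank the partitions produced by different methods.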
Abstract: The fuzzy c-means (FCM) clustering algorithm is sensitive to noise points and outliers. The possibilistic fuzzy c-means (PFCM) clustering algorithm overcomes this problem well, but it has problems of its own: it is still sensitive to the initial clustering centers, and its clustering results are poor when the tested datasets with noise are very unequal. An improved kernel possibilistic fuzzy c-means algorithm based on invasive weed optimization (IWO-KPFCM) is proposed in this paper. The algorithm first uses the invasive weed optimization (IWO) algorithm to seek the optimal solution as the initial clustering centers, and introduces a kernel method to map the input data from the sample space into a high-dimensional feature space. Then, the sample variance is introduced into the objective function to measure the compactness of the data. Finally, the improved algorithm is used to cluster the data. Simulation results on University of California-Irvine (UCI) data sets and artificial data sets show that the proposed algorithm has a stronger ability to resist noise, higher clustering accuracy and faster convergence than the PFCM algorithm.
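The kernel step can be illustrated with the squared distance induced in feature space by a Gaussian kernel, for which K(x, x) = 1 and therefore the squared distance is 2(1 - K(x, v)). The abstract does not state which kernel the paper uses, so the Gaussian choice, the width `sigma` and the function names are assumptions.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between two points of equal dimension."""
    squared = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-squared / (2.0 * sigma ** 2))

def kernel_sq_distance(x, v, sigma=1.0):
    """Squared feature-space distance: K(x,x) + K(v,v) - 2K(x,v) = 2(1 - K(x,v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

center = (0.0, 0.0)
for point in [(0.5, 0.0), (2.0, 0.0), (50.0, 0.0)]:
    print(point, round(kernel_sq_distance(point, center), 4))
```

Because this distance never exceeds 2, a far-away outlier contributes a bounded term to the objective, which is one common argument for the noise resistance of kernel-based fuzzy clustering.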