Funding: Supported by the Australian Research Council Discovery Project (DP130102691), the National Science Foundation of China (61302157), the China National 863 Project (2012AA12A308), and the China Pre-research Project of Nuclear Industry (FZ1402-08).
Abstract: This paper proposes a novel coupled attribute similarity learning method for multi-label categorical data (CASonMLCD). The CASonMLCD method not only uses information gain to compute the correlations between individual attributes and the multi-label set, which can be regarded as the importance of each attribute in the learning method, but also analyzes the intra-coupled and inter-coupled interactions between pairs of attribute values across different attributes and multiple labels. Using the MLKNN algorithm, the paper compares the CASonMLCD method with the OF distance and Jaccard similarity according to five common evaluation criteria. The experimental results demonstrate that CASonMLCD mines similarity relationships more accurately and comprehensively and achieves better performance than the compared methods.
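The information-gain step can be sketched as follows. This is a minimal illustration, assuming each label is a binary indicator column and that an attribute's weight is the mean information gain over all labels; the abstract does not say how per-label gains are aggregated, and the function names are hypothetical. The intra- and inter-coupled similarity terms are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attr, label):
    """IG(label; attr) = H(label) - sum_v P(attr = v) * H(label | attr = v)."""
    n = len(attr)
    conditional = 0.0
    for value, count in Counter(attr).items():
        subset = [y for a, y in zip(attr, label) if a == value]
        conditional += (count / n) * entropy(subset)
    return entropy(label) - conditional


def attribute_weights(X, Y):
    """Hypothetical weighting: mean IG of each attribute over all labels.

    X: rows of categorical attribute values.
    Y: rows of 0/1 label indicators, one column per label.
    """
    n_labels = len(Y[0])
    weights = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        gains = [information_gain(column, [row[l] for row in Y])
                 for l in range(n_labels)]
        weights.append(sum(gains) / n_labels)
    return weights
```

For example, an attribute that perfectly predicts one label (gain 1 bit) and is independent of a second label (gain 0) receives weight 0.5 under this averaging assumption.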
Funding: Supported by the National Natural Science Foundation of China (601133010).
Abstract: In this paper, a new approach for visualizing multivariate categorical data is presented. The approach uses a graph to represent multivariate categorical data and draws the graph in such a way that patterns, trends, and relationships within the data can be identified. A mathematical model for the graph layout problem is derived, and a spectral graph drawing algorithm for visualizing multivariate categorical data is proposed. Experiments show that the drawings produced by the algorithm capture the structure of multivariate categorical data well and that the algorithm computes quickly.
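A generic spectral layout can be sketched as below. The abstract does not give the paper's graph model, so this assumes one node per record with edge weight equal to the number of attributes on which two records share a category, and it uses the standard Laplacian eigenvector layout rather than the paper's derived model. Function names are hypothetical.

```python
import numpy as np


def build_graph(records):
    """Hypothetical graph: one node per record; edge weight = number of
    attributes on which the two records take the same category."""
    n = len(records)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a == b for a, b in zip(records[i], records[j]))
            W[i, j] = W[j, i] = shared
    return W


def spectral_layout(W):
    """2-D coordinates from the eigenvectors of the Laplacian L = D - W
    belonging to the 2nd and 3rd smallest eigenvalues."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, 1:3]


records = [("red", "small"), ("red", "small"), ("blue", "large"),
           ("blue", "large"), ("red", "large")]
coords = spectral_layout(build_graph(records))
```

In this small example the first coordinate (the Fiedler vector) places the two ("red", "small") records and the two ("blue", "large") records on opposite sides, with the mixed record in between.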
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 51178132) and the "Thirteenth Five-Year" Social Science Research Project of the Education Department of Jilin Province (Grant No. Ji UNESCO co word [2016] No. 382).
Abstract: On the basis of extension architectonics, this paper studies the process of extension categorical data mining for extension interior design. According to the theory of extension data mining, extension categorical data mining for extension interior design can be divided into three stages: data preparation, the mining operation, and knowledge application. The paper elaborates on the main content of each stage and the relations that connect them, and focuses on several core data-related nodes, including extension acquisition, analysis extension, categorical mining extension, and knowledge application extension. By fusing the knowledge of extension architectonics and data mining, the paper discusses how knowledge requirements with multiple classifications are handled under different mining targets. The purpose of this paper is to explore a complete categorical data mining process for interior design, from extension design data through knowledge discovery to extension application.
Abstract: Clustering categorical data, an integral part of data mining, has attracted much attention recently. In this paper, the authors formally define the categorical data clustering problem as an optimization problem from the viewpoint of cluster ensembles and apply a cluster ensemble approach to clustering categorical data. Experimental results on real datasets show that this approach achieves better clustering accuracy than existing categorical data clustering algorithms.
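The cluster-ensemble view can be illustrated by treating each categorical attribute as one base clustering of the records. The paper defines its own optimization objective; this sketch instead uses a common co-association consensus function (link records that share a group in more than half of the base clusterings, then take connected components). The 0.5 threshold and the function names are assumptions.

```python
from itertools import combinations


def co_association(records):
    """Fraction of attributes (each treated as one base clustering) that
    place records i and j in the same group."""
    n, m = len(records), len(records[0])
    C = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        same = sum(records[i][k] == records[j][k] for k in range(m))
        C[i][j] = C[j][i] = same / m
    return C


def consensus_clusters(records, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold and return the
    connected components as cluster labels (via union-find)."""
    n = len(records)
    C = co_association(records)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

With `records = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"), ("b", "y", "p")]`, the first two and last two records agree on two of three attributes, so the consensus is two clusters.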
Abstract: Statistics is a powerful tool for data measurement. Properly planned and executed, statistical techniques give meaning to otherwise meaningless data. The difficulty some practitioners encounter is that, although numerous statistical methods are available for analysis, their understanding of these tools and their ease in using them are limited. This study has a twofold purpose: first, the literature on categorical data commonly used in research was reviewed; next, we report the results of a survey we designed and executed. Categorical data were collected via questionnaire and analyzed to illustrate the robustness of categorical data analysis. Several conjectures about the independence of socio-economic variables and e-commerce were tested, and some of the factors influencing patronage of e-commerce were identified. The literature suggests that as one's academic qualification improves, one's preference for e-commerce improves as well, but our results revealed otherwise. Family size was found to influence e-commerce patronage. Both income and social status positively affected patronage of e-commerce, and gender also appeared to affect it. In total, 62.3% of staff had patronized e-commerce, which suggests that e-commerce patronage was gradually increasing. It is therefore our considered view that policy documents regulating and monitoring the use of e-commerce should be developed to increase e-commerce participation across the globe. We also recommend that the bottlenecks obstructing e-commerce patronage be addressed so that more staff develop a positive attitude towards e-commerce.
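Independence conjectures of this kind are typically tested with Pearson's chi-square test on a contingency table. The sketch below computes the statistic and degrees of freedom by hand; the counts are hypothetical and are not the survey's data. For p-values, `scipy.stats.chi2_contingency` can be used instead, noting that by default it applies Yates' continuity correction when there is one degree of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    contingency table of observed counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Hypothetical counts: rows = gender, columns = (patronized, did not patronize).
table = [[10, 20],
         [20, 10]]
stat, dof = chi_square_independence(table)
# 3.841 is the chi-square critical value for dof = 1 at alpha = 0.05.
reject_independence = dof == 1 and stat > 3.841
```

Here every expected count is 15, giving a statistic of about 6.67 with one degree of freedom, so independence would be rejected at the 5% level for these made-up counts.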
Funding: Supported in part by the National Natural Science Foundation of China (U2001206, 61872250), the GD Talent Program (2019JC05X328), the GD Natural Science Foundation (2020A0505100064, 2021B1515020085), the DEGP Key Project (2018KZDXM058), and the Shenzhen Science and Technology Key Program (RCJC20200714114435012, JCYJ20210324120213036).
Abstract: Appropriate color mapping for categorical data visualization can significantly facilitate the discovery of underlying data patterns and enhance visual aesthetics. Some systems suggest predefined palettes for this task. However, a predefined color mapping is not always optimal, as it fails to consider users' needs for customization. Given an input categorical data visualization and a reference image, we present an effective method to automatically generate a coloring that resembles the reference while allowing classes to be easily distinguished. We extract a color palette with high perceptual distance between colors by sampling dominant and discriminable colors from the image's color space. These colors are assigned to the given classes by solving an integer quadratic program that optimizes the point distinctness of the given chart while preserving the spatial relations of colors in the source image. We show results on various coloring tasks, with a diverse set of new coloring appearances for the input data. We also compare our approach with state-of-the-art palettes in a controlled user study, which shows that our method achieves comparable performance in class discrimination while being more similar to the source image. User feedback after using our system confirms its efficiency in automatically generating desirable colorings that meet the user's expectations when choosing a reference.
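The color-to-class assignment can be illustrated on a tiny instance. The paper solves an integer quadratic program that also preserves the spatial relations of colors in the reference image; this sketch keeps only a class-distinctness term (weighted CIE76 color difference between classes) and solves it exactly by exhaustive search, which is only feasible for a handful of classes. The weight matrix, palette, and function names are hypothetical, and palette extraction from the image is not shown.

```python
import math
from itertools import permutations


def delta_e(c1, c2):
    """CIE76 color difference: Euclidean distance between two CIELAB colors."""
    return math.dist(c1, c2)


def assign_colors(palette, weights):
    """Give each class one palette color so as to maximize
    sum_{i<j} weights[i][j] * deltaE(color_i, color_j).

    weights[i][j] is a hypothetical score of how much classes i and j
    overlap or touch in the chart (only the upper triangle is used).
    """
    k = len(weights)
    best_score, best = -1.0, None
    for perm in permutations(range(len(palette)), k):
        score = sum(weights[i][j] * delta_e(palette[perm[i]], palette[perm[j]])
                    for i in range(k) for j in range(i + 1, k))
        if score > best_score:
            best_score, best = score, perm
    return [palette[p] for p in best], best_score


palette = [(50, 0, 0), (50, 60, 0), (50, -60, 0)]  # CIELAB (L*, a*, b*)
weights = [[0, 10, 1],
           [10, 0, 1],
           [1, 1, 0]]  # classes 0 and 1 overlap heavily
colors, score = assign_colors(palette, weights)
```

Because classes 0 and 1 overlap most, they receive the two most distant colors, (50, 60, 0) and (50, -60, 0), and class 2 gets the neutral gray.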
Abstract: Most earlier work on clustering focused on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means, and the object similarity measure is derived from both the numeric and the categorical attributes. When applied to purely numeric data, the algorithm is identical to k-means. The main result of this paper is a method for updating the cluster centers of objects described by mixed numeric and categorical attributes during the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known datasets, namely the credit approval and abalone databases.
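A k-means-style loop for mixed attributes can be sketched in the k-prototypes manner: squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches, with centers updated by the numeric mean and the categorical mode. Whether the paper's center update is exactly this one is not stated in the abstract; `gamma`, the attribute ordering (numeric first), and the function names are assumptions. With no categorical attributes the loop reduces to ordinary k-means, matching the property the abstract describes.

```python
import random
from collections import Counter


def dissimilarity(x, center, n_num, gamma):
    """Squared Euclidean distance on the first n_num (numeric) attributes
    plus gamma times the number of mismatched categorical attributes."""
    num = sum((a - b) ** 2 for a, b in zip(x[:n_num], center[:n_num]))
    cat = sum(a != b for a, b in zip(x[n_num:], center[n_num:]))
    return num + gamma * cat


def update_center(members, n_num):
    """Mean for numeric attributes, mode for categorical attributes."""
    center = [sum(m[j] for m in members) / len(members) for j in range(n_num)]
    for j in range(n_num, len(members[0])):
        center.append(Counter(m[j] for m in members).most_common(1)[0][0])
    return center


def cluster_mixed(data, k, n_num, gamma=1.0, iters=20, seed=0):
    """Alternate assignment and center update until centers stop changing."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dissimilarity(x, centers[c], n_num, gamma))
                  for x in data]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(update_center(members, n_num) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Like k-means, the result depends on the initial centers, so in practice several seeds would be tried and the run with the lowest total cost kept.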
Funding: Projects (60873265, 60903222) supported by the National Natural Science Foundation of China; Project (IRT0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University of China.
Abstract: The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from the other vertices. The algorithm exploits the connection between clustering aggregation and the correlation clustering problem. The paper provides the best deterministic approximation algorithm for a variant of the correlation clustering problem and shows how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation demonstrates the usefulness of the problem formulation and of the solutions. The results show that the method reduces running time by more than 50% without sacrificing clustering quality.
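The abstract does not spell out the Circle procedure, but its description (grow a set of vertices that are close to one another and far from the rest) resembles the ball-growing heuristics used for clustering aggregation. The sketch below is such a heuristic, not the paper's algorithm: the distance between two objects is the fraction of input clusterings that separate them, vertices are processed in index order, the ball radius is 1/2, and the tightness threshold `alpha` is a hypothetical parameter. The sampling step for large datasets is omitted.

```python
def disagreement(clusterings, i, j):
    """Fraction of input clusterings that put objects i and j in different clusters."""
    return sum(c[i] != c[j] for c in clusterings) / len(clusterings)


def ball_aggregation(clusterings, alpha=0.25):
    """Aggregate several clusterings (lists of labels) into one labelling."""
    n = len(clusterings[0])
    unclustered = list(range(n))
    labels = [None] * n
    next_label = 0
    while unclustered:
        u = unclustered[0]
        # Ball of radius 1/2 around u among the still-unclustered objects.
        ball = [v for v in unclustered[1:] if disagreement(clusterings, u, v) <= 0.5]
        if ball and sum(disagreement(clusterings, u, v) for v in ball) / len(ball) > alpha:
            ball = []  # ball is not tight enough: u becomes a singleton
        for v in [u] + ball:
            labels[v] = next_label
        next_label += 1
        unclustered = [v for v in unclustered if labels[v] is None]
    return labels
```

For the three input clusterings `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]`, a threshold of `alpha=0.4` yields two clusters, while the stricter `alpha=0.25` splits objects 0 and 1 into singletons because they disagree in one of three inputs.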
Abstract: Ischemic heart disease (IHD) is one of the leading causes of death worldwide. However, the risk factors for this disease vary across geographic regions because of differences in people's lifestyles. This study examines the current IHD situation in southern Bangladesh, a South Asian middle-income country. The main approach of this research is an AI-based proposal of a reduced set of the highest-impact clinical traits that may cause IHD. This approach aims to reduce IHD morbidity and mortality through early detection of risk factors using the reduced set of clinical data. Demographic, diagnostic, and symptomatic features were considered in analysing the clinical data. Data pre-processing uses several machine learning techniques to select significant features and make meaningful interpretations. A proposed voting mechanism ranked the 138 selected features by their impact factor. In this regard, diverse patterns in the correlations with variables including age, sex, career, family history, and obesity were calculated and explained in terms of voting scores. The 138 risk factors were grouped into three categories (high-, medium-, and low-risk features): 19 features were rated as high-impact, 25 as medium-impact, and 94 as low-impact. The technological methodology and practical goals of this research provide an innovative and resilient framework for addressing IHD, especially in less developed cities and townships of Bangladesh, where the socioeconomic conditions of the general population are often unpredictable. The data collection, pre-processing, and use of this study's complete and comprehensive IHD patient dataset are another innovative contribution. We believe that other relevant research initiatives will benefit from this work.
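A voting-based ranking of this kind can be sketched as follows. The abstract does not name the feature-selection techniques or the voting thresholds, so this assumes each technique returns a set of selected features, that a feature's score is the number of techniques that picked it, and that fixed vote-fraction cut-offs define the high, medium, and low tiers. Method names, feature names, and thresholds are all hypothetical; features selected by no method would need to be added separately as low-impact.

```python
def vote_scores(selections):
    """Count, for every feature, how many selection methods picked it."""
    scores = {}
    for selected in selections.values():
        for feature in selected:
            scores[feature] = scores.get(feature, 0) + 1
    return scores


def tier(scores, n_methods, high=0.75, medium=0.5):
    """Map vote fractions to high/medium/low impact (cut-offs are assumptions)."""
    tiers = {}
    for feature, s in scores.items():
        frac = s / n_methods
        tiers[feature] = "high" if frac >= high else "medium" if frac >= medium else "low"
    return tiers


# Hypothetical outputs of four feature-selection methods.
selections = {
    "chi2": {"age", "family_history", "obesity"},
    "mutual_info": {"age", "family_history", "sex"},
    "lasso": {"age", "obesity"},
    "random_forest": {"age", "family_history", "career"},
}
scores = vote_scores(selections)
tiers = tier(scores, len(selections))
```

In this toy example, age (4 votes) and family history (3 votes) fall in the high tier, obesity (2 votes) in the medium tier, and sex and career (1 vote each) in the low tier.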