Funding: This work was supported in part by a grant from the Social Sciences and Humanities Research Council of Canada (SSHRC) (895-2011-1011), held by the second author.
Abstract: The large number of environmental problems faced by society in recent years has driven researchers to collect and study massive amounts of data in order to understand the complex relations that exist between people and the environment in which we live. Such datasets are often high dimensional and heterogeneous in nature, with complex geospatial relations. Analysing such data can be challenging, especially when there is a need to maintain spatial awareness as the non-spatial attributes are studied. Geo-Coordinated Parallel Coordinates (GCPC) is a geovisual analytics approach designed to support exploration and analysis within complex geospatial environmental data. Parallel coordinates are tightly coupled with a geospatial representation and an investigative scatterplot, all of which can be used to show, reorganize, filter, and highlight the high dimensional, heterogeneous, and geospatial aspects of the data. Two sets of field trials were conducted with expert data analysts to validate the real-world benefits of the approach for studying environmental data. The results of these evaluations were positive, providing real-world evidence and new insights regarding the value of using GCPC to explore environmental datasets when there is a need to remain aware of the geospatial aspects of the data as the non-spatial elements are studied.
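The coupling between the parallel coordinates and the geospatial view can be illustrated with a minimal sketch. It is not the GCPC implementation: the `Record` type, the `brush` and `map_highlight` helpers, the attribute names, and the sample values are all hypothetical. The sketch only shows the general idea that a range selected on one or more non-spatial axes determines which locations are highlighted on the map.

```python
from dataclasses import dataclass


@dataclass
class Record:
    lat: float
    lon: float
    attrs: dict  # non-spatial attributes (hypothetical names and values)


def brush(records, ranges):
    """Return indices of records whose attributes lie inside every brushed range."""
    return [
        i for i, r in enumerate(records)
        if all(lo <= r.attrs[name] <= hi for name, (lo, hi) in ranges.items())
    ]


def map_highlight(records, selected):
    """Return the (lat, lon) pairs the geospatial view would highlight."""
    return [(records[i].lat, records[i].lon) for i in selected]


# Hypothetical sample data, invented for illustration only.
records = [
    Record(50.45, -104.61, {"pm25": 8.0, "temp": 3.5}),
    Record(52.13, -106.67, {"pm25": 14.0, "temp": 1.0}),
    Record(49.90, -97.14, {"pm25": 22.0, "temp": 6.0}),
]

# Brushing one axis selects records; the same selection drives the map.
selected = brush(records, {"pm25": (10.0, 30.0)})
points = map_highlight(records, selected)

# Brushing a second axis narrows the selection further.
narrowed = brush(records, {"pm25": (10.0, 30.0), "temp": (0.0, 2.0)})
```

Sharing one selection (a list of record indices) between the views is what keeps the spatial context visible while the non-spatial attributes are filtered.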
Abstract: By "skeptics" and "undecided" we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters. Such nodes are typically found either at the interface between clusters (the undecided) or at their boundaries (the skeptics). Identifying these nodes is relevant in marketing applications like voter targeting, because the persons represented by such nodes are often more likely to be affected by marketing campaigns than those represented by nodes deep within clusters. So far this identification task has not been studied as well as other network analysis tasks such as clustering, identifying central nodes, and detecting motifs. We approach this task by deriving novel geometric features from the network structure that naturally lend themselves to an interactive visual approach for identifying interface and boundary nodes.
Funding: This work was supported by the National Basic Research Program of China (No. 2007CB307100) and the National Natural Science Foundation of China (Grant No. 60432010).
Abstract: Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation of every pair of non-redundant attributes, the relation matrix of non-redundant attributes is constructed based on the relation function of two-dimensional united Gini coefficients. After an overlapping clustering algorithm is applied to the relation matrix, the candidate set of all interesting subspaces is obtained. Finally, all subspace clusters can be derived by clustering on the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and in the quality of the subspace clusters found, but is also insensitive to input parameters.
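The first step, filtering redundant attributes, can be sketched as follows. The abstract does not give the exact Gini formula or the filtering rule, so this sketch rests on assumptions: the Gini index is taken as 1 - sum(p_i^2) over an equal-width histogram, an attribute is treated as redundant when its histogram is close to uniform (high Gini), and the bin count and `threshold` are hypothetical parameters. The function names are illustrative and not from the paper.

```python
import random


def gini_index(values, bins=10):
    """Gini index 1 - sum(p_i^2) of an equal-width histogram of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute falls into a single bin
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def non_redundant_attributes(columns, bins=10, threshold=0.85):
    """Keep attributes whose histogram is clearly non-uniform (assumed rule)."""
    return [
        name for name, vals in columns.items()
        if gini_index(vals, bins) < threshold
    ]


# Synthetic data, invented for illustration: one near-uniform attribute
# and one attribute with two dense groups.
rng = random.Random(0)
columns = {
    "uniform": [rng.random() for _ in range(2000)],
    "clustered": [rng.gauss(0.2, 0.01) for _ in range(1000)]
    + [rng.gauss(0.8, 0.01) for _ in range(1000)],
}
kept = non_redundant_attributes(columns)
```

With 10 bins, a uniform attribute has a Gini index near 0.9, while an attribute concentrated in a few bins scores much lower, so the threshold separates the two. The later steps (relation matrix, overlapping clustering, clustering within subspaces) are not sketched here because the abstract does not define the relation function.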