Funding: Project supported by the Key R&D "Pioneer" Tackling Plan Program of Zhejiang Province, China (No. 2023C01119); the "Ten Thousand Talents Plan" Science and Technology Innovation Leading Talent Program of Zhejiang Province, China (No. 2022R52044); the Major Standardization Pilot Projects for the Digital Economy (Digital Trade Sector) of Zhejiang Province, China (No. SJ-Bz/2023053); and the National Natural Science Foundation of China (No. 62132017).
Abstract: Data imputation is an essential pre-processing task for data governance, aimed at filling in incomplete data. However, conventional data imputation methods work on isolated tabular data and can therefore only partly alleviate data incompleteness, and they fail to achieve the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. Then, we perform the initial imputation of incomplete data using correlated data entries from other tables. Additionally, we develop a visual analysis system to refine data imputation candidates. Our interactive system combines the multi-party data imputation approach with expert knowledge, allowing for a better understanding of the relational structure of the data. This significantly enhances the accuracy and efficiency of data imputation, thereby improving the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that this method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.
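The two automatic steps named in the abstract (finding similar columns across tables, then filling gaps from correlated entries) can be illustrated with a small sketch. The abstract does not specify the "intelligent algorithms" used, so Jaccard similarity over column values is swapped in here purely as an assumption, and the function names (`jaccard`, `match_columns`, `impute`), the threshold, and the sample tables are all hypothetical.

```python
from typing import Dict, List, Optional

# A table is modeled as: column name -> list of cell values; None marks a missing cell.
Table = Dict[str, List[Optional[str]]]


def jaccard(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Jaccard similarity of the distinct non-missing values of two columns."""
    sa = {v for v in a if v is not None}
    sb = {v for v in b if v is not None}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_columns(left: Table, right: Table, threshold: float = 0.5) -> Dict[str, str]:
    """Map each left column to its most similar right column (score >= threshold)."""
    matches: Dict[str, str] = {}
    for lname, lvals in left.items():
        best_name, best_score = None, threshold
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= best_score:
                best_name, best_score = rname, score
        if best_name is not None:
            matches[lname] = best_name
    return matches


def impute(left: Table, right: Table, key: str, target: str,
           matches: Dict[str, str]) -> List[Optional[str]]:
    """Fill missing cells of left[target] from the matched right column,
    aligning rows through the matched key column. Unresolved cells stay None."""
    rkey, rtarget = matches[key], matches[target]
    lookup = {k: v for k, v in zip(right[rkey], right[rtarget])
              if k is not None and v is not None}
    return [v if v is not None else lookup.get(k)
            for k, v in zip(left[key], left[target])]


# Hypothetical sample data: two parties hold overlapping records.
left = {"id": ["a", "b", "c", "d"], "city": ["Hangzhou", None, "Ningbo", None]}
right = {"uid": ["a", "b", "c", "e"], "town": ["Hangzhou", "Wenzhou", "Ningbo", "Jinhua"]}

matches = match_columns(left, right)
filled = impute(left, right, key="id", target="city", matches=matches)
```

Cells that stay `None` after this pass, along with low-confidence column matches, are the kind of candidates the abstract hands to an expert for review in the visual analysis system.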
Abstract: Visualizing intrinsic structures of high-dimensional data is an essential task in data analysis. Over the past decades, a large number of methods have been proposed. Among all solutions, one promising way to enable effective visual exploration is to construct a k-nearest neighbor (KNN) graph and visualize the graph in a low-dimensional space. Yet, state-of-the-art methods such as LargeVis still suffer from two main problems when applied to large-scale data: (1) they may produce unappealing visualizations due to the non-convexity of the cost function; (2) visualizing the KNN graph is still time-consuming. In this work, we propose a novel visualization algorithm that leverages a multilevel representation to achieve a high-quality graph layout and employs a cluster-based approximation scheme to accelerate the KNN graph layout. Experiments on various large-scale datasets indicate that our approach achieves a fivefold speedup for KNN graph visualization compared to LargeVis and yields aesthetically pleasing visualization results.
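For readers unfamiliar with the KNN graph that the abstract builds on, the sketch below constructs one by exact brute-force search. It is only an illustration of the data structure: it is quadratic in the number of points, whereas large-scale methods such as LargeVis rely on approximate neighbor search, and it does not attempt the multilevel layout or cluster-based approximation proposed in the paper. The function name `knn_graph` and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[Tuple[int, float]]]:
    """Return, for each point i, its k nearest neighbors as (index, distance) pairs.

    Exact brute-force search using Euclidean distance; O(n^2) distance
    computations, so suitable only for small inputs.
    """
    graph: List[List[Tuple[int, float]]] = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([(j, d) for d, j in dists[:k]])
    return graph


# Hypothetical sample: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
graph = knn_graph(points, k=2)

for i, neighbors in enumerate(graph):
    print(i, [j for j, _ in neighbors])
```

A layout algorithm then places each point in 2-D or 3-D so that points connected in this graph end up close together; that optimization step is where the non-convex cost function mentioned in the abstract arises.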
Funding: Supported by the National Natural Science Foundation of China (No. 61772456).
Abstract: Authoring graph visualizations poses great challenges to developers because it demands both domain knowledge and development skills. Although existing libraries and tools reduce the difficulty of generating graph visualizations, many challenges remain. We worked closely with developers to formulate several design goals, then designed and implemented G6, a web-based library for graph visualization. It combines template-based configuration for high usability with flexible customization for high expressiveness. To enhance development efficiency, G6 offers a range of optimizations, including state management and interaction modes. We demonstrate its capabilities through an extensive gallery, a quantitative performance evaluation, and an expert interview. G6 was first released in 2017 and has gone through 317 versions. It has served as a web-based library for thousands of applications and received 8312 stars on GitHub.