In this paper, we introduce a visual analytics approach aimed at helping machine learning experts analyze the hidden states of layers in recurrent neural networks. Our technique allows the user to interactively inspect how hidden states store and process information as an input sequence is fed into the network. The technique can help answer questions such as which parts of the input data have a higher impact on the prediction and how the model correlates each hidden state configuration with a certain output. Our visual analytics approach comprises several components. First, our input visualization shows the input sequence and how it relates to the output (using color coding). In addition, hidden states are visualized through a nonlinear projection into a 2-D visualization space using t-distributed stochastic neighbor embedding to understand the shape of the space of the hidden states. Trajectories are also employed to show the details of the evolution of the hidden state configurations. Finally, a time-multi-class heatmap matrix visualizes the evolution of the expected predictions for multi-class classifiers, and a histogram indicates the distances between the hidden states within the original space. The different visualizations are shown simultaneously in multiple views and support brushing-and-linking to facilitate the analysis of the classifications and debugging for misclassified input sequences. To demonstrate the capability of our approach, we discuss two typical use cases for long short-term memory models applied to two widely used natural language processing datasets.
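The distance histogram mentioned in the abstract above can be illustrated with a minimal sketch. It assumes hidden states are plain numeric vectors, uses Euclidean distance over all pairs, and bins with equal-width bins; the function names, the toy vectors, and these choices are assumptions for illustration, not details taken from the paper.

```python
import math
from itertools import combinations

def pairwise_distances(states):
    """Euclidean distance between every pair of hidden-state vectors."""
    return [math.dist(a, b) for a, b in combinations(states, 2)]

def histogram(values, num_bins):
    """Count values in equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    return counts

# Toy hidden states (made-up values, one 2-D vector per time step).
hidden_states = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = pairwise_distances(hidden_states)
counts = histogram(distances, num_bins=2)
```

Because the distances are computed in the original high-dimensional space, such a histogram can serve as a check on how faithfully a 2-D projection preserves neighborhoods.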
Developing effective visual analytics systems demands care in the characterization of domain problems and the integration of visualization techniques and computational models. Urban visual analytics has already achieved remarkable success in tackling urban problems and providing fundamental services for smart cities. To promote further academic research and assist the development of industrial urban analytics systems, we comprehensively review urban visual analytics studies from four perspectives. In particular, we identify 8 urban domains and 22 types of popular visualization, analyze 7 types of computational method, and categorize existing systems into 4 types based on their integration of visualization techniques and computational models. We conclude with potential research directions and opportunities.
Although traditional Chinese medicine (TCM) and modern medicine (MM) have considerably different treatment philosophies, they both make important contributions to human health care. TCM physicians usually treat diseases using a TCM formula (TCMF), which is a combination of specific herbs, based on the holistic philosophy of TCM, whereas MM physicians treat diseases using chemical drugs that interact with specific biological molecules. The difference between the holistic view of TCM and the atomistic view of MM hinders their combination. Tools that are able to bridge TCM and MM are essential for promoting the combination of these disciplines. In this paper, we present TCMFVis, a visual analytics system that helps domain experts explore the potential use of TCMFs in MM at the molecular level. TCMFVis deals with two significant challenges, namely, (i) intuitively obtaining valuable insights from heterogeneous data involved in TCMFs and (ii) efficiently identifying the common features among a cluster of TCMFs. In this study, a four-level (herb-ingredient-target-disease) visual analytics framework was designed to facilitate the analysis of heterogeneous data in a proper workflow. Several set visualization techniques were first introduced into the system to facilitate the identification of common features among TCMFs. Case studies on two groups of TCMFs clustered by function were conducted by domain experts to evaluate TCMFVis. The results of these case studies demonstrate the usability and scalability of the system.
Digital phenotyping is the characterization of human behavior patterns based on data from digital devices such as smartphones in order to gain insights into the users' state and especially to identify ailments. To support supervised machine learning, digital phenotyping requires gathering data from study participants' smartphones as they live their lives. Periodically, participants are then asked to provide ground truth labels about their health status. Analyzing such complex data is challenging due to limited contextual information and imperfect health/wellness labels. We propose INteractive PHOne-o-typing VISualization (INPHOVIS), an interactive visual framework for exploratory analysis of smartphone health data to study phone-o-types. Prior visualization work has focused on mobile health data with clear semantics, such as steps or heart rate data collected using dedicated health devices and wearables such as smartwatches. However, unlike smartphones, which are owned by over 85 percent of the US population, wearable devices are less prevalent, thus reducing the number of people from whom such data can be collected. In contrast, the "low-level" sensor data (e.g., accelerometer or GPS data) supported by INPHOVIS can be easily collected using smartphones. Data visualizations are designed to provide the essential contextualization of such data and thus help analysts discover complex relationships between observed sensor values and health-predictive phone-o-types. To guide the design of INPHOVIS, we performed a hierarchical task analysis of phone-o-typing requirements with health domain experts. We then designed and implemented multiple innovative visualizations integral to INPHOVIS, including stacked bar charts to show diurnal behavioral patterns, calendar views to visualize day-level data along with bar charts, and correlation views to visualize important wellness-predictive data. We demonstrate the usefulness of INPHOVIS with walk-throughs of use cases. We also evaluated INPHOVIS with expert feedback and received encouraging responses.
A wide variety of predictive analytics techniques have been developed in statistics, machine learning and data mining; however, many of these algorithms take a black-box approach in which data is input and future predictions are output with no insight into what goes on during the process. Unfortunately, such a closed-system approach often leaves little room for injecting domain expertise and can result in frustration from analysts when results seem spurious or confusing. In order to allow for more human-centric approaches, the visualization community has begun developing methods to enable users to incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and develops a summarization of the predictive analytics workflow.
Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are promising and to learn how to apply relevant techniques in visual analytics, we systematically review 259 papers published in the last ten years together with representative works before 2010. We build a taxonomy, which includes three first-level categories: techniques before model building, techniques during model building, and techniques after model building. Each category is further characterized by representative analysis tasks, and each task is exemplified by a set of recent influential works. We also discuss and highlight research challenges and promising future research opportunities useful for visual analytics researchers.
Visual analytics employs interactive visualizations to integrate users' knowledge and inference capability into numerical/algorithmic data analysis processes. It is an active research field that has applications in many sectors, such as security, finance, and business. The growing popularity of visual analytics in recent years creates the need for a broad survey that reviews and assesses the recent developments in the field. This report reviews and classifies recent work into a set of application categories including space and time, multivariate, text, graph and network, and other applications. More importantly, this report presents an analytics space, inspired by design space, which relates each application category to the key steps in visual analytics, including visual mapping, model-based analysis, and user interactions. We explore and discuss the analytics space to add to the current understanding and to better understand research trends in the field.
Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In this paper, we first review and explore the relevant work from the research areas of data management, visual analytics and human-computer interaction. Then, for different types of data such as multimedia data, textual data, trajectory data, and graph data, we summarize the common methods for improving data quality by leveraging data cleansing techniques at different analysis stages. Based on a thorough analysis, we propose a general visual analytics framework for interactively cleansing data. Finally, the challenges and opportunities are analyzed and discussed in the context of data and humans.
GPS-based taxi trajectories contain valuable knowledge about movement patterns for transportation and urban planning. Topic modeling is an effective tool to extract semantic information from taxi trajectory data. However, previous methods generally ignore trajectory directions, which are important in the analysis of movement patterns. In this paper, we employ the bigram topic model rather than traditional topic models to analyze textualized trajectories and consider the direction information of trajectories. We further propose a modified Apriori algorithm to extract topical sub-trajectories and use them to represent each topic. Finally, we design a visual analytics system with several linked views to help users interactively explore movement patterns from topics and topical sub-trajectories. The case studies with Chengdu taxi trajectory data demonstrate the effectiveness of the proposed system.
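To make the idea of direction-aware sub-trajectory mining concrete, here is a generic Apriori-style sketch. It is not the modified algorithm proposed in the paper, whose details the abstract does not give: it assumes trajectories are textualized as ordered lists of location tokens, counts support as the number of trajectories containing a contiguous sub-sequence, and prunes a length-k candidate unless both of its length-(k-1) parts are frequent. The function name, parameters, and toy data are hypothetical.

```python
from collections import Counter

def frequent_subtrajectories(trajectories, min_support, max_len=4):
    """Level-wise (Apriori-style) mining of contiguous, order-preserving
    sub-trajectories occurring in at least `min_support` trajectories."""
    frequent = {}
    prev_level = None
    for k in range(1, max_len + 1):
        support = Counter()
        for traj in trajectories:
            seen = set()  # count each candidate at most once per trajectory
            for i in range(len(traj) - k + 1):
                cand = tuple(traj[i:i + k])
                # Apriori pruning: prefix and suffix of length k-1 must be frequent.
                if prev_level is not None and (
                    cand[:-1] not in prev_level or cand[1:] not in prev_level
                ):
                    continue
                seen.add(cand)
            support.update(seen)
        level = {c: n for c, n in support.items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
        prev_level = level
    return frequent

# Toy textualized trajectories (made-up road-segment tokens).
trajectories = [["A", "B", "C"], ["A", "B", "D"], ["C", "B", "A"]]
result = frequent_subtrajectories(trajectories, min_support=2)
```

In this toy data the ordered pair A then B is frequent while B then A is not, which is the kind of directional distinction an order-insensitive representation would lose.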
Massive Open Online Courses (MOOCs) often provide online discussion forum tools to facilitate learner interaction and communication. With massive numbers of forum messages posted by learners every day, MOOC forums are regarded as an important source for understanding learners' activities and opinions. However, the high volume and heterogeneity of MOOC forum contents make it challenging to analyze forum data effectively from different perspectives of discussions and to integrate diverse information into a coherent understanding of issues of concern. In this paper, we report a study on the design of a visual analytics tool to facilitate the multifaceted analysis of online discussion forums. This tool, called MessageLens, aims at helping MOOC instructors gain a better understanding of forum discussions from three facets: discussion topic, learner attitude, and communication among learners. With various visualization tools, instructors can investigate learner activities from different perspectives. We report a case study with real-world MOOC forum data to present the features of MessageLens and a preliminary evaluation study on the benefits and areas of improvement of the system. Our research suggests an approach to analyzing rich communication contents as well as dynamic social interactions among people.
Climate research produces a wealth of multivariate data. These data often have a geospatial reference and so it is of interest to show them within their geospatial context. One can consider this configuration as a multifield visualization problem, where the geo-space provides the expanse of the field. However, there is a limit on the amount of multivariate information that can be fit within a certain spatial location, and the use of linked multivariate information displays has previously been devised to bridge this gap. In this paper we focus on the interactions in the geographical display, present an implementation that uses Google Earth, and demonstrate it within a tightly linked parallel coordinates display. Several other visual representations, such as pie and bar charts are integrated into the Google Earth display and can be interactively manipulated. Further, we also demonstrate new brushing and visualization techniques for parallel coordinates, such as fixed-window brushing and correlation-enhanced display. We conceived our system with a team of climate researchers, who already made a few important discoveries using it. This demonstrates our system's great potential to enable scientific discoveries, possibly also in other domains where data have a geospatial reference.
The ever-increasing number of major security incidents has led to an emerging interest in cooperative approaches to counter cyber threats. To enable cooperation in detecting and preventing attacks, it is necessary to have structured and standardized formats to describe an incident. Corresponding formats are complex and extensive, as they are often designed for automated processing and exchange. These characteristics hamper readability and, therefore, prevent humans from understanding the documented incident. This is a major problem since the success and effectiveness of any security measure rely heavily on the contribution of security experts. To address these shortcomings, we propose a visual analytics concept enabling security experts to analyze and enrich semi-structured cyber threat intelligence information. Our approach combines an innovative way of persisting this data with an interactive visualization component to analyze and edit the threat information. We demonstrate the feasibility of our concept using the Structured Threat Information eXpression, the state-of-the-art format for reporting cyber security issues.
The word 'pattern' frequently appears in the visualisation and visual analytics literature, but what do we mean when we talk about patterns? We propose a practicable definition of the concept of a pattern in a data distribution as a combination of multiple interrelated elements of two or more data components that can be represented and treated as a unified whole. Our theoretical model describes how patterns are made by relationships existing between data elements. Knowing the types of these relationships, it is possible to predict what kinds of patterns may exist. We demonstrate how our model underpins and refines the established fundamental principles of visualisation. The model also suggests a range of interactive analytical operations that can support visual analytics workflows where patterns, once discovered, are explicitly involved in further data analysis.
Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.
Effective analysis of large text collections remains a challenging problem given the growing volume of available text data. Recently, text mining techniques have been rapidly developed for automatically extracting key information from massive text data. Topic modeling, as one of the novel techniques that extracts a thematic structure from documents, is widely used to generate text summarization and foster an overall understanding of the corpus content. Although powerful, this technique may not be directly applicable for general analytics scenarios since the topics and topic-document relationship are often presented probabilistically in models. Moreover, information that plays an important role in knowledge discovery, for example, times and authors, is hardly reflected in topic modeling for comprehensive analysis. In this paper, we address this issue by presenting a visual analytics system, VISTopic, to help users make sense of large document collections based on topic modeling. VISTopic first extracts a set of hierarchical topics using a novel hierarchical latent tree model (HLTM) (Liu et al., 2014). Specifically, a topic view accounting for the model features is designed for overall understanding and interactive exploration of the topic organization. To leverage multi-perspective information for visual analytics, VISTopic further provides an evolution view to reveal the trend of topics and a document view to show details of topical documents. Three case studies based on the dataset of the IEEE VIS conference demonstrate the effectiveness of our system in gaining insights from large document collections.
Audit logs are different from other software logs in that they record the most primitive events (i.e., system calls) in modern operating systems. Audit logs contain a detailed trace of an operating system, and thus have received great attention from security experts and system administrators. However, the complexity and size of audit logs, which increase in real time, have hindered analysts from understanding and analyzing them. In this paper, we present a novel visual analytics system, LongLine, which enables interactive visual analyses of large-scale audit logs. LongLine lowers the interpretation barrier of audit logs by employing human-understandable representations (e.g., file paths and commands) instead of abstract indicators of operating systems (e.g., file descriptors), as well as revealing the temporal patterns of the logs in a multi-scale fashion with meaningful granularity of time in mind (e.g., hourly, daily, and weekly). LongLine also streamlines comparative analysis between interesting subsets of logs, which is essential in detecting anomalous behaviors of systems. In addition, LongLine allows analysts to monitor the system state in a streaming fashion, keeping the latency between log creation and visualization less than one minute. Finally, we evaluate our system through a case study and a scenario analysis with security experts.
With the rapid development of Internet technology, a rich set of e-government data are collected by government departments. For example, a variety of feedback text data can be obtained quickly and efficiently through various channels such as the mayor's mailbox. Extracting hot topics from large-scale e-government text data, establishing the correlation between topics and geographic space, and interactively exploring the sources of public feedback problems is an effective way to improve the working efficiency of the government. However, it is difficult to explore large-scale e-government text data with traditional visualization methods such as word clouds, because too many words can hardly be laid out in a limited space, which largely disturbs visual perception. In this paper, we propose a visual analytics system for large-scale e-government data exploration by means of a simplified word cloud. Firstly, a representation learning model is used to embed the text data into a high-dimensional space to quantitatively represent the semantic structure features of e-government text data. Then, the high-dimensional vectors are projected into a two-dimensional space where the coordinate distribution of points effectively expresses the semantic similarity of the original words, which also presents geographic features that can be quantized by means of a similarity computing model. In order to simplify the understanding of large-scale e-government data and improve the cognitive efficiency of the word cloud, we adopt the adaptive blue noise method to sample the topic words, which can simplify the visual expression of the word cloud and improve the understanding efficiency of e-government data without losing the semantic structure features. Furthermore, an abstraction and visual analysis system for large-scale e-government text data is designed and implemented by integrating the above representation learning model, sampling-based abstraction model of the word cloud, and topic and geographic correlation analysis model. This system provides convenient human-computer interaction modes and supports users in analyzing and extracting the characteristics hidden in large-scale e-government data. It also helps government departments quickly locate the hot topics of public concern and their related regional distribution, and provides decision support to further improve the work efficiency of the government. Case studies based on real-world datasets further verify the effectiveness and practicability of our system.
With the explosion of digital data, the need for advanced visual analytics, including coordinated multiple views (CMV), is rapidly increasing. CMV enable users to discover patterns and examine relationships across multiple visualizations of one or multiple datasets. CMV have been implemented in a web-based environment through the Australian Urban Research Infrastructure Network (AURIN) project. AURIN offers a platform providing seamless and secure access to an extensive range of distributed urban datasets across Australia. Visual exploration of these datasets is essential to support research endeavors. This paper focuses on the challenges in dealing with the complexity and multidimensionality of datasets used in CMV. We rely on the concept of multidimensional data cubes as the theoretical framework for coordination across visualizations. Using the concept of data cubes and hierarchical dimensions, we present strategies to automatically build render groups. This provides an implicit coordination based on cube structures and a framework to establish links between a dataset and its aggregates in a one-to-many fashion. The CMV approach is demonstrated using aggregate-level data, which is provided through federated data services. The paper discusses the issues around our CMV implementation and concludes by reflecting on the challenges in supporting spatio-temporal urban data exploration.
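The link between a dataset and its aggregates along a hierarchical dimension can be sketched as a simple roll-up. This is an illustrative assumption about how such links might be represented, not the AURIN implementation: the hierarchy table, cell layout, function name, and all values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-level spatial hierarchy: fine-grained region -> parent region.
PARENT = {"Carlton": "Melbourne", "Fitzroy": "Melbourne", "Newtown": "Sydney"}

def roll_up(cells, parent):
    """Aggregate (region, year) -> value cells to the parent-region level and
    record which fine-grained cells feed each aggregate (one-to-many links)."""
    totals = defaultdict(float)
    links = defaultdict(list)
    for (region, year), value in cells.items():
        key = (parent[region], year)
        totals[key] += value
        links[key].append((region, year))
    return dict(totals), dict(links)

# Toy cube cells (made-up values).
cells = {("Carlton", 2011): 10.0, ("Fitzroy", 2011): 5.0, ("Newtown", 2011): 7.0}
totals, links = roll_up(cells, PARENT)
```

With a structure like `links`, selecting an aggregate cell in one view could highlight its contributing cells in another, which is one way to read the implicit coordination the abstract describes.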
Consumer credit risk analysis plays a significant role in stabilizing a bank's investments and in maximizing its profits. As a large financial institution, Bank of America relies on effective risk analyses to minimize the net credit loss resulting from its credit products (e.g., mortgage and credit card loans). Due to the size and complexity of the data involved in this process, analysts are facing challenges in monitoring the data, comparing its geospatial and temporal patterns, and developing appropriate strategies based on the correlation from multiple analysis perspectives. To address these challenges, we present RiskVA, an interactive visual analytics system that is tailored to support credit risk analysis. RiskVA provides interactive data exploration and correlation, and visually facilitates depictions of market fluctuations and temporal trends for a targeted credit product. When evaluated by analysts from Bank of America, RiskVA was appreciated for its effectiveness in facilitating the bank's risk management.
Although significant progress has been made towards effective insight discovery in visual analytics systems, there are few effective approaches for managing the large number of insights generated in visual analytics processes. This paper presents ManyInsights, a multidimensional visual analytics prototype that integrates several novel insight management approaches proposed by the authors in their previous work. These approaches include insight annotation, browsing, retrieval, organization, and association. This paper also reports a long-term case study that evaluated ManyInsights with a domain expert, realistic analytic tasks, and real datasets.
Funding: Funded by the Deutsche Forschungsgemeinschaft (German Research Foundation), No. 251654672 - TRR 161 (Project B01), and Germany's Excellence Strategy, No. EXC-2075 - 390740016.
Funding: This work was supported by the National Natural Science Foundation of China (62072400), the Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government (ZJU), and the Zhejiang Lab (2021KE0AC02).
Funding: Supported by the National Key R&D Program of China (Grant No. 2016YFA0502304) and the Important Drug Development Fund, Ministry of Science and Technology of China (2018ZX09735002).
Abstract: Although traditional Chinese medicine (TCM) and modern medicine (MM) have considerably different treatment philosophies, they both make important contributions to human health care. TCM physicians usually treat diseases using a TCM formula (TCMF), which is a combination of specific herbs, based on the holistic philosophy of TCM, whereas MM physicians treat diseases using chemical drugs that interact with specific biological molecules. The difference between the holistic view of TCM and the atomistic view of MM hinders their combination. Tools that are able to bridge TCM and MM are essential for promoting the combination of these disciplines. In this paper, we present TCMFVis, a visual analytics system that helps domain experts explore the potential use of TCMFs in MM at the molecular level. TCMFVis deals with two significant challenges, namely, (i) intuitively obtaining valuable insights from heterogeneous data involved in TCMFs and (ii) efficiently identifying the common features among a cluster of TCMFs. In this study, a four-level (herb-ingredient-target-disease) visual analytics framework was designed to facilitate the analysis of heterogeneous data in a proper workflow. Several set visualization techniques were first introduced into the system to facilitate the identification of common features among TCMFs. Case studies on two groups of TCMFs clustered by function were conducted by domain experts to evaluate TCMFVis. The results of these case studies demonstrate the usability and scalability of the system.
Funding: This material is based on research sponsored by DARPA, United States, under agreement number FA8750-18-2-0077. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.
Abstract: Digital phenotyping is the characterization of human behavior patterns based on data from digital devices such as smartphones in order to gain insights into the users' state and especially to identify ailments. To support supervised machine learning, digital phenotyping requires gathering data from study participants' smartphones as they live their lives. Periodically, participants are then asked to provide ground truth labels about their health status. Analyzing such complex data is challenging due to limited contextual information and imperfect health/wellness labels. We propose INteractive PHOne-o-typing VISualization (INPHOVIS), an interactive visual framework for exploratory analysis of smartphone health data to study phone-o-types. Prior visualization work has focused on mobile health data with clear semantics such as steps or heart rate data collected using dedicated health devices and wearables such as smartwatches. However, unlike smartphones, which are owned by over 85 percent of the US population, wearable devices are less prevalent, thus reducing the number of people from whom such data can be collected. In contrast, the "low-level" sensor data (e.g., accelerometer or GPS data) supported by INPHOVIS can be easily collected using smartphones. Data visualizations are designed to provide the essential contextualization of such data and thus help analysts discover complex relationships between observed sensor values and health-predictive phone-o-types. To guide the design of INPHOVIS, we performed a hierarchical task analysis of phone-o-typing requirements with health domain experts. We then designed and implemented multiple innovative visualizations integral to INPHOVIS, including stacked bar charts to show diurnal behavioral patterns, calendar views to visualize day-level data along with bar charts, and correlation views to visualize important wellness predictive data. We demonstrate the usefulness of INPHOVIS with walk-throughs of use cases. We also evaluated INPHOVIS with expert feedback and received encouraging responses.
Funding: This work was supported by the National Basic Research Program of China (973 Program) (2015CB352503), the Major Program of the National Natural Science Foundation of China (61232012), the National Natural Science Foundation of China (Grant Nos. 61303141, 61422211, U1536118, U1536119), the Zhejiang Provincial Natural Science Foundation of China (LR13F020001), the Fundamental Research Funds for the Central Universities, the Innovation Joint Research Center for Cyber-Physical-Society System, and the United States National Science Foundation (1350573).
Abstract: A wide variety of predictive analytics techniques have been developed in statistics, machine learning and data mining; however, many of these algorithms take a black-box approach in which data is input and future predictions are output with no insight into what goes on during the process. Unfortunately, such a closed system approach often leaves little room for injecting domain expertise and can result in frustration from analysts when results seem spurious or confusing. In order to allow for more human-centric approaches, the visualization community has begun developing methods to enable users to incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and develops a summarization of the predictive analytics workflow.
Funding: Supported by the National Key R&D Program of China (Nos. 2018YFB1004300, 2019YFB1405703), the National Natural Science Foundation of China (Nos. 61761136020, 61672307, 61672308, 61936002), TC190A4DA/3, the Institute Guo Qiang, Tsinghua University, and in part by the Tsinghua–Kuaishou Institute of Future Media Data.
Abstract: Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are promising and to learn how to apply relevant techniques in visual analytics, we systematically review 259 papers published in the last ten years together with representative works before 2010. We build a taxonomy, which includes three first-level categories: techniques before model building, techniques during model building, and techniques after model building. Each category is further characterized by representative analysis tasks, and each task is exemplified by a set of recent influential works. We also discuss and highlight research challenges and promising future research opportunities useful for visual analytics researchers.
Funding: Partly supported by the National Natural Science Foundation of China under Grant No. 61070114, the Program for New Century Excellent Talents in University of China under Grant No. NCET-12-1087, and the Zhejiang Provincial Qianjiang Talents of China under Grant No. 2013R10054.
Abstract: Visual analytics employs interactive visualizations to integrate users' knowledge and inference capability into numerical/algorithmic data analysis processes. It is an active research field that has applications in many sectors, such as security, finance, and business. The growing popularity of visual analytics in recent years creates the need for a broad survey that reviews and assesses the recent developments in the field. This report reviews and classifies recent work into a set of application categories including space and time, multivariate, text, graph and network, and other applications. More importantly, this report presents analytics space, inspired by design space, which relates each application category to the key steps in visual analytics, including visual mapping, model-based analysis, and user interactions. We explore and discuss the analytics space to add to the current understanding and better understand research trends in the field.
Funding: This research was funded by the National Key R&D Program of China (No. SQ2018YFB100002), the National Natural Science Foundation of China (Nos. 61761136020, 61672308), Microsoft Research Asia, the Fraunhofer Cluster of Excellence on "Cognitive Internet Technologies", the EU through project Track&Know (grant agreement 780754), NSFC (61761136020), the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1609217), the Zhejiang Provincial Natural Science Foundation (LR18F020001), NSFC Grants 61602306, and the Fundamental Research Funds for the Central Universities.
Abstract: Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In this paper, we first review and explore the relevant work from the research areas of data management, visual analytics and human-computer interaction. Then, for different types of data such as multimedia data, textual data, trajectory data, and graph data, we summarize the common methods for improving data quality by leveraging data cleansing techniques at different analysis stages. Based on a thorough analysis, we propose a general visual analytics framework for interactively cleansing data. Finally, the challenges and opportunities are analyzed and discussed in the context of data and humans.
Funding: Supported by the National Key Research & Development Program of China (2017YFB0202203), the National Natural Science Foundation of China (61472354, 61672452), and the NSFC-Guangdong Joint Fund, China (U1611263).
Abstract: GPS-based taxi trajectories contain valuable knowledge about movement patterns for transportation and urban planning. Topic modeling is an effective tool to extract semantic information from taxi trajectory data. However, previous methods generally ignore trajectory directions, which are important in the analysis of movement patterns. In this paper, we employ the bigram topic model rather than traditional topic models to analyze textualized trajectories and consider the direction information of trajectories. We further propose a modified Apriori algorithm to extract topical sub-trajectories and use them to represent each topic. Finally, we design a visual analytics system with several linked views to help users interactively explore movement patterns from topics and topical sub-trajectories. The case studies with Chengdu taxi trajectory data demonstrate the effectiveness of the proposed system.
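The reason a bigram representation retains direction, as claimed in the abstract above, can be seen in a minimal counting sketch. This is not the bigram topic model itself, nor the authors' code; it assumes, for illustration only, that a textualized trajectory is a sequence of road-segment tokens, and the segment names are hypothetical.

```python
from collections import Counter


def unigrams(trajectory):
    """Bag-of-words view: counts of individual road-segment tokens."""
    return Counter(trajectory)


def bigrams(trajectory):
    """Ordered pairs of consecutive tokens, which encode travel direction."""
    return Counter(zip(trajectory, trajectory[1:]))


# Hypothetical textualized trajectories over the same three road segments,
# travelled in opposite directions.
eastbound = ["road_A", "road_B", "road_C"]
westbound = list(reversed(eastbound))

same_unigrams = unigrams(eastbound) == unigrams(westbound)
same_bigrams = bigrams(eastbound) == bigrams(westbound)

print("unigram counts identical:", same_unigrams)
print("bigram counts identical:", same_bigrams)
```

A unigram bag of words cannot tell the two trips apart, whereas the bigram counts differ because each pair is ordered; a model built on bigram statistics can therefore separate opposite flows along the same road.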
Abstract: Massive Open Online Courses (MOOCs) often provide online discussion forum tools to facilitate learner interaction and communication. With massive numbers of forum messages posted by learners every day, MOOC forums are regarded as an important source for understanding learners' activities and opinions. However, the high volume and heterogeneity of MOOC forum contents make it challenging to analyze forum data effectively from different perspectives of discussions and to integrate diverse information into a coherent understanding of issues of concern. In this paper, we report a study on the design of a visual analytics tool to facilitate the multifaceted analysis of online discussion forums. This tool, called MessageLens, aims at helping MOOC instructors gain a better understanding of forum discussions from three facets: discussion topic, learner attitude, and communication among learners. With various visualization tools, instructors can investigate learner activities from different perspectives. We report a case study with real-world MOOC forum data to present the features of MessageLens and a preliminary evaluation study on the benefits and areas of improvement of the system. Our research suggests an approach to analyzing rich communication contents as well as dynamic social interactions among people.
Funding: Partial support for this research was provided by the US National Science Foundation (Nos. 1050477, 0959979, and 1117132), by a Brookhaven National Lab LDRD grant, by the US Department of Energy (DOE) Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences, and by the IT Consilience Creative Project through the Ministry of Knowledge Economy, Republic of Korea. [...] national scientific user facility sponsored by the DOE's OBER at Pacific Northwest National Laboratory (PNNL). PNNL is operated for the US DOE by Battelle Memorial Institute under contract No. DE-AC06-76RL0 1830.
Abstract: Climate research produces a wealth of multivariate data. These data often have a geospatial reference and so it is of interest to show them within their geospatial context. One can consider this configuration as a multifield visualization problem, where the geo-space provides the expanse of the field. However, there is a limit on the amount of multivariate information that can be fit within a certain spatial location, and the use of linked multivariate information displays has previously been devised to bridge this gap. In this paper we focus on the interactions in the geographical display, present an implementation that uses Google Earth, and demonstrate it within a tightly linked parallel coordinates display. Several other visual representations, such as pie and bar charts, are integrated into the Google Earth display and can be interactively manipulated. Further, we also demonstrate new brushing and visualization techniques for parallel coordinates, such as fixed-window brushing and correlation-enhanced display. We conceived our system with a team of climate researchers, who have already made a few important discoveries using it. This demonstrates our system's great potential to enable scientific discoveries, possibly also in other domains where data have a geospatial reference.
Funding: Supported by the Federal Ministry of Education and Research, Germany, as part of the BMBF DINGfest project.
Abstract: The ever-increasing number of major security incidents has led to an emerging interest in cooperative approaches to counter cyber threats. To enable cooperation in detecting and preventing attacks, it is essential to have structured and standardized formats to describe an incident. Corresponding formats are complex and of an extensive nature as they are often designed for automated processing and exchange. These characteristics hamper the readability and, therefore, prevent humans from understanding the documented incident. This is a major problem since the success and effectiveness of any security measure rely heavily on the contribution of security experts. To meet these shortcomings, we propose a visual analytics concept enabling security experts to analyze and enrich semi-structured cyber threat intelligence information. Our approach combines an innovative way of persisting this data with an interactive visualization component to analyze and edit the threat information. We demonstrate the feasibility of our concept using the Structured Threat Information eXpression, the state-of-the-art format for reporting cyber security issues.
Funding: This research was supported by the Fraunhofer Center for Machine Learning within the Fraunhofer Cluster for Cognitive Internet Technologies, by DFG within Priority Programme 1894 (SPP VGI), by the EU in project SoBigData++, by SESAR in projects TAPAS and SIMBAD, and by the Austrian Science Fund (FWF) project KnowVA (grant P31419-N31).
Abstract: The word 'pattern' frequently appears in the visualisation and visual analytics literature, but what do we mean when we talk about patterns? We propose a practicable definition of the concept of a pattern in a data distribution as a combination of multiple interrelated elements of two or more data components that can be represented and treated as a unified whole. Our theoretical model describes how patterns are made by relationships existing between data elements. Knowing the types of these relationships, it is possible to predict what kinds of patterns may exist. We demonstrate how our model underpins and refines the established fundamental principles of visualisation. The model also suggests a range of interactive analytical operations that can support visual analytics workflows where patterns, once discovered, are explicitly involved in further data analysis.
Funding: Supported by the U.S. National Science Foundation through grant IIS-1741536 and a 2019 Seed Fund Award from CITRIS and the Banatao Institute at the University of California, United States.
Abstract: Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.
Funding: This project is funded by a grant proposal (Ref: YBCB2009041-44) of Huawei Technologies Noah's Ark Lab.
Abstract: Effective analysis of large text collections remains a challenging problem given the growing volume of available text data. Recently, text mining techniques have been rapidly developed for automatically extracting key information from massive text data. Topic modeling, as one of the novel techniques that extracts a thematic structure from documents, is widely used to generate text summarization and foster an overall understanding of the corpus content. Although powerful, this technique may not be directly applicable for general analytics scenarios since the topics and topic-document relationship are often presented probabilistically in models. Moreover, information that plays an important role in knowledge discovery, for example, times and authors, is hardly reflected in topic modeling for comprehensive analysis. In this paper, we address this issue by presenting a visual analytics system, VISTopic, to help users make sense of large document collections based on topic modeling. VISTopic first extracts a set of hierarchical topics using a novel hierarchical latent tree model (HLTM) (Liu et al., 2014). Specifically, a topic view accounting for the model features is designed for overall understanding and interactive exploration of the topic organization. To leverage multi-perspective information for visual analytics, VISTopic further provides an evolution view to reveal the trend of topics and a document view to show details of topical documents. Three case studies based on the dataset of IEEE VIS conference demonstrate the effectiveness of our system in gaining insights from large document collections.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1A2B2007153) and by the Hankuk University of Foreign Studies Research Fund.
Abstract: Audit logs are different from other software logs in that they record the most primitive events (i.e., system calls) in modern operating systems. Audit logs contain a detailed trace of an operating system, and thus have received great attention from security experts and system administrators. However, the complexity and size of audit logs, which increase in real time, have hindered analysts from understanding and analyzing them. In this paper, we present a novel visual analytics system, LongLine, which enables interactive visual analyses of large-scale audit logs. LongLine lowers the interpretation barrier of audit logs by employing human-understandable representations (e.g., file paths and commands) instead of abstract indicators of operating systems (e.g., file descriptors) as well as revealing the temporal patterns of the logs in a multi-scale fashion with meaningful granularity of time in mind (e.g., hourly, daily, and weekly). LongLine also streamlines comparative analysis between interesting subsets of logs, which is essential in detecting anomalous behaviors of systems. In addition, LongLine allows analysts to monitor the system state in a streaming fashion, keeping the latency between log creation and visualization less than one minute. Finally, we evaluate our system through a case study and a scenario analysis with security experts.
Funding: Supported by the National Natural Science Foundation of China (No. 61872314, No. 61802339), the Natural Science Foundation of Zhejiang Province (No. LY18F020024), the Humanities and Social Sciences Foundation of Ministry of Education in China (No. 18YJC910017), the Major Humanities and Social Sciences Research Project in Zhejiang Province (2018QN021), and the Open Project Program of the State Key Lab of CAD&CG of Zhejiang University (No. A2001).
Abstract: With the rapid development of Internet technology, a rich set of e-government data are collected by government departments. For example, a variety of feedback text data can be obtained quickly and efficiently through various channels such as the mayor's mailbox. Extracting hot topics from large-scale e-government text data, establishing the correlation between topics and geographic space, and interactively exploring the sources of public feedback problems is an effective way to improve the working efficiency of the government. However, it is difficult to explore large-scale e-government text data with traditional visualization methods such as word clouds, because too many words can hardly be laid out in a limited space, which greatly disturbs visual perception. In this paper, we propose a visual analytics system for large-scale e-government data exploration by means of a simplified word cloud. Firstly, a representation learning model is used to embed the text data into a high-dimensional space to quantitatively represent the semantic structure features of e-government text data. Then, the high-dimensional vectors are projected into a two-dimensional space where the coordinate distribution of points effectively expresses the semantic similarity of the original words, which also presents geographic features that can be quantized by means of a similarity computing model. In order to simplify the understanding of large-scale e-government data and improve the cognitive efficiency of the word cloud, we adopt the adaptive blue noise method to sample the topic words, which can simplify the visual expression of the word cloud and improve the understanding efficiency of e-government data without losing the semantic structure features. Furthermore, an abstraction and visual analysis system for large-scale e-government text data is designed and implemented by integrating the above representation learning model, sampling-based abstraction model of the word cloud, and topic and geographic correlation analysis model. This system provides convenient human-computer interaction modes and helps users explore, analyze, and extract the characteristics hidden in large-scale e-government data. It also helps government departments quickly locate the hot topics of public concern and their related regional distribution, and provides decision support to further improve the work efficiency of the government. Case studies based on real-world datasets further verify the effectiveness and practicability of our system.
Abstract: With the explosion of digital data, the need for advanced visual analytics, including coordinated multiple views (CMV), is rapidly increasing. CMV enable users to discover patterns and examine relationships across multiple visualizations of one or multiple datasets. CMV have been implemented in a web-based environment through the Australian Urban Research Infrastructure Network (AURIN) project. AURIN offers a platform providing seamless and secure access to an extensive range of distributed urban datasets across Australia. Visual exploration of these datasets is essential to support research endeavors. This paper focuses on the challenges in dealing with complexity and multidimensionality of datasets used in CMV. We rely on the concept of multidimensional data cubes as the theoretical framework for coordination across visualizations. Using the concept of data cubes and hierarchical dimensions, we present strategies to automatically build render groups. This provides an implicit coordination based on cube structures and a framework to establish links between a dataset with its aggregates in a one-to-many fashion. The CMV approach is demonstrated using aggregate-level data, which is provided through federated data services. The paper discusses the issues around our CMV implementation and concludes by reflecting on the challenges in supporting spatio-temporal urban data exploration.
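The one-to-many link between an aggregate and its underlying cells, described in the abstract above, can be sketched as a roll-up along a hierarchical dimension. This is only an illustration under stated assumptions: the hierarchy, the cell values, and the function name `roll_up` are hypothetical and are not taken from AURIN or the paper.

```python
from collections import defaultdict

# Hypothetical hierarchy: each suburb (fine level) belongs to one city (coarse level).
SUBURB_TO_CITY = {
    "Carlton": "Melbourne",
    "Fitzroy": "Melbourne",
    "Newtown": "Sydney",
}

# Hypothetical cube cells keyed by (suburb, year) with an illustrative measure.
cells = {
    ("Carlton", 2011): 100,
    ("Fitzroy", 2011): 50,
    ("Newtown", 2011): 70,
    ("Carlton", 2016): 120,
}


def roll_up(cells, hierarchy):
    """Aggregate cells to the coarser level and remember which cells fed each total."""
    totals = defaultdict(int)
    members = defaultdict(list)
    for (suburb, year), value in cells.items():
        key = (hierarchy[suburb], year)
        totals[key] += value
        members[key].append((suburb, year))  # one aggregate -> many source cells
    return dict(totals), dict(members)


totals, members = roll_up(cells, SUBURB_TO_CITY)
print(totals[("Melbourne", 2011)], members[("Melbourne", 2011)])
```

Keeping the `members` mapping alongside the totals is what would let a selection in a city-level view highlight the matching suburb-level marks in another view, which is the kind of implicit coordination the abstract attributes to cube structures.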
Abstract: Consumer credit risk analysis plays a significant role in stabilizing a bank's investments and in maximizing its profits. As a large financial institution, Bank of America relies on effective risk analyses to minimize the net credit loss resulting from its credit products (e.g., mortgage and credit card loans). Due to the size and complexity of the data involved in this process, analysts are facing challenges in monitoring the data, comparing its geospatial and temporal patterns, and developing appropriate strategies based on the correlation from multiple analysis perspectives. To address these challenges, we present RiskVA, an interactive visual analytics system that is tailored to support credit risk analysis. RiskVA provides interactive data exploration and correlation, and visually facilitates depictions of market fluctuations and temporal trends for a targeted credit product. When evaluated by analysts from Bank of America, RiskVA was appreciated for its effectiveness in facilitating the bank's risk management.
Funding: Partially supported by the US National Science Foundation (Nos. 0946400 and 0915528) and the DHS Visual Analytics for Command, Control, and Interoperability (VACCINE) Center of Excellence, under the auspices of the SouthEast Regional Visual Analytics Center.
Abstract: Although significant progress has been made towards effective insight discovery in visual analytics systems, there are few effective approaches for managing the large number of insights generated in visual analytics processes. This paper presents ManyInsights, a multidimensional visual analytics prototype that integrates several novel insight management approaches proposed by the authors in their previous work. These approaches include insight annotation, browsing, retrieval, organization, and association. This paper also reports a long-term case study that evaluated ManyInsights with a domain expert, realistic analytic tasks, and real datasets.