The advent of the big data era has made data visualization a crucial tool for enhancing the efficiency and insights of data analysis. This theoretical research delves into the current applications and potential future trends of data visualization in big data analysis. The article first systematically reviews the theoretical foundations and technological evolution of data visualization, and thoroughly analyzes the challenges faced by visualization in the big data environment, such as massive data processing, real-time visualization requirements, and multi-dimensional data display. Through extensive literature research, it explores innovative application cases and theoretical models of data visualization in multiple fields including business intelligence, scientific research, and public decision-making. The study reveals that interactive visualization, real-time visualization, and immersive visualization technologies may become the main directions for future development and analyzes the potential of these technologies in enhancing user experience and data comprehension. The paper also delves into the theoretical potential of artificial intelligence technology in enhancing data visualization capabilities, such as automated chart generation, intelligent recommendation of visualization schemes, and adaptive visualization interfaces. The research also focuses on the role of data visualization in promoting interdisciplinary collaboration and data democratization.
Finally, the paper proposes theoretical suggestions for promoting data visualization technology innovation and application popularization, including strengthening visualization literacy education, developing standardized visualization frameworks, and promoting open-source sharing of visualization tools. This study provides a comprehensive theoretical perspective for understanding the importance of data visualization in the big data era and its future development directions.
This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The research background is that, with the increasing amount and complexity of data, traditional data analysis methods can no longer meet practical needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning methods can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
This study focuses on meeting the challenges of big data visualization by using data reduction methods based on feature selection, with the goal of reducing the volume of big data and minimizing model training time (Tt) while maintaining data quality. We address these challenges with the embedded method “Select from model (SFM)”, driven by the “Random forest importance (RFI)” algorithm, and compare it with the filter method “Select percentile (SP)”, based on the chi-square (“Chi2”) score, to select the most important features. The selected features are then fed into a classification step using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared with that of KNN in Python on eight data sets to determine which combination produces the best results when feature selection is applied. The study concluded that feature selection methods have a significant impact on the analysis and visualization of the data after removing repetitive data and data that do not affect the goal. After several comparisons, the study recommends (SFMLR): SFM based on the RFI algorithm for feature selection, combined with the LR algorithm for classification. The proposal proved its efficacy when its results were compared with recent literature.
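The pipeline this abstract describes maps almost directly onto scikit-learn names. Below is a minimal, hedged sketch under that assumption: the synthetic data set, estimator parameters, and the 25% percentile are illustrative choices, not the paper's settings.

```python
# Hedged sketch of the SFM/RFI vs. SP/Chi2 feature-selection comparison.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
X = np.abs(X)  # chi2 requires non-negative features

# Embedded method: SelectFromModel driven by random-forest importances (SFM/RFI)
sfm = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_sfm = sfm.fit_transform(X, y)

# Filter method: SelectPercentile with the chi-square score (SP/Chi2)
sp = SelectPercentile(chi2, percentile=25)
X_sp = sp.fit_transform(X, y)

# Compare LR vs. KNN accuracy on the embedded-method feature set
Xtr, Xte, ytr, yte = train_test_split(X_sfm, y, random_state=0)
acc_lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
acc_knn = KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte)
print(X.shape[1], X_sfm.shape[1], X_sp.shape[1], acc_lr, acc_knn)
```

Repeating this over several data sets, as the study does, is a loop over the same four estimators.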
Exploration of artworks is enjoyable but often time-consuming. For example, it is not always easy to discover one's favorite types of unknown painting works, nor to explore unpopular painting works that look similar to works created by famous artists. This paper presents a painting image browser that assists the explorative discovery of painting works of interest to the user. The presented browser applies a new multidimensional data visualization technique that highlights particular ranges of particular numeric values based on association rules, suggesting cues for finding favorite painting images. This study assumes that a large number of painting images are provided, with categorical information (e.g., names of artists, year of creation) assigned to the images. The presented system first calculates the feature values of the images as a preprocessing step. The browser then visualizes the multidimensional feature values as a heatmap and highlights association rules discovered from the relationships between the feature values and the categorical information. This mechanism enables users to explore favorite painting images or painting images that look similar to famous painting works. Our case study and user evaluation demonstrate the effectiveness of the presented image browser.
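The association rules driving the highlighting can be summarized by two standard measures, support and confidence. The toy records, attribute names, and rule below are invented for illustration; the paper's actual miner and thresholds are not specified here.

```python
# Minimal sketch of rule mining between feature bins and categories:
# which feature values co-occur with a category often enough to highlight.
records = [
    {"artist": "A", "brightness": "high"},
    {"artist": "A", "brightness": "high"},
    {"artist": "A", "brightness": "low"},
    {"artist": "B", "brightness": "low"},
    {"artist": "B", "brightness": "low"},
]

def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule: antecedent -> consequent."""
    n = len(records)
    both = sum(1 for r in records
               if antecedent.items() <= r.items() and consequent.items() <= r.items())
    ante = sum(1 for r in records if antecedent.items() <= r.items())
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

s, c = rule_stats(records, {"brightness": "high"}, {"artist": "A"})
print(s, c)  # in this toy data, high brightness always implies artist A
```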
The Growth Value Model (GVM) proposed theoretical closed-form formulas, consisting of Return on Equity (ROE) and the Price-to-Book value ratio (P/B), for fair stock prices and expected rates of return. Although regression analysis can be employed to verify these theoretical closed-form formulas, they cannot be explored intuitively by classical quintile or decile sorting approaches, due to their multi-factor and dynamical nature. This article uses visualization techniques to help explore GVM intuitively. The discerning finding and contribution of this paper is the concept of the smart frontier, which can be regarded as the reasonable lower limit of P/B at a specific ROE, obtained by exploring fair P/B with an ROE-P/B 2D dynamical process visualization. The coefficients in the formula can be determined by quantile regression analysis on market data. The moving paths of ROE and P/B in the current quarter and the subsequent quarters show that portfolios at the lower right of the curve approach this curve and stagnate there after the portfolios are formed. Furthermore, exploring expected rates of return with an ROE-P/B-Return 3D dynamical process visualization shows that the data outside the lower-right edge of the "smart frontier" have positive quarterly return rates not only in the t+1 quarter but also in the t+2 quarter. The farther the data in the t quarter are from the "smart frontier", the larger the return rates in the t+1 and t+2 quarters.
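Quantile regression, which the abstract uses to pin down the smart frontier's coefficients, minimizes the pinball (quantile) loss. The sketch below fits a lower-envelope line P/B = a*ROE + b at tau = 0.05 by a coarse grid search over synthetic data; it is a stand-in for the paper's estimator, and the coefficients and noise model are invented.

```python
# Hedged sketch: low-quantile line fit via pinball loss on synthetic data.
import random

random.seed(0)
roe = [random.uniform(0.02, 0.30) for _ in range(200)]
# invented frontier pb = 5*roe + 0.5, with only-positive noise above it
pb = [5 * r + 0.5 + abs(random.gauss(0, 0.4)) for r in roe]

def pinball_loss(a, b, tau):
    """Quantile-regression loss: over-predictions and under-predictions
    are weighted asymmetrically by tau and (1 - tau)."""
    loss = 0.0
    for r, p in zip(roe, pb):
        resid = p - (a * r + b)
        loss += tau * resid if resid >= 0 else (tau - 1) * resid
    return loss

tau = 0.05  # low quantile -> lower envelope of the point cloud
best = min(((a / 10, b / 10) for a in range(0, 101) for b in range(0, 21)),
           key=lambda ab: pinball_loss(ab[0], ab[1], tau))
print(best)
```

Because tau is small, lines above most of the points are penalized heavily, so the fit hugs the lower edge of the ROE-P/B cloud, which is exactly the frontier behavior described above.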
Data visualization blends art and science to convey stories from data via graphical representations. Considering different problems, applications, requirements, and design goals, it is challenging to combine these two components at their full force. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Lacking the science component, visualization cannot serve its role of creating correct representations of the actual data, leading to wrong perceptions, interpretations, and decisions. It might be even worse if incorrect visual representations were intentionally produced to deceive the viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication, such as color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran's Q test and McNemar's test were conducted to examine whether there is any difference in the proportions of common errors among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation and that size is the most critical issue. It was also observed that there were statistically significant differences in the proportions of errors among color, shape, size, and spatial orientation.
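McNemar's test, used above to compare paired error proportions, reduces to a chi-square statistic computed from the two discordant cells of a paired 2x2 table. The counts below are invented for illustration, not the paper's data.

```python
# Hedged sketch of McNemar's test for paired error proportions, e.g. whether
# two visual attributes are misused at different rates on the same figures.
def mcnemar_statistic(b, c):
    """Chi-square statistic with continuity correction, built from the two
    discordant cell counts (b: errors only in attribute 1, c: only in 2)."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

stat = mcnemar_statistic(b=25, c=10)
# Compare against the chi-square critical value with 1 df at alpha = 0.05
print(stat, stat > 3.841)  # 5.6 True
```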
Many countries are paying more and more attention to the protection of water resources, and how to protect them has received extensive attention from society. Water quality monitoring is key to water resources protection, and efficiently collecting and analyzing water quality monitoring data is an important aspect of it. In this paper, Python programming tools and regular expressions were used to design a web crawler for acquiring water quality monitoring data from Global Freshwater Quality Database (GEMStat) sites, and multi-thread parallelism was added to improve efficiency in downloading and parsing. To analyze and process the crawled water quality data, Pandas and Pyecharts are used to visualize the data and show its intrinsic correlations and spatiotemporal relationships.
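The parsing stage of such a crawler can be sketched with the standard library alone: a regular expression pulls (station, parameter, value) records out of raw text, and the values are then aggregated per station. The record format and station names below are invented, not GEMStat's actual layout.

```python
# Hedged sketch of regex-based record extraction and per-station aggregation.
import re
from collections import defaultdict
from statistics import mean

raw = """
station=Yangtze-01; param=pH; value=7.8
station=Yangtze-01; param=pH; value=7.4
station=Mekong-03; param=pH; value=6.9
"""

pattern = re.compile(
    r"station=(?P<station>[\w-]+); param=(?P<param>\w+); value=(?P<value>[\d.]+)")

by_station = defaultdict(list)
for m in pattern.finditer(raw):
    by_station[m.group("station")].append(float(m.group("value")))

summary = {s: round(mean(v), 2) for s, v in by_station.items()}
print(summary)  # {'Yangtze-01': 7.6, 'Mekong-03': 6.9}
```

In the real crawler the `raw` text would come from HTTP responses fetched by a pool of worker threads; the parsing step is unchanged.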
Background: Reproductive, maternal, newborn, child health, and nutrition (RMNCH&N) data are an indispensable tool for program and policy decisions in low- and middle-income countries. However, being equipped with evidence does not necessarily translate into program and policy changes. This study aimed to characterize data visualization interpretation capacity and preferences among RMNCH&N Tanzanian program implementers and policymakers ("decision-makers") to design more effective approaches towards promoting evidence-based RMNCH&N decisions in Tanzania. Methods: We conducted 25 semi-structured interviews in Kiswahili with junior, mid-level, and senior RMNCH&N decision-makers working in Tanzanian government institutions. We used snowball sampling to recruit participants with different ranks and roles in RMNCH&N decision-making. In the interviews, we probed participants on their statistical skills and data use, and asked them to identify key messages in, and rank, prepared RMNCH&N visualizations. We used a grounded theory approach to organize themes and identify findings. Results: The findings suggest that data literacy and statistical skills among RMNCH&N decision-makers in Tanzania vary. Most participants demonstrated awareness of many critical factors that should influence a visualization choice (audience, key message, simplicity), but assessments of data interpretation and preferences suggest that knowledge of basic statistics may be weak. A majority of decision-makers have not had any statistical training since attending university. There appeared to be some discomfort with interpreting and using visualizations other than bar charts, pie charts, and maps. Conclusions: Decision-makers must be able to understand and interpret the RMNCH&N data they receive to be empowered to act. Addressing inadequate data literacy and presentation skills among decision-makers is vital to bridging gaps between evidence and policymaking. It would be beneficial to host basic data literacy and visualization training for RMNCH&N decision-makers at all levels in Tanzania, and to expand skills in developing key messages from visualizations.
The study of marine data visualization is of great value. Marine data, due to their large scale, random variation, and multiresolution nature, are hard to visualize and analyze. Nowadays, constructing an ocean model and visualizing model results have become some of the most important research topics of the 'Digital Ocean'. In this paper, a spherical ray casting method is developed to improve the traditional ray-casting algorithm and to make efficient use of GPUs. For ocean current data, a 3D view-dependent line integral convolution method is used, in which the spatial frequency is adapted according to the distance from the camera. The study is based on a 3D virtual reality and visualization engine, namely the VV-Ocean. Some interactive operations are also provided to highlight interesting structures and the characteristics of volumetric data. Finally, marine data gathered in the East China Sea are displayed and analyzed. The results show that the method meets the requirements of real-time and interactive rendering.
To provide a medical data visualization analysis tool, machine learning methods are introduced to classify the malignant neoplasm of lung within the medical database MIMIC-III (Medical Information Mart for Intensive Care III, USA). The K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF) algorithms are selected as the predictive tools. Based on the experimental results, the machine learning predictive tools are integrated into the medical data visualization analysis platform. The platform software provides a flexible medical data visualization analysis tool for doctors. Practice indicates that visualization analysis results can be generated in a few simple steps, allowing doctors to do research work on the data accumulated in hospitals even if they have not received special data analysis training.
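Of the three predictive tools, KNN is simple enough to sketch with the standard library alone: classify a point by majority vote among its k nearest training points. The features and labels below are toy stand-ins, not MIMIC-III variables.

```python
# Minimal K-Nearest Neighbor sketch: majority vote among the k closest points.
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points closest to x."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (5.1, 5.0)))  # malignant
print(knn_predict(train_X, train_y, (1.1, 0.9)))  # benign
```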
Augmented Reality (AR), as a novel data visualization tool, is advantageous in revealing spatial data patterns and data-context associations. Accordingly, recent research has identified AR data visualization as a promising approach to increasing decision-making efficiency and effectiveness. As a result, AR has been applied in various decision support systems (DSSs) to enhance knowledge conveyance and comprehension, in which different data-reality associations have been constructed to aid decision-making. However, how these AR visualization strategies can enhance different decision support datasets has not been reviewed thoroughly. Especially given the rise of big data in the modern world, this support is critical to decision-making in the coming years. Using AR to embed the decision support data and explanation data into the end user's physical surroundings and focal contexts avoids isolating the human decision-maker from the relevant data. Integrating the decision-maker's contexts and the DSS support in AR is a difficult challenge. This paper outlines the current state of the art, through a literature review, in allowing AR data visualization to support decision-making. To facilitate publication classification and analysis, the paper proposes a taxonomy that classifies AR data visualizations based on the semantic associations between the AR data and the physical context. Based on this taxonomy and a decision support system taxonomy, 37 publications have been classified and analyzed from multiple aspects. One of the contributions of this literature review is the resulting AR visualization taxonomy, which can be applied to decision support systems. Along with this novel tool, the paper discusses the current state of the art in this field and indicates possible future challenges and directions that AR data visualization will bring to support decision-making.
The widespread use of numerical simulations in different scientific domains provides a variety of research opportunities. They often output a great deal of spatio-temporal simulation data, which are traditionally characterized as single-run, multi-run, multi-variate, multi-modal, and multi-dimensional. From the perspective of data exploration and analysis, we noticed that many works focusing on spatio-temporal simulation data share similar exploration techniques, for example, exploration schemes designed in simulation space, parameter space, feature space, and combinations of them. However, a survey providing a systematic overview of the essential commonalities shared by those works has been lacking. In this survey, we take a novel multi-space perspective to categorize the state-of-the-art works into three major categories. Specifically, the works are characterized as using similar techniques such as visual designs in simulation space (e.g., visual mapping, boxplot-based visual summarization), parameter space analysis (e.g., visual steering, parameter space projection), and data processing in feature space (e.g., feature definition and extraction, and the sampling, reduction, and clustering of simulation data).
A visualization tool, WebScope, was developed through a web browser based on Java applets embedded into HTML pages, in order to provide worldwide access to the EAST experimental data. It can display data from various trees on different servers in a single panel. With WebScope, it is easier to compare different data sources and perform simple calculations over them.
Appropriate color mapping for categorical data visualization can significantly facilitate the discovery of underlying data patterns and effectively bring out visual aesthetics. Some systems suggest pre-defined palettes for this task. However, a predefined color mapping is not always optimal, failing to consider users' needs for customization. Given an input categorical data visualization and a reference image, we present an effective method to automatically generate a coloring that resembles the reference while allowing classes to be easily distinguished. We extract a color palette with high perceptual distance between the colors by sampling dominant and discriminable colors from the image's color space. These colors are assigned to given classes by solving an integer quadratic program to optimize point distinctness of the given chart while preserving the color spatial relations in the source image. We show results on various coloring tasks, with a diverse set of new coloring appearances for the input data. We also compare our approach to state-of-the-art palettes in a controlled user study, which shows that our method achieves comparable performance in class discrimination, while being more similar to the source image. User feedback after using our system verifies its efficiency in automatically generating desirable colorings that meet the user's expectations when choosing a reference.
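One simple way to sample discriminable colors, sketched below, is a greedy max-min selection: repeatedly keep the candidate farthest from its nearest already-chosen color. Plain Euclidean RGB distance stands in here for the perceptual distance the paper uses, and the candidate colors are invented; the paper's integer-quadratic-program assignment step is not reproduced.

```python
# Hedged sketch: greedy max-min selection of mutually distant palette colors.
from math import dist

def pick_palette(candidates, k):
    palette = [candidates[0]]
    while len(palette) < k:
        # keep the candidate farthest from its nearest already-chosen color
        nxt = max(candidates, key=lambda c: min(dist(c, p) for p in palette))
        palette.append(nxt)
    return palette

candidates = [(200, 30, 30), (210, 40, 35), (30, 200, 30),
              (30, 30, 200), (220, 220, 40)]
palette = pick_palette(candidates, k=3)
print(palette)  # the near-duplicate red (210, 40, 35) is skipped
```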
Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Therefore, greater attention has been directed at deciphering better methods for identifying attack types so as to train IDSs more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized a hash algorithm, a weight table, and a sampling method to deal with the inherent problems caused by analyzing big data: volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set, with a clear identification of "normal" clusters, and describe distinct clusters of effective attacks.
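Hash-based sampling, one of the techniques mentioned, can be sketched as keeping a record whenever its digest falls below a threshold; the sample is deterministic and streaming-friendly, so the full data set never has to sit in memory. The record format below is a toy stand-in for KDD99 rows, and the 25% rate is illustrative.

```python
# Hedged sketch of deterministic hash-based sampling for a large connection log.
import hashlib

def keep(record, rate=0.25):
    """Deterministically keep roughly `rate` of records via a stable hash."""
    digest = hashlib.sha256(record.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < rate

records = [f"conn-{i},tcp,http,normal" for i in range(10_000)]
sample = [r for r in records if keep(r)]
print(len(sample))  # roughly 2,500 of 10,000
```

Because the decision depends only on the record's bytes, repeated passes over the stream select exactly the same sample.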
Water resources are one of the basic resources for human survival, and water protection has become a major problem for countries around the world. However, most traditional water quality monitoring research is still concerned with the collection of water quality indicators and ignores the analysis of water quality monitoring data and its value. In this paper, using the Laravel and AdminLTE frameworks, we introduce how to design and implement a water quality data visualization platform based on Baidu ECharts. Through the deployed water quality sensors, the collected water quality indicator data are transmitted in real time over the 4G network to a big data processing platform deployed on Tencent Cloud. The collected monitoring data are analyzed, and the processing results are visualized with Baidu ECharts. The test results showed that the designed system runs well and will provide decision support for water resource protection.
This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview in places where general trends are more important than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories, and modal multi-valued objects, and can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several improvements for zoomstars that enable it to visualize histograms in 2D by using a quantile or an equivalent-interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of the presented categories. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach that visualizes the whole data set in a comprehensive table-like graph, called shape encoding, is proposed. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase, identifying trends, similar objects, and important features, and detecting outliers and discrepancies in the data.
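The quantile approach proposed for zoomstars amounts to collapsing a histogram (bin edges plus counts) into a few quantiles, so the variable fits on a single 2D axis. A minimal sketch with illustrative bins, using linear interpolation within each bin:

```python
# Hedged sketch: approximate quantiles of a histogram-valued variable.
def histogram_quantiles(edges, counts, qs=(0.25, 0.5, 0.75)):
    """Approximate quantiles by linear interpolation within histogram bins."""
    total = sum(counts)
    out = []
    for q in qs:
        target = q * total
        cum = 0
        for (lo, hi), c in zip(zip(edges, edges[1:]), counts):
            if cum + c >= target:
                # the target count falls inside this bin; interpolate within it
                out.append(lo + (hi - lo) * (target - cum) / c)
                break
            cum += c
    return out

edges = [0, 10, 20, 30, 40]
counts = [10, 40, 40, 10]
print(histogram_quantiles(edges, counts))  # [13.75, 20.0, 26.25]
```

The three resulting numbers can then be drawn on one zoomstars axis in place of the full histogram.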
One of the most indispensable needs of life is food, and the endorsement of its worldwide availability has made agriculture an essential sector in recent years. As technology evolved, the need to maintain a good and suitable climate in the greenhouse became imperative to ensure that indoor plants are more productive, so the agriculture sector was not left behind. Moreover, the introduction and deployment of IoT technology in agriculture solves many problems and increases crop production. This paper focuses mainly on the deployment of the Internet of Things (IoT) for acquiring real-time data on environmental parameters in the greenhouse. Various IoT technologies applicable to greenhouse monitoring systems are presented. In the proposed model, a method is developed to send the air temperature and humidity data obtained by a DHT11 sensor to the cloud using an ESP8266-based NodeMCU, first to the ThingSpeak cloud platform and then to Adafruit.IO, where the MQTT protocol is used to deliver sensor data to the application layer, referred to as the Human-Machine Interface. The system has been completely implemented in an actual prototype, allowing data acquisition and communication via the publisher/subscriber concept. The data is published with the aid of a broker, which is responsible for transferring messages to the intended clients based on topic choice. Lastly, functionality testing of MQTT was carried out, and the results showed that the messages are successfully published.
Visual data mining is one of the important approaches among data mining techniques. Most such approaches are based on computer graphics techniques, but few exploit image-processing techniques. This paper proposes an image processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining. It is used to post-process the data mining result image and helps users discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easy to understand, and effective. It provides a new approach for visual data mining.
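Neighborhood averaging of the kind RNAM performs can be sketched as a mean filter: each cell of the result image becomes the average of its 3x3 neighborhood, clamped at the borders. The grid below is a toy stand-in for a mining result image; RNAM's exact "resemble" weighting is not reproduced here.

```python
# Hedged sketch: 3x3 neighborhood averaging as an image post-processing step.
def neighborhood_average(grid):
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # gather the 3x3 neighborhood, clipped to the image borders
            vals = [grid[x][y]
                    for x in range(max(0, i - 1), min(rows, i + 2))
                    for y in range(max(0, j - 1), min(cols, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

grid = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
print(neighborhood_average(grid))  # the isolated spike is smeared outward
```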
Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for a literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with a visual-based easy traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve a semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the extracted knowledge from the big data resources of research articles. It also serves as an automated tool for a visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates a semantic visualization and domain-specific knowledge discovery from a large digital repository and their associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationship for a visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of a semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
文摘The advent of the big data era has made data visualization a crucial tool for enhancing the efficiency and insights of data analysis. This theoretical research delves into the current applications and potential future trends of data visualization in big data analysis. The article first systematically reviews the theoretical foundations and technological evolution of data visualization, and thoroughly analyzes the challenges faced by visualization in the big data environment, such as massive data processing, real-time visualization requirements, and multi-dimensional data display. Through extensive literature research, it explores innovative application cases and theoretical models of data visualization in multiple fields including business intelligence, scientific research, and public decision-making. The study reveals that interactive visualization, real-time visualization, and immersive visualization technologies may become the main directions for future development and analyzes the potential of these technologies in enhancing user experience and data comprehension. The paper also delves into the theoretical potential of artificial intelligence technology in enhancing data visualization capabilities, such as automated chart generation, intelligent recommendation of visualization schemes, and adaptive visualization interfaces. The research also focuses on the role of data visualization in promoting interdisciplinary collaboration and data democratization. Finally, the paper proposes theoretical suggestions for promoting data visualization technology innovation and application popularization, including strengthening visualization literacy education, developing standardized visualization frameworks, and promoting open-source sharing of visualization tools. This study provides a comprehensive theoretical perspective for understanding the importance of data visualization in the big data era and its future development directions.
Abstract: This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The research background is that, with the increasing amount and complexity of data, traditional data analysis methods can no longer meet the need. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps further promote the development and application of data science.
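The Bayesian ingredient described above can be illustrated with a minimal, self-contained sketch: a grid-approximated posterior over a summary statistic, whose credible spread is exactly what a visualization would render as an uncertainty band. The data values, noise level, and grid bounds below are invented for illustration; the paper's actual models are neural networks, not this toy estimator.

```python
import math

# Invented observations standing in for a model output to be visualized.
data = [2.1, 1.9, 2.3, 2.0, 2.2]
NOISE_SD = 0.2                                    # assumed known noise level

grid = [i / 100 for i in range(100, 301)]         # candidate means 1.00 .. 3.00

def loglik(mu):
    # Gaussian log-likelihood of the data for candidate mean mu.
    return -sum((x - mu) ** 2 for x in data) / (2 * NOISE_SD ** 2)

# Flat prior, so the posterior is proportional to the likelihood.
w = [math.exp(loglik(mu)) for mu in grid]
total = sum(w)
post = [v / total for v in w]                     # normalized posterior weights

mean = sum(mu * p for mu, p in zip(grid, post))   # posterior mean
sd = sum((mu - mean) ** 2 * p for mu, p in zip(grid, post)) ** 0.5
print(round(mean, 3), round(sd, 3))               # centre and width of the band
```

A chart could then draw `mean ± 2*sd` as the shaded uncertainty region around the plotted curve.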
Abstract: This study addresses the challenges of big data visualization through data reduction based on feature selection methods, aiming to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We compare an embedded method, Select From Model (SFM) driven by the Random Forest Importance (RFI) algorithm, with a filter method, Select Percentile (SP) based on the chi-square (Chi2) statistic, for selecting the most important features. The selected features are then fed into a classification step using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared with that of KNN in Python on eight data sets, to see which method produces the best results when feature selection is applied. The study concludes that feature selection has a significant impact on the analysis and visualization of the data, once repetitive data and data that do not affect the goal have been removed. After several comparisons, the study proposes SFMLR: SFM based on the RFI algorithm for feature selection, combined with the LR algorithm for classification. The proposal's efficacy is demonstrated by comparing its results with recent literature.
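The filter branch (SP + Chi2) can be sketched with a tiny stdlib-only implementation of chi-square feature scoring and percentile selection. This is a minimal stand-in, not the paper's pipeline: a real implementation would presumably use library tools such as scikit-learn's `SelectPercentile`/`SelectFromModel`, and the toy matrix below is invented.

```python
def chi2_score(column, labels):
    """Chi-square statistic of one non-negative feature against class labels,
    the scoring rule behind a 'select percentile' filter."""
    classes = sorted(set(labels))
    total = sum(column)
    score = 0.0
    for c in classes:
        observed = sum(v for v, y in zip(column, labels) if y == c)
        expected = total * labels.count(c) / len(labels)
        score += (observed - expected) ** 2 / expected
    return score

def select_percentile(rows, labels, percentile):
    """Keep the indices of the top-scoring fraction of features."""
    n_feats = len(rows[0])
    scores = [chi2_score([r[j] for r in rows], labels) for j in range(n_feats)]
    k = max(1, round(n_feats * percentile / 100))
    keep = sorted(range(n_feats), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(keep)

# Toy data: feature 0 tracks the label, feature 1 is constant, feature 2 is weak.
rows = [[5, 1, 2], [6, 1, 2], [1, 1, 3], [0, 1, 2]]
labels = [1, 1, 0, 0]
print(select_percentile(rows, labels, 34))  # → [0], the informative feature
```

The selected columns would then feed the LR/KNN classification step exactly as the abstract describes.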
Abstract: Exploration of artworks is enjoyable but often time-consuming. For example, it is not always easy to discover one's favorite types of unknown painting works, nor to explore unpopular paintings that look similar to works created by famous artists. This paper presents a painting image browser that assists the explorative discovery of paintings of interest. The presented browser applies a new multidimensional data visualization technique that highlights particular ranges of particular numeric values based on association rules, suggesting cues for finding favorite painting images. This study assumes that a large number of painting images are provided, with categorical information (e.g., artist names, year of creation) assigned to the images. The presented system first calculates the feature values of the images as a preprocessing step. The browser then visualizes the multidimensional feature values as a heatmap and highlights association rules discovered from the relationships between the feature values and the categorical information. This mechanism enables users to explore favorite painting images or images that look similar to famous painting works. Our case study and user evaluation demonstrate the effectiveness of the presented image browser.
Abstract: The Growth Value Model (GVM) proposed theoretical closed-form formulas, consisting of Return on Equity (ROE) and the Price-to-Book value ratio (P/B), for fair stock prices and expected rates of return. Although regression analysis can be employed to verify these closed-form formulas, they cannot be explored intuitively by classical quintile or decile sorting approaches because of their multi-factor and dynamic nature. This article uses visualization techniques to help explore GVM intuitively. The distinctive finding and contribution of this paper is the concept of the smart frontier, which can be regarded as the reasonable lower limit of P/B at a specific ROE, obtained by exploring fair P/B with an ROE-P/B 2D dynamic process visualization. The coefficients in the formula can be determined by quantile regression on market data. The moving paths of ROE and P/B in the current quarter and the subsequent quarters show that portfolios at the lower right of the curve approach the curve and stagnate there after the portfolios are formed. Furthermore, exploring expected rates of return with an ROE-P/B-Return 3D dynamic process visualization shows that data outside the lower-right edge of the smart frontier have positive quarterly return rates not only in quarter t+1 but also in quarter t+2. The farther the data in quarter t lie from the smart frontier, the larger the return rates in quarters t+1 and t+2.
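The quantile-regression idea behind the smart frontier can be sketched as fitting a line that bounds a low quantile of P/B given ROE, by minimizing the pinball (quantile) loss. The data points and the brute-force grid search below are invented for illustration; a real analysis would fit to market data with a proper quantile-regression solver (e.g., statsmodels' `QuantReg`).

```python
# Invented ROE (x) vs. P/B (y) observations standing in for market data.
points = [(0.05, 1.0), (0.10, 1.6), (0.10, 2.4), (0.15, 2.2),
          (0.15, 3.1), (0.20, 2.9), (0.20, 4.0), (0.25, 3.8)]

def pinball(residual, tau):
    # Loss minimized by quantile regression at quantile level tau.
    return tau * residual if residual >= 0 else (tau - 1) * residual

def fit_quantile_line(pts, tau):
    """Brute-force search for slope a and intercept b minimizing pinball loss;
    illustration only, a solver would be used in practice."""
    best = None
    for a in [i / 10 for i in range(0, 201)]:          # slopes 0.0 .. 20.0
        for b in [i / 10 for i in range(-20, 21)]:     # intercepts -2.0 .. 2.0
            loss = sum(pinball(y - (a * x + b), tau) for x, y in pts)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]

a, b = fit_quantile_line(points, 0.1)   # low quantile -> lower frontier of P/B
below = sum(1 for x, y in points if y < a * x + b - 1e-9)
print(a, b, below)                      # few points should sit below the frontier
```

With tau = 0.1 the fitted line acts as the "reasonable lower limit" of P/B at each ROE, which is the role the smart frontier plays in the 2D visualization.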
Abstract: Data visualization blends art and science to convey stories from data via graphical representations. Considering different problems, applications, requirements, and design goals, it is challenging to combine these two components at their full force. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Lacking the science component, visualization cannot serve its role of creating correct representations of the actual data, leading to wrong perception, interpretation, and decisions. It may be even worse if incorrect visual representations are intentionally produced to deceive viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and projected them onto the fundamental units of visual communication, such as color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran's Q test and McNemar's test were conducted to examine whether the proportions of common errors differ among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation and that size is the most critical issue. Statistically significant differences were also observed in the proportions of errors among color, shape, size, and spatial orientation.
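For paired comparisons like the ones above, McNemar's statistic depends only on the two disagreement counts b and c; with the common continuity correction it is (|b - c| - 1)^2 / (b + c). The counts in the example are invented, not taken from the paper's data.

```python
def mcnemar(b, c):
    """McNemar's chi-square statistic with continuity correction for the
    paired disagreement counts b and c (e.g., charts flagged erroneous
    under one visual channel but not the other)."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# Invented counts: 10 charts wrong only on size, 2 wrong only on color.
stat = mcnemar(10, 2)
print(round(stat, 2))  # → 4.08, compared against the chi-square(1) threshold 3.84
```

A statistic above 3.84 rejects, at the 5% level, the hypothesis that the two error proportions are equal.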
Funding: This research was funded by the National Natural Science Foundation of China (No. 51775185), the Scientific Research Fund of Hunan Province Education Department (18C0003), the Research Project on Teaching Reform in Colleges and Universities of Hunan Province Education Department (20190147), the Innovation and Entrepreneurship Training Program for College Students in Hunan Province (2021-1980), and Hunan Normal University University-Industry Cooperation. This work was carried out at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, open project, Grant Number 20181901CRP04.
Abstract: Many countries are currently paying more and more attention to the protection of water resources, and how to protect them has received extensive attention from society. Water quality monitoring is the key work in water resources protection, and efficiently collecting and analyzing water quality monitoring data is an important aspect of it. In this paper, Python programming tools and regular expressions were used to design a web crawler for acquiring water quality monitoring data from Global Freshwater Quality Database (GEMStat) sites, with multi-threaded parallelism added to improve the efficiency of downloading and parsing. To analyze and process the crawled water quality data, Pandas and Pyecharts are used to visualize the data and show its intrinsic correlations and spatiotemporal relationships.
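The crawl-and-parse core can be sketched with the standard library alone: a regular expression extracts records from page markup, and a thread pool parallelizes the per-station work. The page structure and station names below are invented stand-ins; a real crawler would fetch live GEMStat pages over HTTP rather than read from a dict.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Invented stand-ins for fetched station pages; real code would download these.
PAGES = {
    "station-1": "<tr><td>2021-05-01</td><td>pH</td><td>7.2</td></tr>",
    "station-2": "<tr><td>2021-05-01</td><td>pH</td><td>6.8</td></tr>",
}

# One regex per table row: date, parameter name, numeric value.
ROW = re.compile(r"<td>(\d{4}-\d{2}-\d{2})</td><td>(\w+)</td><td>([\d.]+)</td>")

def parse(station):
    date, param, value = ROW.search(PAGES[station]).groups()
    return station, date, param, float(value)

# Multi-threaded parsing, mirroring the paper's parallel download/parse step.
with ThreadPoolExecutor(max_workers=4) as pool:
    records = list(pool.map(parse, PAGES))
print(records)
```

The resulting records are exactly the tidy rows one would hand to Pandas and Pyecharts for visualization.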
Funding: Grant Number 7059904 on the "National Evaluation Platform Approach for Accountability in Women's and Children's Health" from the Department of Global Affairs Canada to the Institute for International Programs at the Johns Hopkins Bloomberg School of Public Health.
Abstract: Background: Reproductive, maternal, newborn, child health, and nutrition (RMNCH&N) data is an indispensable tool for program and policy decisions in low- and middle-income countries. However, being equipped with evidence doesn't necessarily translate into program and policy changes. This study aimed to characterize data visualization interpretation capacity and preferences among Tanzanian RMNCH&N program implementers and policymakers ("decision-makers") in order to design more effective approaches to promoting evidence-based RMNCH&N decisions in Tanzania. Methods: We conducted 25 semi-structured interviews in Kiswahili with junior, mid-level, and senior RMNCH&N decision-makers working in Tanzanian government institutions. We used snowball sampling to recruit participants with different ranks and roles in RMNCH&N decision-making. In the interviews, we probed participants on their statistical skills and data use, and asked them to identify key messages in, and to rank, prepared RMNCH&N visualizations. We used a grounded theory approach to organize themes and identify findings. Results: The findings suggest that data literacy and statistical skills among RMNCH&N decision-makers in Tanzania vary. Most participants demonstrated awareness of many critical factors that should influence a visualization choice (audience, key message, simplicity), but assessments of data interpretation and preferences suggest that knowledge of basic statistics may be weak. A majority of decision-makers have not had any statistical training since attending university. There appeared to be some discomfort with interpreting and using visualizations other than bar charts, pie charts, and maps. Conclusions: Decision-makers must be able to understand and interpret the RMNCH&N data they receive to be empowered to act. Addressing inadequate data literacy and presentation skills among decision-makers is vital to bridging gaps between evidence and policymaking. It would be beneficial to host basic data literacy and visualization training for RMNCH&N decision-makers at all levels in Tanzania, and to expand skills in developing key messages from visualizations.
基金supported by the Natural Science Foundation of China under Project 41076115the Global Change Research Program of China under project 2012CB955603the Public Science and Technology Research Funds of the Ocean under project 201005019
Abstract: The study of marine data visualization is of great value. Marine data, due to their large scale, random variation, and multiresolution nature, are hard to visualize and analyze. Nowadays, constructing ocean models and visualizing model results have become some of the most important research topics of the 'Digital Ocean'. In this paper, a spherical ray casting method is developed to improve the traditional ray-casting algorithm and to make efficient use of GPUs. For ocean current data, a 3D view-dependent line integral convolution method is used, in which the spatial frequency is adapted according to the distance from the camera. The study is based on a 3D virtual reality and visualization engine, the VV-Ocean. Interactive operations are also provided to highlight the interesting structures and characteristics of the volumetric data. Finally, marine data gathered in the East China Sea are displayed and analyzed. The results show that the method meets the requirements of real-time, interactive rendering.
Abstract: To provide a medical data visualization analysis tool, machine learning methods are introduced to classify the malignant neoplasm of lung within the medical database MIMIC-III (Medical Information Mart for Intensive Care III, USA). K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF) are selected as the predictive tools. Based on the experimental results, the machine learning predictive tools are integrated into the medical data visualization analysis platform. The platform software provides a flexible medical data visualization analysis tool for doctors. Practice indicates that visualization analysis results can be generated in a few simple steps, allowing doctors to do research on the data accumulated in the hospital even if they have not had special data analysis training.
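The classification step can be illustrated with a from-scratch k-nearest-neighbour vote on a toy two-cluster dataset. This is a minimal stand-in for the KNN/SVM/RF classifiers the platform would wire in from standard ML libraries; the features and labels below are invented, not MIMIC-III data.

```python
from collections import Counter

def knn_predict(train, labels, point, k=3):
    """Classify `point` by majority vote among its k nearest training rows
    (squared Euclidean distance); a plain stand-in for library KNN."""
    nearest = sorted(range(len(train)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train[i], point)))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented two-cluster data standing in for extracted patient features.
train = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
         (5.0, 5.2), (4.8, 5.1), (5.1, 4.9)]
labels = ["benign", "benign", "benign",
          "malignant", "malignant", "malignant"]

print(knn_predict(train, labels, (5.0, 5.0)))  # → malignant
```

The platform's role is then to surface such predictions next to the visualized records, so the classifier's output is readable without training in data analysis.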
Funding: This research forms part of the CONSUS Programme, which is funded under the SFI Strategic Partnerships Programme (16/SPP/3296) and is co-funded by Origin Enterprises Plc.
Abstract: Augmented Reality (AR), as a novel data visualization tool, is advantageous in revealing spatial data patterns and data-context associations. Accordingly, recent research has identified AR data visualization as a promising approach to increasing decision-making efficiency and effectiveness. As a result, AR has been applied in various decision support systems to enhance knowledge conveyance and comprehension, in which different data-reality associations have been constructed to aid decision-making. However, how these AR visualization strategies can enhance different decision support datasets has not been reviewed thoroughly. Especially given the rise of big data in the modern world, this support is critical to decision-making in the coming years. Using AR to embed the decision support data and explanation data into the end user's physical surroundings and focal contexts avoids isolating the human decision-maker from the relevant data. Integrating the decision-maker's contexts and the DSS support in AR is a difficult challenge. This paper outlines the current state of the art, through a literature review, in using AR data visualization to support decision-making. To facilitate publication classification and analysis, the paper proposes a taxonomy that classifies different AR data visualizations based on the semantic associations between the AR data and the physical context. Based on this taxonomy and a decision support system taxonomy, 37 publications are classified and analyzed from multiple aspects. One contribution of this literature review is a resulting AR visualization taxonomy that can be applied to decision support systems. Along with this novel tool, the paper discusses the current state of the art in this field and indicates possible future challenges and directions that AR data visualization will bring to decision-making support.
基金supported by the National Natural Science Foundation of China(NSFC)Grant Nos.61702271,61702270.
Abstract: The widespread use of numerical simulations in different scientific domains provides a variety of research opportunities. They often output a great deal of spatio-temporal simulation data, which are traditionally characterized as single-run, multi-run, multi-variate, multi-modal, and multi-dimensional. From the perspective of data exploration and analysis, we noticed that many works focusing on spatio-temporal simulation data share similar exploration techniques, for example, exploration schemes designed in simulation space, parameter space, feature space, and combinations of them. However, a survey providing a systematic overview of the essential commonalities shared by those works has been lacking. In this survey, we take a novel multi-space perspective to categorize the state-of-the-art works into three major categories. Specifically, the works are characterized as using similar techniques such as visual designs in simulation space (e.g., visual mapping, boxplot-based visual summarization), parameter space analysis (e.g., visual steering, parameter space projection), and data processing in feature space (e.g., feature definition and extraction, and the sampling, reduction, and clustering of simulation data).
基金supported by National Natural Science Foundation of China (No.10835009)Chinese Academy of Sciences for the Key Project of Knowledge Innovation Program (No.KJCX3.SYW.N4)Chinese Ministry of Sciences for the 973 project (No.2009GB103000)
Abstract: A visualization tool, WebScope, was developed through a web browser based on Java applets embedded in HTML pages, in order to provide worldwide access to the EAST experimental data. It can display data from various trees on different servers in a single panel. With WebScope, it is easier to compare different data sources and to perform simple calculations over them.
基金supported in parts by National Natural Science Foundation of China(U2001206,61872250)GD Talent Program(2019JC05X328)+2 种基金GD Natural Science Foundation(2020A0505100064,2021B1515020085)DEGP Key Project(2018KZDXM058)Shenzhen Science and Technology Key Program(RCJC20200714114435012,JCYJ20210324120213036).
Abstract: Appropriate color mapping for categorical data visualization can significantly facilitate the discovery of underlying data patterns and effectively bring out visual aesthetics. Some systems suggest pre-defined palettes for this task. However, a pre-defined color mapping is not always optimal, failing to consider users' needs for customization. Given an input categorical data visualization and a reference image, we present an effective method to automatically generate a coloring that resembles the reference while allowing classes to be easily distinguished. We extract a color palette with high perceptual distance between the colors by sampling dominant and discriminable colors from the image's color space. These colors are assigned to the given classes by solving an integer quadratic program that optimizes point distinctness of the given chart while preserving the color spatial relations in the source image. We show results on various coloring tasks, with a diverse set of new coloring appearances for the input data. We also compare our approach to state-of-the-art palettes in a controlled user study, which shows that our method achieves comparable performance in class discrimination while being more similar to the source image. User feedback after using our system verifies its efficiency in automatically generating desirable colorings that meet users' expectations when choosing a reference.
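The "dominant and discriminable" extraction step can be sketched as greedy farthest-point sampling in RGB space: start from the most frequent color, then repeatedly add the color farthest from everything already picked. This is an illustrative simplification (the paper works in a perceptual color space and follows with an integer quadratic program for class assignment); the pixel data below is invented.

```python
def dist(a, b):
    # Euclidean distance in RGB; the paper would use a perceptual space instead.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def extract_palette(pixels, k):
    """Greedy farthest-point sampling: most frequent color first, then the
    color maximizing its minimum distance to the palette so far."""
    palette = [max(set(pixels), key=pixels.count)]
    while len(palette) < k:
        palette.append(max(set(pixels),
                           key=lambda c: min(dist(c, q) for q in palette)))
    return palette

# Invented "image": mostly red, some dark blue, one green pixel.
pixels = [(255, 0, 0)] * 5 + [(0, 0, 200)] * 2 + [(0, 255, 0)]
print(extract_palette(pixels, 2))  # → [(255, 0, 0), (0, 255, 0)]
```

Green is chosen second even though dark blue is more frequent, because distinctness from already-picked colors, not frequency alone, drives the selection.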
Abstract: Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Therefore, greater attention has been directed toward deciphering better methods for identifying attack types, so as to train IDSs more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types for training IDSs in key areas. Despite the rising growth of IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized it in our experiment. In this study, we utilized a hash algorithm, a weight table, and a sampling method to deal with the inherent problems of analyzing big data: volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set, with a clear identification of "normal" clusters, and described distinct clusters of effective attacks.
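The volume problem can be illustrated with reservoir sampling, a standard single-pass technique for keeping a uniform fixed-size sample of an arbitrarily long stream. This is a hedged stand-in for the paper's specific sampling method, whose details the abstract does not give; the stream below is synthetic.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform k-item sample of a stream in one pass and O(k) memory,
    one way to tame the volume of a KDD99-sized connection log."""
    rng = random.Random(seed)   # fixed seed keeps the demo reproducible
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:           # item i replaces a reservoir slot w.p. k/(i+1)
                sample[j] = item
    return sample

sample = reservoir_sample(range(10_000), 5)
print(sample)
```

The sampled subset is then small enough to hash, weight, and feed to the visualization algorithm.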
Funding: This work is supported by the National Natural Science Foundation of China (61304208), the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property Open Fund Project (20181901CRP04), the Scientific Research Fund of Hunan Province Education Department (18C0003), the Research Project on Teaching Reform in General Colleges and Universities, Hunan Provincial Education Department (20190147), and the Hunan Normal University Undergraduate Innovation and Entrepreneurship Training Plan Project (2019127).
Abstract: Water resources are among the basic resources for human survival, and water protection has become a major problem for countries around the world. However, most traditional water quality monitoring research is still concerned with the collection of water quality indicators, ignoring the analysis of water quality monitoring data and its value. In this paper, adopting the Laravel and AdminLTE frameworks, we describe the design and implementation of a water quality data visualization platform based on Baidu ECharts. Through deployed water quality sensors, the collected indicator data is transmitted in real time over the 4G network to a big data processing platform deployed on Tencent Cloud. The collected monitoring data is analyzed, and the processing results are visualized with Baidu ECharts. Test results showed that the designed system runs well and can provide decision support for water resource protection.
Abstract: This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview where general trends are more important than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories, and modal multi-valued objects, and can also be considered a distribution. Currently, the de facto visualization approach for symbolic data is the zoomstar, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several improvements to zoomstars that enable them to visualize histograms in 2D by using a quantile or an equivalent-interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of the presented categories. Recommendations for different zoomstar approaches are offered depending on the data type and the desired goal. Furthermore, an alternative approach called shape encoding is proposed, which visualizes the whole data set in a comprehensive table-like graph. These visualizations and their usefulness are verified with three symbolic data sets in an exploratory data mining phase: identifying trends, similar objects, and important features, and detecting outliers and discrepancies in the data.
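The quantile approach to flattening a histogram into 2D can be sketched as follows: reduce each distribution to a handful of quantiles, which a 2D zoomstar axis can then draw directly. The linear-interpolation rule below is the standard one; the data is invented and the reduction is an illustration of the idea rather than the paper's exact construction.

```python
def quantiles(values, probs):
    """Linear-interpolation quantiles of a sample: the compact summary a 2D
    zoomstar axis could draw in place of a full histogram."""
    xs = sorted(values)
    out = []
    for p in probs:
        pos = p * (len(xs) - 1)        # fractional rank of the quantile
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        out.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return out

# Invented sample; five quantiles stand in for its histogram on one axis.
summary = quantiles(list(range(1, 11)), [0.0, 0.25, 0.5, 0.75, 1.0])
print(summary)  # → [1.0, 3.25, 5.5, 7.75, 10.0]
```

Plotting min/quartiles/max per axis preserves the distribution's shape coarsely while needing no third dimension.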
Abstract: One of the most indispensable needs of life is food, and the endorsement of its worldwide availability has made agriculture an essential sector in recent years. As technology evolved, the need to maintain a good and suitable climate in the greenhouse became imperative to ensure that indoor plants are more productive, and the agriculture sector was not left behind. Moreover, the introduction and deployment of IoT technology in agriculture solves many problems and increases crop production. This paper focuses mainly on the deployment of the Internet of Things (IoT) for acquiring real-time data on environmental parameters in the greenhouse. Various IoT technologies applicable to greenhouse monitoring systems are presented. In the proposed model, air temperature and humidity data obtained by a DHT11 sensor are sent to the cloud using an ESP8266-based NodeMCU, first to the ThingSpeak cloud platform and then to Adafruit.IO, where the MQTT protocol is used to deliver sensor data to the application layer, referred to as the Human-Machine Interface. The system has been completely implemented in an actual prototype, allowing data acquisition, with the publisher/subscriber concept used for communication. Data is published with the aid of a broker, which is responsible for transferring messages to the intended clients based on topic choice. Lastly, functionality testing of MQTT was carried out, and the results showed that messages are published successfully.
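The publisher/subscriber routing at the heart of MQTT can be demonstrated with a tiny in-memory broker. This is not real MQTT (a deployment would use a broker such as Mosquitto and a client library such as paho-mqtt on the device side); the topic name and readings below are invented to mirror the DHT11 scenario.

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory stand-in for an MQTT broker: clients subscribe to
    topics, and published messages are routed by topic match."""
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, payload):
        for cb in self.subs[topic]:     # deliver only to matching subscribers
            cb(topic, payload)

broker = Broker()
received = []
# The Human-Machine Interface subscribes to the sensor topic.
broker.subscribe("greenhouse/dht11", lambda t, p: received.append(p))
# The NodeMCU-side publisher pushes a DHT11 reading.
broker.publish("greenhouse/dht11", {"temperature_c": 24.5, "humidity_pct": 61.0})
print(received)
```

Because routing is by topic, the sensor and the interface never reference each other directly, which is exactly the decoupling the abstract's publisher/subscriber design relies on.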
基金Supported by the National Natural Science Foun-dation of China (60173051) ,the Teaching and Research Award Pro-gramfor Outstanding Young Teachers in Higher Education Institu-tions of Ministry of Education of China ,and Liaoning Province HigherEducation Research Foundation (20040206)
Abstract: Visual data mining is one of the important branches of data mining techniques. Most approaches are based on computer graphics techniques, but few exploit image-processing techniques. This paper proposes an image-processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining. It is used to post-process the data mining result image and help users discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easy to understand, and effective. It provides a new approach for visual data mining.
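The neighborhood-averaging idea can be illustrated with a plain 3x3 smoothing pass over a 2D grid: each cell becomes the mean of itself and its in-bounds neighbors, which suppresses pixel noise so that larger patterns in a result image stand out. This is only an illustrative take on the averaging concept; RNAM itself is more elaborate, and the grid below is invented.

```python
def neighborhood_average(grid):
    """Replace each cell with the mean of its 3x3 in-bounds neighborhood,
    a basic smoothing pass in the spirit of neighborhood averaging."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            cells = [grid[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if 0 <= i + di < h and 0 <= j + dj < w]
            out[i][j] = sum(cells) / len(cells)
    return out

# A single bright outlier in an otherwise flat 3x3 "result image".
smoothed = neighborhood_average([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
print(smoothed[1][1])  # → 1.0, the spike spread over its 9-cell neighborhood
```

After smoothing, isolated speckle no longer dominates the display, so coherent regions in the mining result read more clearly.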
Abstract: Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in the scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with easy visual traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the knowledge extracted from the big data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository, enabling faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery from a large digital repository and its associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.