Funding: Supported by the Special Project on Precision Medicine under the National Key R&D Program (2017YFC0906600) and the Natural Science Foundation of China (No. 31671360).
Abstract: The integration, analysis, and visualization of big omics data are critical for addressing a broad spectrum of biological questions. One of the most frequently conducted procedures is enrichment analysis, which statistically tests whether individual functional annotations from Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG) are significantly over- or under-represented in an "interesting" gene or protein list relative to a reference set (Tavazoie et al., 1999).
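As an illustration of what such a test computes, here is a minimal sketch that assumes a hypergeometric model, a common choice for over- and under-representation tests. The abstract does not say which statistic is used, so the model, the function names, and the parameter names (`k`, `n`, `K`, `N`) are assumptions made for this example only.

```python
from math import comb


def hypergeom_pmf(i, n, K, N):
    """P(X = i) when n genes are drawn from N, of which K carry the annotation."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def over_representation_p(k, n, K, N):
    """Upper tail P(X >= k): chance of seeing at least k annotated genes in the list."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(k, min(n, K) + 1))


def under_representation_p(k, n, K, N):
    """Lower tail P(X <= k): chance of seeing at most k annotated genes in the list."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(0, min(k, n) + 1))


# Hypothetical numbers: a reference set of 10 genes, 5 annotated with a term,
# and an "interesting" list of 2 genes that both carry the term.
p_over = over_representation_p(k=2, n=2, K=5, N=10)
```

In practice one such p-value is computed per annotation term, so a multiple-testing correction would normally follow; that step is outside this sketch.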
Abstract: Data visualization blends art and science to convey stories from data via graphical representations. Given the variety of problems, applications, requirements, and design goals, it is challenging to combine these two components at full strength. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Without the science component, a visualization cannot serve its role of correctly representing the actual data, which leads to wrong perception, interpretation, and decisions. It is even worse when incorrect visual representations are intentionally produced to deceive viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication: color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran's Q test and McNemar's test were conducted to examine whether the proportions of common errors differ among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation and that size is the most critical issue. There were also statistically significant differences in the proportion of errors among color, shape, size, and spatial orientation.
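The two tests named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's analysis: the 0/1 table layout (one row per reviewed figure, one column per error category) and the function names are assumptions, and the exact binomial form of McNemar's test is one of several variants.

```python
from math import comb


def cochran_q(table):
    """Cochran's Q statistic for a list of rows of 0/1 outcomes.

    Each row is one subject (here, hypothetically, one reviewed figure) and
    each column is one condition (for example an error category). Under the
    null hypothesis, Q is approximately chi-square with k - 1 degrees of freedom.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every row is all 0s or all 1s; Q is undefined")
    return numerator / denominator


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: four figures scored for two error categories.
example = [[1, 0], [1, 0], [1, 0], [0, 1]]
q_stat = cochran_q(example)        # compare with chi-square, k - 1 = 1 df
p_pair = mcnemar_exact_p(3, 1)     # 3 vs 1 discordant pairs
```

Cochran's Q serves as the omnibus test across all categories, and McNemar's test is then a natural pairwise follow-up between two categories at a time.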
Abstract: In 2012, China's National Ministry of Education issued a new undergraduate course catalog that classifies economic statistics as a second-level discipline of applied economics. What specific content this second-level discipline should include has since become an important issue, and what students should be taught and how they can adapt to the needs of the market have attracted wide attention. The Ministry has given clear provisions for the core courses of economic statistics, but whether these courses reflect the actual needs and characteristics of the field still requires consideration and discussion. This article starts from studies published in Chinese and U.S. journals on economic statistics. Using text mining in R, a mainstream statistical analysis software, it gives a comparative analysis of the contents of economic statistics over recent decades. We created word clouds in R from the contents of core statistical journals, which help us visually examine the development of the economic statistics discipline for the comparative study. We conclude that there are significant differences between economic statistics in China and in the United States. The main difference is that the United States pays more attention to exploring new methods and is able to adapt to market demand, whereas the development of China's economic statistics remains more traditional and needs a better understanding of multidisciplinary knowledge, such as Bayesian methods and dynamics. Another is that the curriculum of China's economic statistics is adjusted to China's actual situation, and the new training program pays more attention to students' practical ability and social practice. Many practical problems remain untouched or unsolved and will probably need decades of effort.
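The counting step behind a word cloud can be sketched as below. The study itself used R; this Python version is only a stand-in for the idea, and the stopword list, the sample titles, and the function name are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "is"}


def term_frequencies(documents, top_n=10):
    """Return the top_n most frequent terms across documents as (term, count) pairs."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)


# Hypothetical article titles standing in for journal contents.
titles = [
    "Bayesian methods for panel data",
    "Bayesian estimation of index numbers",
]
top_terms = term_frequencies(titles, top_n=1)
```

A word cloud then scales each term's font size by its count, so comparing the top terms from two journal sets gives the kind of side-by-side view the abstract describes.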
Funding: This work is supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0701900, 2016QY02D0304) and the National Natural Science Foundation of China (Grant Nos. 61100053, 61572318, 61772336, 61672055).
Abstract: With the development of cities and the explosion of information, vast amounts of geo-tagged textual data about Points of Interest (POIs) have been generated. Extracting useful information and discovering spatial distributions of text from these data are challenging and meaningful. The huge number of POIs in modern cities also makes efficient approaches for retrieving and choosing a destination important. This paper provides a visual design combining a metro map and wordles to meet these needs. In this visualization, metro lines serve both as divider lines that split the city into several subareas and as boundaries that constrain the wordles within each subarea. The wordles are generated from keywords extracted from the text about POIs (including reviews, descriptions, etc.) and embedded into the subareas based on their geographical locations. By generating intuitive results and providing an interactive visualization that supports the exploration of text distribution patterns, our strategy can guide users to explore urban spatial characteristics and retrieve a location efficiently. Finally, we implement a visual analysis of restaurant data in Shanghai, China as a case study to evaluate our strategy.
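The grouping step described above, assigning each POI's keywords to the subarea that contains it, might look like the following sketch. It assumes each subarea is available as a simple polygon and uses a standard ray-casting test. The coordinates, the data layout, and the function names are hypothetical, and the paper's actual layout algorithm is not reproduced here.

```python
from collections import Counter, defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keywords_by_subarea(pois, subareas, top_n=3):
    """Count keywords per subarea; pois are (x, y, keywords) tuples."""
    counts = defaultdict(Counter)
    for x, y, keywords in pois:
        for name, polygon in subareas.items():
            if point_in_polygon(x, y, polygon):
                counts[name].update(keywords)
                break  # each POI belongs to at most one subarea
    return {name: c.most_common(top_n) for name, c in counts.items()}


# Hypothetical subareas split by one north-south "metro line" at x = 1.
subareas = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
pois = [
    (0.5, 0.5, ["noodles", "dumplings"]),
    (0.2, 0.8, ["noodles"]),
    (1.5, 0.5, ["coffee"]),
]
wordle_terms = keywords_by_subarea(pois, subareas)
```

The per-subarea counts are what a wordle layout would then size and pack inside the region bounded by the metro lines.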