Background A task assigned to space exploration satellites involves detecting the physical environment within a certain space. However, space detection data are complex and abstract, and are not conducive to researchers' visual perception of the evolution and interaction of events in the space environment. Methods A time-series dynamic data sampling method for large-scale space was proposed to sample detection data in space and time, and the corresponding relationships between data location features and other attribute features were established. A tone-mapping method based on statistical histogram equalization was proposed and applied to the final attribute feature data. The visualization process was optimized for rendering by merging materials, reducing the number of patches, and performing other operations. Results Sampling, feature extraction, and uniform visualization were achieved for detection data of complex types, long duration spans, and uneven spatial distributions. Real-time visualization of large-scale spatial structures on augmented reality devices, particularly low-performance devices, was also investigated. Conclusions The proposed visualization system can reconstruct the three-dimensional structure of a large-scale space, express the structure of and changes in the spatial environment using augmented reality, and assist in intuitively discovering spatial environmental events and evolutionary rules.
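To make the tone-mapping step concrete, here is a minimal sketch of histogram-equalization tone mapping, assuming the attribute features reduce to scalar values held in a NumPy array (the abstract does not specify the data layout, so this is illustrative rather than the paper's exact method):

```python
import numpy as np

def equalize_tone(values, bins=256):
    """Map scalar attribute values to [0, 1] tones via histogram equalization.

    Values that fall in densely populated ranges are spread apart, so the
    mapped tones occupy the displayable range more uniformly.
    """
    hist, edges = np.histogram(values, bins=bins)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                   # normalize CDF to [0, 1]
    idx = np.clip(np.digitize(values, edges[1:-1]), 0, bins - 1)
    return cdf[idx]

# Example: heavily skewed attribute data ends up with a near-uniform tone spread.
raw = np.random.exponential(scale=2.0, size=10_000)
tones = equalize_tone(raw)
```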
This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The background is that, with the increasing volume and complexity of data, traditional data analysis methods can no longer meet practical needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
Aviation data analysis can help airlines understand passenger needs and thus provide passengers with more sophisticated and better services. How to extract implicit information and analyze the features contained in large amounts of data has become an important issue in civil aviation passenger data analysis. This paper offers uncertainty analysis and visualization methods for data records and property measurements, based on visual analysis and uncertainty measure theory combined with parallel coordinates, radar charts, histograms, pixel charts, and rich interaction. At the same time, the resulting data representation clearly exposes the uncertainty and hidden information, providing an information base for passenger services.
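As an illustration of one of the views listed above, the sketch below draws a parallel-coordinates plot over a small, entirely hypothetical set of passenger attributes (the column names and values are invented for illustration and are not taken from the paper):

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical passenger records; each row becomes one polyline across the axes.
df = pd.DataFrame({
    "flights_per_year": [24, 3, 11, 2, 30, 7],
    "avg_ticket_price": [520, 180, 340, 150, 610, 220],
    "baggage_weight":   [18, 9, 14, 7, 21, 10],
    "delay_minutes":    [12, 45, 20, 60, 8, 35],
    "segment": ["frequent", "leisure", "frequent", "leisure", "frequent", "leisure"],
})

# Outliers and widely spread (uncertain) attributes stand out across the record set.
parallel_coordinates(df, class_column="segment", colormap="coolwarm")
plt.tight_layout()
plt.show()
```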
This study focuses on meeting the challenges of big data visualization by using data reduction methods based on feature selection, so as to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We address these challenges with the embedded "Select From Model (SFM)" method driven by the "Random Forest Importance (RFI)" algorithm, and compare it with the filter "Select Percentile (SP)" method based on the chi-square ("Chi2") test, for selecting the most important features, which are then fed into a classification stage using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared with that of KNN in Python on eight data sets to see which combination performs best when the feature selection methods are applied. The study concludes that feature selection has a significant impact on the analysis and visualization of the data once redundant data, and data that do not affect the goal, are removed. After several comparisons, the study recommends (SFMLR): SFM based on the RFI algorithm for feature selection, with the LR algorithm for classification. The proposal's efficacy was demonstrated by comparing its results with recent literature.
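A minimal sketch of the described comparison in Python with scikit-learn; the public stand-in dataset, hyperparameters, and the 30% percentile are illustrative assumptions, not the paper's settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # non-negative features, so Chi2 is applicable
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

selectors = {
    "SFM-RFI": SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)),
    "SP-Chi2": SelectPercentile(chi2, percentile=30),
}
classifiers = {"LR": LogisticRegression(max_iter=5000), "KNN": KNeighborsClassifier()}

for s_name, sel in selectors.items():
    X_tr_s = sel.fit_transform(X_tr, y_tr)      # feature selection fitted on training data only
    X_te_s = sel.transform(X_te)
    for c_name, clf in classifiers.items():
        clf.fit(X_tr_s, y_tr)
        ac = accuracy_score(y_te, clf.predict(X_te_s))
        print(f"{s_name} + {c_name}: AC = {ac:.3f}, features kept = {X_tr_s.shape[1]}")
```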
The Growth Value Model (GVM) proposed theoretical closed-form formulas, consisting of Return on Equity (ROE) and the Price-to-Book value ratio (P/B), for fair stock prices and expected rates of return. Although regression analysis can be employed to verify these theoretical closed-form formulas, they cannot be explored intuitively by classical quintile or decile sorting approaches because of their multi-factor and dynamic nature. This article uses visualization techniques to help explore GVM intuitively. The distinctive finding and contribution of this paper is the concept of the smart frontier, which can be regarded as the reasonable lower limit of P/B at a specific ROE, obtained by exploring fair P/B with ROE-P/B 2D dynamic process visualization. The coefficients in the formula can be determined by quantile regression analysis on market data. The moving paths of ROE and P/B in the current quarter and subsequent quarters show that portfolios at the lower right of the curve approach this curve and stagnate there after the portfolios are formed. Furthermore, exploring expected rates of return with ROE-P/B-Return 3D dynamic process visualization shows that the data outside the lower right edge of the "smart frontier" have positive quarterly return rates not only in the t+1 quarter but also in the t+2 quarter. The farther the data in quarter t are from the "smart frontier", the larger the return rates in the t+1 and t+2 quarters.
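The abstract does not reproduce the closed-form GVM formula, so the sketch below only illustrates the quantile-regression idea behind the smart frontier: fit a low quantile of P/B as a function of ROE and treat the fitted curve as a lower envelope. The synthetic data, the log-linear form, and the 5% quantile are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for quarterly (ROE, P/B) observations.
rng = np.random.default_rng(0)
roe = rng.uniform(-0.05, 0.30, 2000)
pb = np.exp(1.5 * roe) * (0.8 + rng.lognormal(mean=0.0, sigma=0.4, size=roe.size))
df = pd.DataFrame({"roe": roe, "log_pb": np.log(pb)})

# A low conditional quantile of log(P/B) given ROE plays the role of the
# "smart frontier", i.e. a reasonable lower limit of P/B at each ROE level.
frontier = smf.quantreg("log_pb ~ roe", df).fit(q=0.05)
print(frontier.params)                        # intercept and slope of the frontier
grid = pd.DataFrame({"roe": [0.10, 0.20]})
print(np.exp(frontier.predict(grid)))         # frontier P/B at ROE = 10% and 20%
```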
Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making. However, imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics, limiting their overall effectiveness. This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers (SLCs) and evaluates their performance in data-driven decision-making. The evaluation uses various metrics, with a particular focus on the Harmonic Mean Score (F-1 score), on an imbalanced real-world bank target-marketing dataset. The findings indicate that grid-search random forest and random-search random forest excel in precision and area under the curve, while Extreme Gradient Boosting (XGBoost) outperforms other traditional classifiers in terms of F-1 score. Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost, delivering superior results across all metrics, particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique. The study identifies several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets: selecting appropriate datasets for training and testing, choosing the right classifiers, employing effective techniques for processing and handling imbalanced datasets, and identifying suitable metrics for performance evaluation. These factors also entail the use of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
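A sketch of the oversampling setup highlighted above — the BorderlineSMOTE2 variant feeding XGBoost, evaluated by F-1 score — on a synthetic imbalanced stand-in for the bank target-marketing data (the class ratio and model settings are illustrative assumptions):

```python
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Imbalanced synthetic data: roughly 5% positives, mimicking a marketing response target.
X, y = make_classification(n_samples=20_000, n_features=20, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training split (kind="borderline-2" is the BorderlineSMOTE2 variant),
# leaving the test split untouched so the F-1 score is not inflated by synthetic samples.
X_res, y_res = BorderlineSMOTE(kind="borderline-2", random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=300, random_state=0)
clf.fit(X_res, y_res)
print("F-1 on the untouched test set:", f1_score(y_te, clf.predict(X_te)))
```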
Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel's graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python's scripting capabilities. This paper covers an integration solution that empowers non-programmers to leverage Python's capabilities within the familiar Excel environment, enabling them to perform advanced data analysis and automation tasks without extensive programming knowledge. Based on feedback solicited from non-programmers who tested the integration solution, the case study evaluates its ease of implementation, performance, and compatibility of Python with different Excel versions.
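As one minimal way to pair the two tools (not necessarily the paper's integration solution), pandas with the openpyxl engine can read a workbook, run an analysis that would be awkward in cell formulas, and write the result back; the file, sheet, and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical workbook and sheet names, purely for illustration.
df = pd.read_excel("sales.xlsx", sheet_name="Q1", engine="openpyxl")

# Analysis that would be tedious to maintain as worksheet formulas.
summary = (df.groupby("region")["amount"]
             .agg(total="sum", average="mean", orders="count")
             .reset_index())

# Write the result to a new sheet. Note that openpyxl rewrites cell values, so
# formulas and formatting in the source workbook may not survive round-tripping.
with pd.ExcelWriter("sales.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as xl:
    summary.to_excel(xl, sheet_name="Q1_summary", index=False)
```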
Data breaches have massive consequences for companies, affecting them financially and undermining their reputation, which poses significant challenges to online security and the long-term viability of businesses. This study analyzes trends in data breaches in the United States, examining the frequency, causes, and magnitude of breaches across various industries. We document that data breaches are increasing, with hacking emerging as the leading cause. Our descriptive analyses explore factors influencing breaches, including security vulnerabilities, human error, and malicious attacks. The findings provide policymakers and businesses with actionable insights to bolster data security through proactive audits, patching, encryption, and response planning. By better understanding breach patterns and risk factors, organizations can take targeted steps to enhance protections and mitigate the potential damage of future incidents.
Gestational Diabetes Mellitus (GDM) is a significant health concern affecting pregnant women worldwide. It is characterized by elevated blood sugar levels during pregnancy and poses risks to both maternal and fetal health. Maternal complications of GDM include an increased risk of developing type 2 diabetes later in life, as well as hypertension and preeclampsia during pregnancy. Fetal complications may include macrosomia (large birth weight), birth injuries, and an increased risk of developing metabolic disorders later in life. Understanding the demographics, risk factors, and biomarkers associated with GDM is crucial for effective management and prevention strategies. This research aims to address these aspects comprehensively through the analysis of a dataset comprising 600 pregnant women. By exploring the demographics of the dataset and employing data modeling techniques, the study seeks to identify key risk factors associated with GDM. Moreover, by analyzing various biomarkers, the research aims to gain insights into the physiological mechanisms underlying GDM and its implications for maternal and fetal health. The significance of this research lies in its potential to inform clinical practice and public health policies related to GDM. By identifying demographic patterns and risk factors, healthcare providers can better tailor screening and intervention strategies for pregnant women at risk of GDM. Additionally, insights into biomarkers associated with GDM may contribute to the development of novel diagnostic tools and therapeutic approaches. Ultimately, by enhancing our understanding of GDM, this research aims to improve maternal and fetal outcomes and reduce the burden of this condition on healthcare systems and society. However, it’s important to acknowledge the limitations of the dataset used in this study. Further research utilizing larger and more diverse datasets, perhaps employing advanced data analysis techniques such as Power BI, is warranted to corroborate and expand upon the findings of this research. This underscores the ongoing need for continued investigation into GDM to refine our understanding and improve clinical management strategies.
This article discusses the current status and development strategies of computer science and technology in the context of big data. Firstly, it explains the relationship between big data and computer science and technology, focusing on analyzing the current application status of computer science and technology in big data, including data storage, data processing, and data analysis. Then, it proposes development strategies for big data processing. Computer science and technology play a vital role in big data processing by providing strong technical support.
The Hue-Saturation-Intensity (HSI) color model, a psychologically appealing color model, was employed to visualize uncertainty, represented by relative prediction error, for the case of spatial prediction of topsoil pH in peri-urban Beijing. A two-dimensional legend was designed to accompany the visualization: the vertical axis (hues) visualizes the predicted values and the horizontal axis (whiteness) visualizes the prediction error. Moreover, different ways of visualizing uncertainty are briefly reviewed in this paper. This case study indicated that visualizing both predictions and prediction uncertainty offers a way to enhance visual exploration of data uncertainty and to compare different prediction methods or predictions of entirely different variables. The whitish regions of the visualization map can simply be interpreted as unsatisfactory prediction results, which may need additional samples or more suitable prediction models for better predictions.
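A rough sketch of the two-dimensional legend idea, using HSV as a stand-in for HSI and simple min-max scaling (the paper's exact colour transform is not given in the abstract):

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def uncertainty_colors(pred, rel_err):
    """Map predictions to hue and relative prediction error to whiteness.

    Larger relative error desaturates the colour toward white, so whitish map
    regions read directly as unreliable predictions.
    """
    p = (pred - pred.min()) / (np.ptp(pred) + 1e-12)      # 0..1 drives the hue
    e = np.clip(rel_err / rel_err.max(), 0.0, 1.0)        # 0..1 drives the whiteness
    hsv = np.stack([0.7 * (1.0 - p),                      # hue: blue (low) -> red (high)
                    1.0 - e,                              # saturation drops as error grows
                    np.ones_like(p)], axis=-1)            # keep full intensity
    return hsv_to_rgb(hsv)

# Example: a 50 x 50 grid of predicted topsoil pH with spatially varying error.
pred = np.random.uniform(5.5, 8.5, (50, 50))
rel_err = np.random.uniform(0.0, 0.3, (50, 50))
rgb = uncertainty_colors(pred, rel_err)                   # pass to plt.imshow(rgb)
```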
The control system of Hefei Light Source II (HLS-Ⅱ) is a distributed system based on the Experimental Physics and Industrial Control System (EPICS). It is necessary to maintain central configuration files for the existing archiving system: when process variables in the control system are added, removed, or updated, the configuration files must be manually modified to stay consistent with the control system. This paper presents a new method for data archiving that realizes automatic configuration of the archiving parameters. The system uses a microservice architecture to integrate the EPICS Archiver Appliance and RecSync. In this way, the system can collect all the archived meta-configuration from the distributed input/output controllers and enter it into the EPICS Archiver Appliance automatically. Furthermore, we also developed a web-based GUI to provide automatic visualization of real-time and historical data. At present, this system is under commissioning at HLS-Ⅱ. The results indicate that the new archiving system is reliable and convenient to operate, and its maintenance-free operation mode is valuable for large-scale scientific facilities.
A visualization tool, WebScope, was developed through a web browser based on Java applets embedded into HTML pages, in order to provide worldwide access to the EAST experimental data. It can display data from various trees on different servers in a single panel. With WebScope, it is easier to compare different data sources and to perform simple calculations over them.
With long-term marine surveys and research, and especially with the development of new marine environment monitoring technologies, prodigious amounts of complex marine environmental data are generated and continue to increase rapidly. Features of these data include massive volume, widespread distribution, multiple sources, heterogeneity, multi-dimensionality, and dynamic structure over time. The present study recommends an integrative visualization solution for these data, to enhance the visual display of data and data archives, and to promote the joint use of data distributed among different organizations or communities. This study also analyses web services technologies and defines the concept of the marine information grid, then focuses on the spatiotemporal visualization method and proposes a process-oriented spatiotemporal visualization method. We discuss how marine environmental data can be organized based on the spatiotemporal visualization method, and how the organized data are represented for use with web services and stored in a reusable fashion. In addition, we provide an original visualization architecture that is integrative and based on the explored technologies. Finally, we propose a prototype system for marine environmental data of the South China Sea, for visualizations of Argo floats, sea surface temperature fields, sea current fields, salinity, in-situ investigation data, and ocean stations. The integrative visualization architecture is illustrated by the prototype system, which highlights the process-oriented temporal visualization method and demonstrates the benefits of the architecture and the methods described in this study.
Simulation and interpretation of marine controlled-source electromagnetic (CSEM) data often approximate the transmitter source as an ideal horizontal electric dipole (HED) and assume that the receivers are located on a flat seabed. In practice, however, the transmitter dipole source will be rotated, tilted, and deviated from the survey profile due to ocean currents, and free-fall receivers may also be rotated to some arbitrary horizontal orientation and located on a sloping seafloor. In this paper, we investigate the effects of uncertainties in the transmitter tilt, transmitter rotation, and transmitter deviation from the survey profile, as well as in the receivers' locations and orientations, on marine CSEM data. The model study shows that the uncertainties of all position and orientation parameters of both the transmitter and the receivers can propagate into observed data uncertainties, but to different extents. In interpreting marine data, field data uncertainties caused by the position and orientation uncertainties of both the transmitter and the receivers need to be taken into account.
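As a generic illustration of why receiver orientation matters (this is not the paper's forward model), a small rotation shows how an unmeasured azimuth error mixes the horizontal electric-field components recorded by a sea-floor receiver:

```python
import numpy as np

def rotate_horizontal_fields(ex, ey, azimuth_error_deg):
    """Inline/crossline components recorded by a receiver rotated by an unknown azimuth."""
    a = np.radians(azimuth_error_deg)
    r = np.array([[np.cos(a), np.sin(a)],
                  [-np.sin(a), np.cos(a)]])
    return r @ np.array([ex, ey])

# A 10-degree orientation error leaks roughly 17% of Ey into the measured Ex channel.
ex_meas, ey_meas = rotate_horizontal_fields(1.0, 1.0, 10.0)
print(ex_meas, ey_meas)
```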
Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Therefore, greater attention has been directed at deciphering better methods for identifying attack types so that IDSs can be trained more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized a hash algorithm, a weight table, and a sampling method to deal with the inherent problems caused by analyzing big data: volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set, with a clear identification of "normal" clusters and a description of distinct clusters of effective attacks.
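The abstract does not spell out the exact procedure, so the following is only a plausible sketch of how a hash bucket, a weight table, and sampling can thin a KDD99-style file in one pass; the field indices and keep probabilities are assumptions:

```python
import csv
import random
from collections import defaultdict

def weighted_sample(path, keep_base=0.02, rare_boost=50):
    """One-pass frequency-weighted sample of a large KDD99-style CSV file.

    Records hashing to frequently seen buckets are kept with low probability,
    while records from rarely seen buckets are kept far more often, so small
    attack classes are not drowned out by the dominant "normal" traffic.
    """
    weights = defaultdict(int)                  # weight table: bucket -> count seen so far
    sample = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Assumed KDD99 layout: protocol_type, service, and the trailing label.
            bucket = hash((row[1], row[2], row[-1])) % 1024
            weights[bucket] += 1
            keep_prob = min(1.0, keep_base * rare_boost / weights[bucket])
            if random.random() < keep_prob:
                sample.append(row)
    return sample
```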
Data analysis and visualization are an important area of application for big data, and visual analysis is an important method for big data analysis. Data visualization refers to presenting data in a visual form, such as a chart or map, to help people understand its meaning; it helps people extract meaning from data quickly and easily. Visualization can fully demonstrate the patterns, trends, and dependencies in data that are hard to find in other displays. Big data visual analysis, which can be static or interactive, combines the advantages of computers with interactive analysis methods and interactive technologies, directly helping people understand the information behind big data effectively. It is indispensable in the era of big data and can be very intuitive when used properly. Graphical analysis of complex data relationships turns valuable information into a powerful tool and represents a significant business opportunity. With the rise of big data, important technologies suitable for dealing with complex relationships have emerged, and graphics come in a variety of shapes and sizes for a variety of business problems. The first step in graphical analysis is to get the right data and target the goal; in short, to choose the right method, you must understand each method's relative strengths and weaknesses and understand the data. Key steps to get data: define the target; collect; clean; connect.
The mathematical theories for the uncertainty model of a line segment are summarized into a general conception: the line error band model εσ is a basic uncertainty model that can depict line accuracy and quality efficiently, while the εm model and error entropy can be regarded as its supplements. The error band model reflects and describes the influence of line uncertainty on polygon uncertainty. Therefore, the statistical characteristics of line error are studied in depth by analyzing the probability that the line error falls into a certain range. Moreover, theoretical accordance is achieved in selecting the error buffer for a line feature and the error indicator. The relationship between the accuracy of the area of a polygon and the error loop of the polygon boundary is deduced and computed.
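Assuming, as is common for such models, that the perpendicular error d of a point on the line is normally distributed with standard deviation σ, the probability that it falls inside an error band of half-width kσ follows directly (the paper's exact εσ definition may differ):

```latex
% Probability that the perpendicular error d falls inside a band of half-width k*sigma,
% assuming d ~ N(0, sigma^2):
P(|d| \le k\sigma) = \int_{-k\sigma}^{k\sigma}
  \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^{2}/(2\sigma^{2})}\, dt
  = 2\Phi(k) - 1,
\qquad
P(|d| \le \sigma) \approx 0.683,\quad
P(|d| \le 2\sigma) \approx 0.954,\quad
P(|d| \le 3\sigma) \approx 0.997 .
```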