This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The research background is that, with the increasing amount and complexity of data, traditional data analysis methods can no longer meet these needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning methods can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
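The abstract does not specify its Bayesian formulation; one common, lightweight stand-in is Monte Carlo dropout, sketched below as an assumption: keeping dropout active at inference and averaging stochastic forward passes yields a predictive mean and an uncertainty band that a visualization can encode. The architecture and sample counts are illustrative, not the study's actual model.

```python
# Hedged sketch: Monte Carlo dropout as one possible Bayesian treatment
# of a neural network, producing mean and uncertainty for visualization.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))
x = torch.linspace(-1, 1, 50).unsqueeze(1)

model.train()  # keep dropout stochastic at inference time
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # 100 MC passes

mean, std = samples.mean(dim=0), samples.std(dim=0)  # plot mean +/- std as a band
print(mean.shape, std.shape)
```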
Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making. However, imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics, limiting their overall effectiveness. This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers (SLCs) and evaluates their performance in data-driven decision-making. The evaluation uses various metrics, with a particular focus on the Harmonic Mean Score (F-1 score), on an imbalanced real-world bank target marketing dataset. The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve, while Extreme Gradient Boosting (XGBoost) outperforms other traditional classifiers in terms of F-1 score. Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost, delivering superior results across all metrics, particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique. The study identifies several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets: selecting appropriate datasets for training and testing, choosing the right classifiers, employing effective techniques for processing and handling imbalanced datasets, and identifying suitable metrics for performance evaluation. These factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
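As a minimal sketch of the oversampling pipeline this abstract describes, the snippet below applies the BorderlineSMOTE2 variant before training XGBoost and reports the F-1 score. The synthetic dataset and hyperparameter values are illustrative assumptions, not the study's actual setup.

```python
# BorderlineSMOTE2 oversampling + XGBoost, scored by F-1.
# Synthetic data stands in for the bank target marketing dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from imblearn.over_sampling import BorderlineSMOTE
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# kind="borderline-2" selects the BorderlineSMOTE2 variant.
X_res, y_res = BorderlineSMOTE(kind="borderline-2", random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_res, y_res)
print("F-1:", f1_score(y_te, clf.predict(X_te)))
```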
This article discusses the current status and development strategies of computer science and technology in the context of big data. Firstly, it explains the relationship between big data and computer science and technology, focusing on analyzing the current application status of computer science and technology in big data, including data storage, data processing, and data analysis. Then, it proposes development strategies for big data processing. Computer science and technology play a vital role in big data processing by providing strong technical support.
Background: Osteoarthritis of the knee (KOA) is a chronic degenerative disease. KOA is a growing concern due to its high incidence and the pain and other burdens it places on patients. Traditional medicine is a health care model with a long history that includes nature-based treatments, psycho-psychological treatments, and more. Traditional medicine is also effective in diagnosing and treating KOA, and research on KOA within it has never stopped. However, there are no bibliometric studies analyzing articles on the traditional medical diagnosis and management of KOA. This study aimed to comprehensively analyze the general trends in the study of KOA in traditional medicine from a bibliometric perspective. Methods: All articles reporting on KOA and traditional medicine from 1 January 1990 to 1 November 2022 were obtained from the Web of Science Core Collection. Software such as CiteSpace, VOSviewer and Scimago Graphica was used to analyse the publications, including authors, citations, journals, references, countries where studies were published, institutions and research keywords. The final visualisations were produced using these data. Results: A total of 769 articles were retrieved. Peijian Tong was identified as the most prolific and highly contributing author in the field, and Medicine was identified as the most reputable journal in the field of traditional medicine and osteoarthritis of the knee. China is a global leader and a centre of collaboration in the field, with a major concentration of traditional medicine in Asia, which is consistent with the evidence that traditional medicine originated in Asia. According to the data, "osteoarthritis", "knee osteoarthritis", "pain", "knee" and "hip" were identified as hot keywords for research in this area. Conclusions: The results of this bibliometric study provide a snapshot of the current state of clinical research on the treatment of KOA in traditional medicine, help to envisage future hotspots and possible trends, and may provide researchers with information to guide the cutting edge of research in this field.
In this paper, we conduct research on library development prospects and challenges in the environment of big data and cloud computing. Public libraries face increasingly tight funding, slow or stagnant resource construction, and new challenges from readers in the new era. The big data era has quietly arrived; for libraries, whose duty is the storage, use, and development of knowledge, improving the ability to handle a rapidly growing volume of literature is urgent. Our methodology addresses these issues effectively and meaningfully.
This study focuses on meeting the challenges of big data visualization by using data reduction methods based on feature selection, in order to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We contribute to meeting the challenges of big data visualization using the embedded "Select from model (SFM)" method with the "Random forest importance (RFI)" algorithm, and compare it with the filter-based "Select percentile (SP)" method using the chi-square ("Chi2") tool, for selecting the most important features, which are then fed into a classification process using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared to the KNN approach in Python on eight data sets, to see which method produces the best rating when feature selection methods are applied. The study concludes that feature selection methods have a significant impact on the analysis and visualization of the data after removing repetitive data and data that do not affect the goal. After making several comparisons, the study suggests (SFMLR): using SFM based on the RFI algorithm for feature selection, with the LR algorithm for data classification. The proposal proved its efficacy in comparison with results from the recent literature.
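A minimal sketch of the two feature-selection routes the study compares follows: embedded SelectFromModel with random-forest importances (SFM + RFI) versus filter-based SelectPercentile with Chi2 (SP), each feeding a logistic-regression classifier. The stand-in dataset and percentile value are assumptions, not the study's eight data sets.

```python
# SFM(RFI) -> LR versus SP(chi2) -> LR, on a stand-in dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset (assumption)

# Embedded method: keep features above the forest's mean importance.
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_sfm = sfm.fit_transform(X, y)

# Filter method: keep the top 50% of features by chi2 score
# (chi2 requires non-negative features, which holds here).
X_sp = SelectPercentile(chi2, percentile=50).fit_transform(X, y)

lr = LogisticRegression(max_iter=5000)
print("SFM+RFI -> LR:", cross_val_score(lr, X_sfm, y, cv=5).mean())
print("SP+Chi2 -> LR:", cross_val_score(lr, X_sp, y, cv=5).mean())
```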
Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Therefore, greater attention has been directed toward deciphering better methods for identifying attack types so as to train IDSs more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized a hash algorithm, a weight table, and a sampling method to deal with the inherent problems caused by analyzing big data: volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set, with a clear identification of "normal" clusters, and described distinct clusters of effective attacks.
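The abstract names three ingredients without detailing them; the hedged sketch below shows one plausible reading: a hash to compress high-cardinality categorical fields (variety), a weight table to bias toward rare record types, and weighted sampling to cut volume. The column names and weights are illustrative assumptions, not the paper's actual design.

```python
# Hash bucketing + weight table + weighted sampling on KDD99-like records.
import hashlib
import pandas as pd

df = pd.DataFrame({
    "service": ["http", "smtp", "ftp", "http", "telnet"] * 2000,
    "label":   ["normal", "normal", "smurf", "normal", "neptune"] * 2000,
})

# Hash each categorical value into one of 64 buckets (handles variety).
df["service_bucket"] = df["service"].map(
    lambda s: int(hashlib.md5(s.encode()).hexdigest(), 16) % 64)

# Weight table: oversample rare attack labels relative to "normal".
weights = df["label"].map({"normal": 1.0, "smurf": 5.0, "neptune": 5.0})
sample = df.sample(n=1000, weights=weights, random_state=0)  # handles volume
print(sample["label"].value_counts())
```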
In order to study in depth the prominent problems faced by China's clean government work, and to put forward effective coping strategies, this article analyzes the network information of anti-corruption-related news events based on big data technology. In this study, we take the news reports from the website of the Communist Party of China (CPC) Central Commission for Discipline Inspection (CCDI) as the source of data. Firstly, the obtained text data is preprocessed through word segmentation and stop-word removal; then the pre-processed data is refined through vectorization and text clustering; finally, after text clustering, the keywords of clean government work are derived through visualization analysis. The results show that China's clean government work should focus on 'the four forms of decadence', and related departments must strictly crack down on five categories of phenomena, such as "illegal payment of subsidies or benefits, illegal delivery of gifts and cash gifts, illegal use of official vehicles, banquets using public funds, and extravagant wedding ceremonies and funerals". The results of this study are consistent with the official data released by the CCDI's website, which also suggests that the method is feasible and effective.
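A hedged sketch of the pipeline this abstract outlines is given below: Chinese word segmentation with stop-word removal, TF-IDF vectorization, then k-means text clustering. jieba is one common segmenter; the toy corpus, stop-word list, and cluster count are assumptions, not the study's actual configuration.

```python
# Segmentation + stop-word removal -> TF-IDF -> k-means clustering.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["违规发放津贴补贴", "违规收送礼品礼金", "违规使用公务用车"]  # toy corpus
stopwords = {"的", "和"}

def segment(text):
    # jieba.cut yields words; drop stop words and re-join with spaces
    # so sklearn's whitespace-based tokenizer can consume the result.
    return " ".join(w for w in jieba.cut(text) if w not in stopwords)

tfidf = TfidfVectorizer().fit_transform([segment(d) for d in docs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print(labels)  # cluster assignment per document
```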
As a global financial center, New York City (NYC) has always had its transportation system studied from various aspects. Since 2009, the NYC Taxi and Limousine Commission has made public the information on NYC taxi operations, offering an opportunity for detailed analysis. Thus, this research project investigates taxi operations in New York City based on big data analysis. The correlation between taxi operations and different types of weather, including precipitation, snow depth, and snowfall, is discussed in this paper. The research also evaluates taxi trip distribution in each Neighborhood Tabulation Area (NTA) using GeoPandas, and presents its density on an NYC map.
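A minimal sketch of the two analyses described follows: correlating daily trip counts with weather variables, and drawing a GeoPandas choropleth of trips per area. The numbers and toy polygons are illustrative assumptions standing in for the taxi and NTA data.

```python
# Weather correlation + choropleth sketch with toy data.
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Correlate daily trip counts with weather (toy values, illustrative only).
daily = pd.DataFrame({
    "trip_count": [410, 380, 150, 390, 120],
    "PRCP": [0.0, 0.1, 1.2, 0.0, 0.9],   # precipitation
    "SNWD": [0.0, 0.0, 3.0, 0.0, 5.0],   # snow depth
    "SNOW": [0.0, 0.0, 2.0, 0.0, 4.0],   # snowfall
})
print(daily.corr()["trip_count"])

# Trip density per (toy) NTA polygon, in the spirit of the paper's map.
ntas = gpd.GeoDataFrame(
    {"nta": ["A", "B"], "trips": [120, 300]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)], crs="EPSG:4326",
)
ntas.plot(column="trips", legend=True, cmap="OrRd")
```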
By using CiteSpace software to create knowledge maps of authors, institutions and keywords, the literature on the spatio-temporal behavior of Chinese residents based on big data in the architectural planning discipline, published in the China Academic Network Publishing Database (CNKI), was analyzed and discussed. It is found that there was a lack of communication and cooperation among research institutions and scholars, and that the research hotspots involved four main areas: "application in tourism research", "application in traffic travel research", "application in work-housing relationship research", and "application in personal family life research".
The arrival of the Internet era has brought about the rapid dissemination of a huge amount of information and data. At present, we are surrounded by all kinds of information, but these rich and diversified information resources have also brought chaos, so that one hardly knows where to begin a query. Information resources can provide us with great convenience, but we have to spend a lot of energy organizing and filtering the information, and the cost and time invested are immeasurable. The information we want to query is usually easy to understand, and information design uses intuitive and vivid computational means to achieve the visualization of big data, in order to reflect the beauty of the data.
Data analysis and visualization are an important area of application for big data, and visual analysis is also an important method for big data analysis. Data visualization refers to presenting data in a visual form, such as a chart or map, to help people understand its meaning; it helps people extract meaning from data quickly and easily. Visualization can fully demonstrate the patterns, trends, and dependencies of the data, which may not be apparent in other displays. Big data visualization analysis combines the advantages of computing with interactive analysis methods and interactive technologies, which can be static or interactive, and directly helps people effectively understand the information behind big data. It is indispensable in the era of big data, and can be very intuitive if used properly. Graphical analysis turns valuable information in complex data relationships into a powerful tool, and it represents a significant business opportunity. With the rise of big data, important technologies suitable for dealing with complex relationships have emerged, and graphics come in a variety of shapes and sizes for a variety of business problems. The first step in graphical analysis is to get the right data and answer the target question. In short, to choose the right method, one must understand each method's relative strengths and weaknesses and understand the data. The key steps to get data are: target; collect; clean; connect.
In the era of big data, the general public is more likely to access big data, but they are not inclined to analyze it. Therefore, traditional data visualization, with its degree of professionalism, is not easily accepted by a general public living at a fast pace. Against this background, a new general visualization method for dynamic time series data has emerged as the times require. Time series data visualization organizes abstract and hard-to-understand data into a form that is easily understood by the public. This method integrates data visualization into short videos, which is more in line with the way people get information in modern fast-paced lifestyles. The modular approach also facilitates public participation in production. This paper summarizes the dynamic visualization methods for time series data ranking, reviews the relevant literature, shows the value of these methods and their existing problems, and gives corresponding suggestions and future research prospects.
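One common instance of dynamic time-series ranking visualization is the "bar chart race"; the hedged sketch below shows its core building block: re-rank the series at each time step and redraw a horizontal bar frame. The data and styling are illustrative assumptions.

```python
# One-frame-per-step bar chart race sketch with toy ranking data.
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(
    {"A": [3, 5, 9], "B": [4, 6, 7], "C": [8, 2, 1]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)

fig, ax = plt.subplots()
for t in data.index:
    ax.clear()
    ranked = data.loc[t].sort_values()     # re-rank at every time step
    ax.barh(ranked.index, ranked.values)
    ax.set_title(f"Ranking at {t.date()}")
    plt.pause(0.5)                         # crude animation; one frame per step
plt.show()
```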
Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with visual-based easy traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the knowledge extracted from the big data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery from a large digital repository and its associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
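In the spirit of the paper's Neo4j proof of concept, the hedged sketch below stores one extracted (risk factor)->(disease) relation and queries it back. The connection details, node labels, and property names are assumptions, not the authors' actual schema, and a Neo4j server must be running.

```python
# Store and query a dementia risk-factor relation in Neo4j.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # MERGE is idempotent: nodes/edges are created only if absent.
    session.run(
        "MERGE (f:RiskFactor {name: $f}) "
        "MERGE (d:Disease {name: $d}) "
        "MERGE (f)-[:INCREASES_RISK_OF]->(d)",
        f="hypertension", d="dementia",
    )
    result = session.run(
        "MATCH (f:RiskFactor)-[:INCREASES_RISK_OF]->(d:Disease {name: $d}) "
        "RETURN f.name AS factor", d="dementia")
    print([record["factor"] for record in result])

driver.close()
```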
Air pollution caused by fine dust is a big problem all over the world, and fine dust has a fatal impact on human health. But there are too few fine dust measuring stations, and the installation cost of a fine dust measuring station is very high. In this paper, we propose a cloud-based air pollution information system using R. To measure fine dust, we have developed an inexpensive measuring device and studied techniques to accurately measure the concentration of fine dust at the user's location. We have also developed a smartphone application to provide air pollution information. In our system, we provide analytical results based on the collected data through effective data modeling. Our system provides information on fine dust values and action tips through the air pollution information application, and it supports visualization on a map using the statistical program R. The user can check the fine dust statistics map and cope with fine dust accordingly.
In this paper, a novel secret-data-driven, carrier-free (semi-structural formula) visual secret sharing (VSS) scheme with (2,2) threshold, based on the error correction blocks of QR codes, is investigated. The proposed scheme searches, from large datasets of QR codes and according to the secret image, for two QR codes that are altered to satisfy the secret sharing modules in the error correction mechanism; that is, the secret image is embedded into QR codes based on carrier-free secret sharing. The size of the secret image is the same as, or closest to, the region from the coordinate (7,7) to the lower right corner of the QR codes. In this way, we can find the combination of QR codes that maximizes the embedded secret information, driven by the secret data and based on big data search. Each output share is a valid QR code which can be decoded correctly using a QR code reader, which may reduce the likelihood of attracting the attention of potential attackers. The proposed scheme can reveal the secret image visually, with both stacking and XOR decryption capabilities. The secret image can be recovered by the human visual system (HVS) without any computation, based on stacking. On the other hand, if a lightweight computation device is available, the secret image can be losslessly revealed based on the XOR operation. In addition, QR codes can assist alignment for VSS recovery. The experimental results show the effectiveness of our scheme.
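The lossless XOR recovery path of a (2,2) scheme can be illustrated with a minimal sketch: one share is random, the second is the secret XOR the first, and XOR-ing the two shares reconstructs the secret exactly. This shows only the XOR property, not the paper's QR-code error-correction embedding.

```python
# XOR recovery in a (2,2) secret sharing of a binary image.
import numpy as np

rng = np.random.default_rng(0)
secret = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)  # binary secret image

share1 = rng.integers(0, 2, size=secret.shape, dtype=np.uint8)  # random share
share2 = secret ^ share1                                        # second share

recovered = share1 ^ share2
assert np.array_equal(recovered, secret)   # lossless XOR recovery
print(recovered)
```

Either share alone is uniformly random and reveals nothing about the secret; only the pair recovers it, which is the (2,2) threshold property.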
Effective management of daily road traffic is a huge challenge for traffic personnel. Urban traffic management has come a long way from manual control to artificial intelligence techniques, yet real-time adaptive traffic control remains an unfulfilled dream due to the lack of low-cost, easy-to-install traffic sensors with real-time communication capability. With the increasing number of on-board Bluetooth devices in new-generation automobiles, these devices can act as sensors that convey traffic information indirectly. This paper presents the efficacy of road-side Bluetooth scanners for traffic data collection, and big-data analytics to process the collected data and extract traffic parameters. The extracted information and analysis are presented through visualizations and tables. All data analytics and visualizations are carried out off-line in the R Studio environment. Reliability aspects of the collected and processed data are also investigated. A higher speed of traffic in one direction, owing to the geometry of the road, is established through data analysis. The increased penetration of smartphones and fitness bands in day-to-day use is also established through the device types in the collected data. The results of this work can be used for regular data collection, in contrast to the traditional road surveys carried out annually or bi-annually. It is also found that, compared to previous studies published in the literature, the device penetration rate and sample size found in this study are quite high and very encouraging. This is a novel work in the literature, which should prove quite useful for effective road traffic management in the future.
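A hedged sketch of extracting one traffic parameter from road-side Bluetooth scans follows: match device detections at two stations and derive per-vehicle travel time and speed. The station spacing and sample records are assumptions; the paper's own analysis was done in R, and this Python sketch only mirrors the idea.

```python
# Travel time and speed from paired Bluetooth detections (toy records).
import pandas as pd

scans = pd.DataFrame({
    "mac":     ["aa:01", "aa:01", "bb:02", "bb:02"],
    "station": ["S1", "S2", "S1", "S2"],
    "time":    pd.to_datetime(["10:00:00", "10:02:00", "10:00:30", "10:03:30"]),
})

# One row per device, one column per scanner station.
wide = scans.pivot(index="mac", columns="station", values="time")
wide["travel_s"] = (wide["S2"] - wide["S1"]).dt.total_seconds()
wide["speed_kmh"] = 1.5 / (wide["travel_s"] / 3600)   # assume 1.5 km spacing
print(wide[["travel_s", "speed_kmh"]])
```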
With the rapid development of computer hardware and big data processing technology, the bottleneck of intelligent analysis of massive data has shifted from "how to process massive data quickly" to "how to mine valuable information quickly and effectively from massive data". Visualization and visual analysis, based on the characteristics of human visual perception and combined with data analysis, human-computer interaction and other technologies, use visual charts to deconstruct the knowledge and rules contained in complex data. This technology runs through the whole life cycle of data science, is known as the last kilometer in the field of big data intelligence, and has achieved remarkable results in many big data application analysis scenarios. Traditional visual analysis is extremely dependent on the user's frequent, active participation throughout the visual analysis life cycle, including the data preparation, data transformation, visualization mapping, visual rendering, user interaction and visual analysis stages; this demands a high level of professional skill from users while the system itself provides little intelligence. As a result, traditional visual analysis modes and systems face the challenges of a high threshold for visual analysis, a high cost of data preparation, high latency of interaction response, and low efficiency of interaction modes. This paper therefore introduces the applications of, and challenges in, visualization based on explainable AI.
The development of intelligent and personalized recommendation services has become the trend of the era, with the rapid development of digital libraries and the popularization of intelligent technology. However, traditional personalized services cannot meet the needs of users or the demands of digital libraries' development, and the needs of users are becoming more complicated. How to accurately describe and fully understand users' complex personalized requirements, and how to recommend resource services according to their needs, have become difficult problems. On the other hand, because the resources of digital libraries are huge, the focus is on how to effectively collect massive resources and support efficient retrieval and recommendation, as well as how to fully exploit the intrinsic semantic links of digital library resources. In this regard, this review aims to analyze the construction of library resource informatization in the era of big data and expounds the important significance of constructing digital libraries. This paper also discusses optimizing the services of smart libraries after analyzing informatized library resources from the aspects of image resources, MOOC service strategies and changes in reading. Finally, it studies approaches to realizing library resource service strategies, which can be used for reference.