Funding: National Natural Science Foundation of China (62202118); Scientific and Technological Research Projects from Guizhou Education Department ([2023]003); Guizhou Provincial Department of Science and Technology Hundred Levels of Innovative Talents Project (GCC[2023]018); Top Technology Talent Project from Guizhou Education Department ([2022]073).
Abstract: The development of technologies such as big data and blockchain has brought convenience to life, but privacy and security issues are becoming increasingly prominent. The K-anonymity algorithm is an effective privacy-preserving algorithm with low computational complexity that safeguards users' privacy by anonymizing big data. However, the algorithm currently focuses only on improving user privacy while ignoring data availability. In addition, ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. Based on this, we propose a new K-anonymity algorithm that addresses the privacy and security problem in the context of big data while improving data usability. Specifically, we construct a new information loss function based on information quantity theory. Because different quasi-identifier attributes have different impacts on sensitive attributes, we assign a weight to each quasi-identifier attribute when designing the information loss function. To reduce information loss further, we improve K-anonymity in two ways. First, we make the information loss smaller than in the original table while guaranteeing privacy, using common artificial intelligence algorithms, namely the greedy algorithm and the 2-means clustering algorithm. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centers of mass. We then design the K-anonymity algorithm of this scheme from the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which reduces the information loss. Finally, we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
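To make the idea of a weighted information loss concrete, the following is a minimal sketch and not the paper's method. The abstract does not give the information-quantity-based loss function, so this example swaps in a common range-based generalization loss: for each group, each numeric quasi-identifier contributes its range within the group divided by its range over the whole table, scaled by a per-attribute weight. The function names, the weights, and the toy records are all hypothetical.

```python
from typing import List, Sequence

Record = Sequence[float]  # one row of numeric quasi-identifier values


def is_k_anonymous(groups: List[List[Record]], k: int) -> bool:
    """Every equivalence class must contain at least k records."""
    return all(len(group) >= k for group in groups)


def weighted_information_loss(groups: List[List[Record]],
                              weights: Sequence[float]) -> float:
    """Range-based generalization loss, averaged over all records.

    ASSUMPTION: this is a stand-in for the paper's loss function, which
    is not specified in the abstract. weights[j] is the (hypothetical)
    importance of quasi-identifier j for the sensitive attribute.
    """
    records = [row for group in groups for row in group]
    n_attrs = len(weights)

    # Global range of each attribute over the whole table.
    spans = []
    for j in range(n_attrs):
        column = [row[j] for row in records]
        spans.append(max(column) - min(column))

    total = 0.0
    for group in groups:
        group_loss = 0.0
        for j in range(n_attrs):
            if spans[j] == 0:
                continue  # a constant attribute loses nothing when generalized
            column = [row[j] for row in group]
            group_loss += weights[j] * (max(column) - min(column)) / spans[j]
        # Every record in the group is generalized to the same interval.
        total += len(group) * group_loss
    return total / len(records)


# Toy table with two quasi-identifiers: (age, income).
groups = [
    [(20, 1000), (30, 1000)],
    [(60, 5000), (70, 9000)],
]
weights = (0.75, 0.25)  # hypothetical attribute weights
loss = weighted_information_loss(groups, weights)
```

With a loss of this shape, a grouping step (greedy or clustering-based) can compare candidate partitions and prefer the one with the smaller weighted loss, provided `is_k_anonymous` still holds.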
Funding: Sponsored by the National Natural Science Foundation of China (Nos. 61972208, 62102194 and 62102196); National Natural Science Foundation of China (Youth Project) (No. 62302237); Six Talent Peaks Project of Jiangsu Province (No. RJFW-111); China Postdoctoral Science Foundation Project (No. 2018M640509); Postgraduate Research and Practice Innovation Program of Jiangsu Province (Nos. KYCX22_1019, KYCX23_1087, KYCX22_1027, KYCX23_1087, SJCX24_0339 and SJCX24_0346); Innovative Training Program for College Students of Nanjing University of Posts and Telecommunications (No. XZD2019116); Nanjing University of Posts and Telecommunications College Students Innovation Training Program (Nos. XZD2019116, XYB2019331).
Abstract: The scale and complexity of big data are growing continuously, posing severe challenges to traditional data processing methods, especially in clustering analysis. To address this issue, this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update (BDTMCDIncreUpdate), which combines distributed computing, storage technology, and incremental update techniques to provide an efficient and effective means of clustering analysis. First, the original dataset is divided into multiple sub-blocks, and distributed computing resources are used to process the sub-blocks in parallel, enhancing efficiency. Then, initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results. When new data arrives, incremental update technology is employed to update the core tensor and factor matrix, ensuring that the clustering model can adapt to changes in the data. Finally, by combining the updated core tensor and factor matrix with historical computational results, refined clustering results are obtained, achieving real-time adaptation to dynamic data. In experimental simulation on the Aminer dataset, the BDTMCDIncreUpdate method achieved an accuracy (ACC) of 90% and a normalized mutual information (NMI) score of 0.85, outperforming existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios. The BDTMCDIncreUpdate method therefore offers an innovative solution for big data analysis, integrating distributed computing, incremental updates, and tensor-based multi-clustering techniques. It improves efficiency and scalability in processing large-scale high-dimensional datasets, and its effectiveness and accuracy have been validated through experiments. The method shows great potential in real-world applications where dynamic data growth is common, and it is of significant importance for advancing data analysis technology.
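As a reminder of what the NMI score reported above measures, here is a minimal pure-Python sketch. It is not taken from the paper: NMI has several normalizations, and the abstract does not say which one was used, so this example assumes the arithmetic-mean form NMI = 2·I(U;V) / (H(U) + H(V)). The function names and the toy labelings are hypothetical.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labelings put everything in one cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]   # identical partition, labels swapped
unrelated = [0, 1, 0, 1]             # carries no information about truth

score_same = nmi(truth, same_up_to_renaming)
score_unrelated = nmi(truth, unrelated)
```

Because NMI is invariant to how clusters are named, it suits clustering evaluation, where predicted cluster ids have no fixed correspondence to the ground-truth classes.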
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2022RC023).
Abstract: Effective identification of traffic accident-prone points can reduce accident risks and eliminate safety hazards. This paper first systematically compares the research in Chinese and foreign literature and proposes three types of identification indicators, namely absolute, relative and comprehensive, according to different reference standards. Based on the evaluation indicators and modelling methods, the current status of research and the problems in identification theory and methods are systematically summarised in terms of mathematical statistics, cluster analysis, machine learning and conflict technology. The study shows that the foreign literature focuses on innovation in data and indicators and has shifted from accident-point safety management to road-network safety management, while the Chinese literature focuses on the integration of multiple identification methods and on theoretical innovation. Driven by big data, the identification of traffic accident-prone points has developed further at the meso-micro scale. Morphological image processing methods are widely used, combined with GIS platforms, to accurately mine the spatial attributes and correlations of accidents. In addition, by considering the spatial and temporal distribution of accidents, the identification results have moved from regions to specific road sections and points, achieving more accurate identification.
Abstract: Wearable technologies have the potential to become a valuable influence on daily life, enabling people to observe the world in new ways, for example through augmented reality (AR) applications. Wearable technology uses electronic devices that may be carried as accessories, worn as clothes, or even embedded in the user's body. Although the potential benefits of smart wearables are numerous, their extensive and continual usage creates several privacy concerns and difficult information security challenges. In this paper, we present a comprehensive survey of recent privacy-preserving big data analytics applications based on wearable sensors. We highlight the fundamental features of security and privacy for wearable device applications. Then, we examine the use of deep learning algorithms with cryptography and determine their usability for wearable sensors. We also present a case study on privacy-preserving machine learning techniques, in which we theoretically and empirically evaluate the performance of the privacy-preserving deep learning framework. We explain the implementation details of a case study of a secure prediction service using a convolutional neural network (CNN) model and the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption algorithm. Finally, we explore the obstacles and gaps in the deployment of practical real-world applications. Following a comprehensive overview, we identify the most important obstacles that must be overcome and discuss some interesting future research directions.
Funding: Hunan Provincial Education Science 13th Five-Year Plan (Grant No. XJK016BXX001); Social Science Foundation of Hunan Province (Grant No. 17YBA049); Open Foundation for the University Innovation Platform in the Hunan Province (Grant No. 16K013). This research work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, Open Project (Grant Nos. 20181901CRP03, 20181901CRP04, 20181901CRP05); National Social Science Fund Project: Research on the Impact Mechanism of China's Capital Space Flow on Regional Economic Development (Project No. 14BJL086).
Abstract: As the amount of data grows rapidly, people's requirements for electronic products such as mobile phones, tablets, and notebooks are constantly rising. The development and design of software applications attach great importance to users' experiences. A rational UI design should allow a user not only to enjoy the visual design of a new product but also to operate it more pleasantly. The aim is to enhance the attractiveness and performance of the new product and thus to promote active usage and consumption by users. In this paper, a UI design optimization strategy for general apps in the big data environment is proposed to give a better user experience while obtaining information effectively. An experimental example of a library app is designed to optimize the user experience. The experimental results show that user-centered UI design is the core of optimization, and that user portraits based on big data platforms are the key to UI design.
Funding: Funded by the Open Foundation for the University Innovation Platform in the Hunan Province, grant number 16K013; Hunan Provincial Natural Science Foundation of China, grant number 2017JJ2016; 2016 Science Research Project of Hunan Provincial Department of Education, grant number 16C0269; "Accurate crawler design and implementation with a data cleaning function", National Students Innovation and Entrepreneurship Training Program, grant number 201811532010. This research work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, Open Project, grant numbers 20181901CRP03, 20181901CRP04, 20181901CRP05.
Abstract: To study in depth the prominent problems faced by China's clean government work and to put forward effective coping strategies, this article uses big data technology to analyze online information about anti-corruption news events. We take news reports from the website of the Communist Party of China (CPC) Central Commission for Discipline Inspection (CCDI) as the data source. First, the obtained text data is preprocessed by word segmentation and stop-word removal; the preprocessed data is then vectorized and clustered; finally, the keywords of clean government work are derived from the clustering results through visualization analysis. The results show that China's clean government work should focus on the "four forms of decadence" issue, and that the related departments must strictly crack down on five categories of phenomena, namely "illegal payment of subsidies or benefits, illegal delivery of gifts and cash gifts, illegal use of official vehicles, banquets using public funds, and extravagant wedding ceremonies and funerals". The results are consistent with the official data released on the CCDI's website, which suggests that the method is feasible and effective.
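The preprocessing and vectorization steps named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the study worked on Chinese news text with a word segmenter, whereas this sketch uses whitespace tokenization on toy English phrases, a hypothetical stop-word list, and a smoothed TF-IDF weighting. Cosine similarity between the resulting vectors is the kind of signal a text clustering step would consume.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list for the toy English example.
STOP_WORDS = {"the", "of", "a", "and", "in", "for"}


def tokenize(text: str) -> List[str]:
    """Whitespace tokenization plus stop-word removal (stand-in for
    Chinese word segmentation, which the abstract refers to)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with a smoothed IDF term."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in term_freq.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "illegal use of official vehicles",
    "illegal use of public funds for banquets",
    "extravagant wedding ceremonies",
]
vecs = tfidf(docs)
sim_related = cosine(vecs[0], vecs[1])    # shares "illegal" and "use"
sim_unrelated = cosine(vecs[0], vecs[2])  # no shared terms
```

A clustering algorithm applied to such vectors would tend to group the first two phrases together, which mirrors how keyword themes emerge from clustered news text.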
Abstract: The development of environmental information governance includes three phases: self-provision, information disclosure, and public service. China is currently in transition from environmental information disclosure to environmental information public service. The core of the transformation is public participation in the whole procedure of environmental information supply, including decision making, production, and quality supervision and evaluation. The target path of environmental information governance reform includes five parts: improving public satisfaction, optimizing information disclosure, information quality control, integration of information resources, and multiple supply.
Funding: This paper is based on a research project financially supported by the Guizhou University of Finance and Economics Teaching Quality and Teaching Reform Project (2019) entitled "Research on Teaching Reform of Property Insurance Courses under the Background of Big Data (2019JGZZC07)"; supported by "Research on Legal Risks of Multinational Financial Leasing: Based on the 'One Belt One Road' Initiative (HB19FX022)"; financially supported by "Research on Cultivation of Big Data Thinking and Application Ability of University Undergraduates: Based on the Perspective of Digital Economy (GZJG20200203)".
Abstract: The development of big data has brought unprecedented challenges and opportunities to the teaching reform of higher education. The property insurance course is a core course of economics and management, and it guarantees the supply of talent for the health financial market. Big data technology and the data economy place innovative requirements on its teaching objectives, teaching content, and teaching system. In China's new round of double-first-class universities and disciplines, big data is an important foundation and driving force. The comprehensive integration of property insurance and big data is reflected in the following: cultivating students' big data thinking; cultivating students' practical application ability based on market employment needs; building a new discipline system of applied economics and achieving good coordination between property insurance courses and other disciplines; forming a strategic partnership among the government, enterprises and universities to jointly participate in the development and construction of courses; and formulating government policies that have a better governance effect on the development of higher education and talent training.
Abstract: In this paper, we examine the research status and identify the research fronts in the field of Big Data over the past 15 years. The study applies CiteSpace software to a visualization analysis of literature data from 2000 to 2014 in Web of Science and offers a preliminary discussion of the research hotspots and fronts of Big Data. It concludes that Big Data has exerted such great influence on the world that a growing number of researchers from different countries and institutes have studied it continuously in recent years.
Abstract: The aim of this article is to summarize the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article aims to offer a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains.
Abstract: Big data is the collection of large datasets from traditional and digital sources to identify trends and patterns. The quantity and variety of computer data are growing exponentially for many reasons. For example, retailers are building vast databases of customer sales activity. Organizations are working on logistics and financial services, and public social media are sharing a vast quantity of sentiments related to sales prices and products. Challenges of big data include volume and variety in both structured and unstructured data. In this paper, we implemented several machine learning models through Spark MLlib using PySpark, which is scalable, fast, easily integrated with other tools, and performs better than traditional models. We studied the stocks of 10 top companies, whose data include historical stock prices, with MLlib models such as linear regression, generalized linear regression, random forest, and decision tree. We also implemented naive Bayes and logistic regression classification models. Experimental results suggest that linear regression, random forest, and generalized linear regression provide an accuracy of 80%-98%. The decision tree did not predict share price movements in the stock market well.