The development of technologies such as big data and blockchain has brought convenience to daily life, but privacy and security issues have become increasingly prominent. The K-anonymity algorithm is an effective, low-complexity privacy-preserving algorithm that safeguards users' privacy by anonymizing big data. However, existing variants focus only on improving user privacy while ignoring data availability; in addition, ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. We therefore propose a new K-anonymity algorithm that addresses privacy and security in the big data context while improving data usability. Specifically, we construct a new information loss function based on information quantity theory; because different quasi-identifier attributes affect sensitive attributes to different degrees, each quasi-identifier attribute is assigned a weight in the loss function. To further reduce information loss, we improve K-anonymity in two ways. First, we keep the information loss smaller than that of the original table while guaranteeing privacy, using common artificial intelligence algorithms, namely a greedy algorithm and 2-means clustering. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centroids. The K-anonymity algorithm of this scheme is then built from the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which together reduce information loss. Finally, experiments demonstrate the algorithm's effectiveness in improving 2-means clustering and reducing information loss.
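The abstract does not spell out the mean-center initialization or the exact form of the weighted information loss function, so the following is only an illustrative sketch under stated assumptions: the two initial 2-means centroids are taken as the record closest to and the record farthest from the global mean, and loss is measured as a weighted, range-normalized generalization error per quasi-identifier.

```python
import numpy as np

def two_means(X, n_iter=20):
    """Illustrative 2-means with a hypothetical 'mean-center' initialization:
    one seed is the record closest to the global mean, the other the record
    farthest from it (an assumption; the paper's exact rule is not given here)."""
    mean = X.mean(axis=0)
    d = np.linalg.norm(X - mean, axis=1)
    centers = np.stack([X[d.argmin()], X[d.argmax()]])
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

def weighted_info_loss(original, anonymized, weights):
    """Toy weighted loss: per-attribute range-normalized absolute generalization
    error, weighted by the attribute's assumed influence on the sensitive attribute."""
    spans = original.max(axis=0) - original.min(axis=0) + 1e-9
    per_attr = np.abs(original - anonymized).mean(axis=0) / spans
    return float(np.dot(weights, per_attr))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # toy quasi-identifier table
    labels, centers = two_means(X)
    X_anon = centers[labels]                      # generalize records to cluster centers
    print(weighted_info_loss(X, X_anon, weights=np.array([0.5, 0.3, 0.2])))
```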
This study investigates the influence of social media on college choice among undergraduates majoring in Big Data Management and Application in China. It attempts to reveal how information on social media platforms such as Weibo, WeChat, and Zhihu influences the cognition and choice process of prospective students. Using an online quantitative survey questionnaire, data were collected from the 2022 and 2023 classes of new students majoring in Big Data Management and Application at Guilin University of Electronic Technology, with the aim of evaluating the role of social media in their college choice process and understanding the features and information that most attract prospective students. Social media has become a key factor in the college choice decision-making of these undergraduates: students tend to obtain school information through social media platforms and use it as an important reference in their decisions. Higher education institutions should therefore strengthen their social media information dissemination by providing accurate, timely, and attractive information; ensure effective management of their social media platforms; maintain a positive reputation on social media; and increase the interest and trust of prospective students. Educational decision-makers should also consider incorporating social media analysis into their recruitment strategies to better attract new student enrollment. This study provides a new perspective for understanding higher education choice behavior in the digital age, particularly by revealing the importance of social media in the educational decision-making process, with practical and theoretical implications for higher education institutions, policymakers, and social media platform operators.
Although the Internet of Things has been widely applied, the problems of cloud computing in digital smart medical Big Data collection, processing, analysis, and storage remain, especially the low efficiency of medical diagnosis. With the wide application of the Internet of Things and Big Data in the medical field, medical Big Data is growing geometrically, resulting in cloud service overload, insufficient storage, communication delay, and network congestion. To solve these medical and network problems, a medical-big-data-oriented fog computing architecture and a BP algorithm application are proposed, and their structural advantages and characteristics are studied. The architecture allows the medical Big Data generated by medical edge devices and the existing data in the cloud service center to be computed, compared, and analyzed at fog nodes through the Internet of Things, so that diagnosis results reduce business processing delay and improve the diagnostic effect. Considering the weak computing power of each edge device, the artificial intelligence BP (back-propagation) neural network algorithm is used in the core computing model of the medical diagnosis system to improve system computing power, enhance medical intelligence-aided decision-making, and improve clinical diagnosis and treatment efficiency. In the application process, combined with the characteristics of medical Big Data technology, and through fog architecture design and Big Data technology integration, we study the processing and analysis of the heterogeneous data of the medical diagnosis system in the context of the Internet of Things. The results are promising: the medical platform network is smooth, data storage space is sufficient, data processing and analysis are fast, the diagnostic effect is remarkable, and the system is a good assistant to doctors. It not only effectively addresses low clinical diagnosis and treatment efficiency and quality, but also reduces patients' waiting time, eases the tension between doctors and patients, and improves medical service quality and management.
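The abstract names the BP (back-propagation) neural network as the core diagnostic model but gives no architecture or input features, so the following is a minimal, hypothetical sketch using scikit-learn's MLPClassifier (a standard back-propagation-trained network) on made-up vital-sign features; the paper's actual model, inputs, and fog-node deployment are not shown.

```python
# Minimal illustration of a BP neural network as a diagnostic aid. Feature names
# and the "fever" label rule are hypothetical placeholders, not the paper's data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# hypothetical features: [body_temp, heart_rate, systolic_bp, blood_oxygen]
X = rng.normal(loc=[36.8, 75, 120, 97], scale=[0.6, 12, 15, 2], size=(500, 4))
y = (X[:, 0] > 37.3).astype(int)  # toy label: fever vs. no fever

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```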
Recently, a massive quantity of data has been produced from a large number of sources, and the volume of data created daily on the Internet has crossed two exabytes. Clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns within it, and density-based clustering techniques have gained significant attention because they effectively recognize complex patterns in spatial datasets. Big data clustering is a non-trivial process owing to the ever-increasing quantity of data, a challenge that can be addressed with the MapReduce tool. With this motivation, this paper presents an efficient MapReduce-based hybrid density-based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the MapReduce tool for handling big data and involves three distinct processes, namely pre-processing, clustering, and classification. The model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which can detect arbitrarily shaped and diverse clusters in noisy data. To improve the performance of DBSCAN, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed to explore the search space and determine the optimal parameters for density-based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The experimental validation of the proposed MR-HDBCC technique on a benchmark dataset demonstrates the promising performance of the proposed model in terms of different measures.
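MR-HDBCC tunes DBSCAN's parameters with cockroach swarm optimization and runs on MapReduce; neither part is reproduced here. The sketch below only shows the underlying single-node DBSCAN step on synthetic data, with hand-picked eps and min_samples standing in for the CSO-optimized values.

```python
# Single-node illustration of the DBSCAN step only; eps and min_samples below are
# hand-picked stand-ins for CSO-optimized parameters, and no MapReduce partitioning
# is shown.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.08, random_state=0)
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}, noise points: {int(np.sum(labels == -1))}")
```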
Big data analysis has penetrated all fields of society and brought about profound changes. However, there is relatively little research on using colleges' and universities' big data to support student management. Taking campus card information as the research sample and scholarship evaluation as an example, the big data is analyzed using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students' daily behavior from multiple dimensions and can prevent unreasonable scholarship evaluations caused by unfair factors such as plagiarism or votes by teachers and students. At the same time, students' absenteeism, physical health, and psychological status can be predicted in advance, which makes student management more proactive, accurate, and effective.
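The abstract names Spark and K-Means but not the campus-card features or the number of clusters, so the following sketch uses hypothetical per-student features and an arbitrary k; it is an illustration of the clustering step, not the paper's pipeline.

```python
# Illustrative Spark K-Means over hypothetical campus-card features; the real
# feature set, number of clusters, and data pipeline are not given in the abstract.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("scholarship-kmeans-sketch").getOrCreate()
rows = [
    ("s001", 21.5, 2.1, 0.93), ("s002", 35.0, 0.4, 0.99),
    ("s003", 12.3, 4.8, 0.71), ("s004", 28.7, 1.5, 0.88),
    ("s005", 18.9, 3.2, 0.95), ("s006", 40.2, 0.2, 0.64),
]
cols = ["student_id", "canteen_spend_per_week", "library_hours_per_day", "attendance_rate"]
df = spark.createDataFrame(rows, cols)

assembler = VectorAssembler(inputCols=cols[1:], outputCol="features")
features = assembler.transform(df)
model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
model.transform(features).select("student_id", "prediction").show()
spark.stop()
```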
The aim of this paper is to present a distributed algorithm for big data classification and its application to Magnetic Resonance Image (MRI) segmentation. We choose the well-known c-means classification method. The proposed method is designed as a cognitive program to be implemented on a parallel and distributed machine based on mobile agents. The main idea of the algorithm is that Mobile Classification Agents (Team Workers) execute the c-means classification procedure on different nodes over their own data at the same time and provide the results to their Mobile Host Agent (Team Leader), which computes the global results and orchestrates the classification until the convergence condition is achieved; the output segmented images are then provided by the Mobile Classification Agents. The data in our case are a big MRI image of size (m × n), which is split into (m × n) elementary images, one per mobile classification agent, to perform the classification procedure. The experimental results show that the distributed architecture significantly improves big data segmentation efficiency.
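The mobile-agent middleware is not described in the abstract, so the sketch below only simulates the worker/host split for hard c-means (i.e., k-means) on image tiles: each simulated "Team Worker" returns per-cluster partial sums for its tile, and the "Team Leader" aggregates them to update the global centers until convergence.

```python
# Sketch of the worker/host split for distributed c-means on image tiles. The
# mobile-agent layer is not shown; each "worker" is simulated as a plain function.
import numpy as np

def worker_partial_sums(tile, centers):
    """Team Worker: assign each pixel of its tile to the nearest center and return
    per-cluster sums and counts (enough for the host to update global centers)."""
    pixels = tile.reshape(-1, 1).astype(float)
    labels = np.abs(pixels - centers.reshape(1, -1)).argmin(axis=1)
    sums = np.array([pixels[labels == k].sum() for k in range(len(centers))])
    counts = np.array([(labels == k).sum() for k in range(len(centers))])
    return sums, counts

def host_cmeans(tiles, c=3, n_iter=30, tol=1e-4):
    """Team Leader: orchestrates rounds until the centers stop moving."""
    all_pixels = np.concatenate([t.ravel() for t in tiles]).astype(float)
    centers = np.linspace(all_pixels.min(), all_pixels.max(), c)
    for _ in range(n_iter):
        partials = [worker_partial_sums(t, centers) for t in tiles]  # parallel in practice
        sums = np.sum([p[0] for p in partials], axis=0)
        counts = np.sum([p[1] for p in partials], axis=0)
        new_centers = np.where(counts > 0, sums / np.maximum(counts, 1), centers)
        if np.max(np.abs(new_centers - centers)) < tol:
            return new_centers
        centers = new_centers
    return centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(128, 128))            # stand-in for an MRI slice
    tiles = [image[i:i + 32, :] for i in range(0, 128, 32)]  # one tile per "agent"
    print("estimated intensity centers:", host_cmeans(tiles))
```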
To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Subsequently, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. To address the low execution efficiency and poor scalability of the sequential deep forest algorithm, the PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, PDFMLC incorporates label mutual information from the established dataset as input features, which captures latent label associations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits excellent node scalability and execution efficiency, while the PDFMLC-TIM method effectively classifies cybersecurity analysis reports and extracts tactics entities to construct comprehensive threat intelligence, producing correctly formatted STIX 2.1 threat intelligence.
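The abstract states that label mutual information computed from the dataset is added as input features but does not give the formula; one straightforward reading, sketched below with placeholder label names, is the pairwise mutual information between binary tactic-label columns.

```python
# One plausible reading of "label mutual information as input features": pairwise
# mutual information between binary tactic-label columns. The five label names
# below are placeholders, not the paper's actual label set.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
labels = ["recon", "initial_access", "execution", "persistence", "exfiltration"]
Y = rng.integers(0, 2, size=(200, len(labels)))  # toy multi-label matrix (reports x tactics)

mi = np.array([[mutual_info_score(Y[:, i], Y[:, j]) for j in range(Y.shape[1])]
               for i in range(Y.shape[1])])
print(np.round(mi, 3))
```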
In recent years, China has paid more and more attention to the development of the marine economy and to the management and protection of fishery resources. Management departments at all levels regulate and manage the fishing behavior of fishing vessels through fishing trajectory data. In this paper, the distribution of shrimp farms in the East China Sea is predicted by studying the trajectories and behavior patterns of shrimp boats recorded in the fishing trajectory system. At the same time, a shrimp farm distribution management system based on the Back Propagation algorithm is established. It can monitor the trajectories of fishing boats and the distribution of shrimp groups in real time, which effectively improves the work efficiency and management mode of the management department, and it also plays a positive role in regulating the behavior of fishing boats at sea.
To obtain the platform's big data analytics support, manufacturers in the traditional retail channel must decide whether to use the direct online channel. A retail supply chain model and a direct online supply chain model are built, in which manufacturers design products alone in the retail channel, while the platform and the manufacturer complete the product design together in the direct online channel. These two models are analyzed using a game-theoretic model and numerical simulation. The findings indicate that if the manufacturers' design capabilities are not very high and the commission rate is not very low, the manufacturers will choose the direct online channel when the platform's technical efforts lie within a certain interval. When the platform's technical efforts are exogenous, they positively influence the manufacturers' decisions; in the endogenous case, however, the platform's effect on the manufacturers is reflected in the interaction of the commission rate and cost efficiency. The manufacturers and the platform should make their effort decisions comprehensively, based on the manufacturer's development capabilities, the intensity of market competition, and the cost efficiency of the platform.
Finding a suitable trading point for a stock is a research subject that has attracted wide concern and study for a long time. From the perspectives of big data and quantitative techniques, this paper proposes an algorithmic approach, based on big data analysis and a linearly weighted moving average curve, to find the point at which to buy a stock so that the trader can achieve the expected profit with higher probability; numerical experiments are conducted to further explain the approach and verify its performance. This work can promote the development of big data research and quantitative techniques, and can also provide a reference method for traders performing technical analysis.
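The abstract does not give the exact buy rule, so the sketch below is purely illustrative: it computes an n-day linearly weighted moving average (recent prices weighted most heavily) on synthetic closing prices and flags a toy buy signal when the price crosses above the curve. The window length and crossover rule are assumptions, not the paper's method.

```python
# Linearly weighted moving average (LWMA) with a toy crossover "buy point";
# the window length and the crossover rule are illustrative assumptions only.
import numpy as np

def lwma(prices, window):
    """n-day LWMA: weights 1..n, the most recent price weighted most heavily."""
    w = np.arange(1, window + 1, dtype=float)
    return np.array([np.dot(prices[i - window + 1:i + 1], w) / w.sum()
                     for i in range(window - 1, len(prices))])

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=120))   # synthetic daily closes
ma = lwma(prices, window=10)
aligned = prices[9:]                                   # align closes with the MA
crossed_up = (aligned[1:] > ma[1:]) & (aligned[:-1] <= ma[:-1])
print("toy buy-signal days:", np.flatnonzero(crossed_up) + 10)
```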
Today's world is data-driven, with data being produced in vast amounts as a result of the rapid growth of technology that permeates every aspect of our lives. New data processing techniques must be developed and refined over time to gain meaningful insights from this vast, continuous volume of data produced in various forms. Machine learning technologies provide promising solutions and potential methods for processing large quantities of data and extracting value from them. This study conducts a literature review on the application of machine learning techniques in big data processing. It provides a general overview of machine learning algorithms and techniques, a brief introduction to big data, and a discussion of related works that have used machine learning techniques in a variety of sectors to process large amounts of data. The study also discusses the challenges and issues associated with the use of machine learning for big data.
In the era of big data, the ways people work, live, and think have changed dramatically, and the social governance system is being restructured. Achieving intelligent social governance has become a national strategy, and the application of big data technology to counterterrorism efforts has become a powerful weapon for all countries. However, due to the uncertainty, difficulty of interpretation, and potential risk of discrimination in big data technology and algorithmic models, basic human rights, freedom, and even ethics are likely to be impacted and challenged. There is therefore an urgent need to prioritize basic human rights and regulate the application of big data for counterterrorism purposes. Legislation and law enforcement regarding the use of big data to counter terrorism must be subject to constitutional and other legal reviews, so as to strike a balance between safeguarding national security and protecting basic human rights.
As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms, efficiently using a data-driven approach together with information and communication technology (ICT) and cloud computing. Because of the complicated architecture of cloud computing, the distinctive working of advanced metering infrastructure (AMI), and the use of sensitive data, securing the SG has become challenging. Faults of the SG fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation are TLs. NTLs are human-induced losses with malicious intent, such as attacks on sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology, based on principles of computational intelligence and big data analysis, to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant subset of features from a large, unsupervised public smart grid project dataset from London, UK, for theft detection. A subset of 26 out of 71 features is obtained with a classification accuracy of 96.6%, compared to prior studies conducted on small and limited datasets.
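The paper's dataset, GA operators, and hyperparameters are not reproduced here; the sketch below only shows the general GA-wrapped SVM feature-selection pattern on synthetic data, with a binary chromosome marking the kept features and cross-validated SVM accuracy as the fitness function.

```python
# Generic GA-wrapped SVM feature selection on synthetic data. Population size,
# crossover/mutation rates, and the dataset are illustrative stand-ins only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

def fitness(mask):
    # cross-validated SVM accuracy on the selected feature subset
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))           # initial population of feature masks
for _ in range(15):                                       # generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]               # keep the fittest half
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05               # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", round(fitness(best), 3))
```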
The paper addresses the challenge of transmitting a large number of files stored in a data center (DC), encrypting them by compilers, and sending them through a network within an acceptable time. Faced with such a large number of files, a single compiler may not be sufficient to encrypt the data in an acceptable time. In this paper, we consider the problem of several compilers, and the objective is to find an algorithm that gives an efficient schedule for assigning the given files to the compilers. The main objective of the work is to minimize the gap in the total size of assigned files between compilers; this minimization ensures a fair distribution of files across compilers. The problem is considered very hard. The paper presents two research axes. The first axis concerns architecture: we propose a novel pre-compiler architecture in this context. The second axis is algorithmic development: we develop six algorithms to solve the problem, based on the dispatching rules method, a decomposition method, and an iterative approach. These algorithms give approximate solutions to the studied problem. Experiments are implemented to show the performance of the algorithms, using several indicators, and five instance classes with a total of 2,350 instances are proposed to test them. A comparison between the proposed algorithms is presented and discussed in several tables to show the performance of each algorithm. The results show that the best algorithm is the Iterative-mixed Smallest-Longest Heuristic (ISL), with a success percentage of 97.7% and an average running time of 0.148 s; no other algorithm exceeded 22%. The best algorithm excluding ISL is the Iterative-mixed Longest-Smallest Heuristic (ILS), with a percentage of 21.4% and an average running time of 0.150 s.
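The six algorithms of the paper are not reproduced here; the sketch below shows one classic dispatching rule in the same spirit as the longest/smallest heuristics named in the abstract: each file, largest first, goes to the currently least-loaded compiler, and the resulting gap between compiler loads is reported. The file sizes and number of compilers are made-up examples.

```python
# One classic dispatching rule: assign each file, largest first, to the currently
# least-loaded compiler, then report the gap between the most- and least-loaded
# compilers. File sizes and the number of compilers are illustrative only.
import heapq

def longest_first_schedule(file_sizes, n_compilers):
    loads = [(0, i, []) for i in range(n_compilers)]   # (total size, compiler id, files)
    heapq.heapify(loads)
    for size in sorted(file_sizes, reverse=True):
        total, cid, assigned = heapq.heappop(loads)    # least-loaded compiler so far
        heapq.heappush(loads, (total + size, cid, assigned + [size]))
    totals = sorted(t for t, _, _ in loads)
    return loads, totals[-1] - totals[0]               # schedule and load gap

files = [870, 512, 498, 430, 390, 260, 255, 120, 90, 45]   # file sizes (e.g., MB)
schedule, gap = longest_first_schedule(files, n_compilers=3)
for total, cid, assigned in sorted(schedule, key=lambda x: x[1]):
    print(f"compiler {cid}: total={total}, files={assigned}")
print("gap between most- and least-loaded compilers:", gap)
```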
Over the past era, subgraph mining from a large collection of graph databases has been a crucial problem. In addition, scalability is another big problem due to insufficient storage, and several security challenges are associated with subgraph mining in today's on-demand systems. To address these downsides, our proposed work introduces a Blockchain-based Consensus algorithm for Authenticated query search in Large-Scale Dynamic Graphs (BCCA-LSDG). The proposed BCCA-LSDG handles a two-fold process: graph indexing and authenticated query search (query processing). A blockchain-based reputation system maintains trust between the blockchain and the cloud server in the proposed architecture. To resolve these issues and provide safe big data transmission, the proposed technique also combines blockchain with a consensus algorithm architecture. Security of the big data is ensured by dividing the blockchain network into distinct networks, each with a restricted number of allowed entities, with data kept in the cloud gate server and data analysis performed on the blockchain. The consensus algorithm is crucial for maintaining the speed, performance, and security of the blockchain. Dual-similarity-based MapReduce then helps map and reduce the relevant subgraphs using optimal feature sets. Finally, a graph index refinement process is undertaken to improve the query results; concerning query error, fuzzy logic is used to refine the graph index dynamically. According to the findings, the proposed technique outperforms advanced methodologies in both blockchain and non-blockchain systems, and the combination of blockchain and subgraph mining provides a secure communication platform.
A healthy balanced diet and a healthy lifestyle are very closely linked, although the biological link is difficult to fully understand. Changes in how food is produced, portioned, and supervised, such as the introduction of nutritional hygiene standards, food handling practices, and the addition of macro- and micronutrients, have had a big impact on human health in the last few decades. Growing evidence indicates that our gut microbiota may affect our health in ways that are at least in part influenced by our diet, the ingredients used in the preparation of our food and drinks, and other factors. This emerging problem is attracting a great deal of attention, but it remains hard to work out how the gut microbiota and nutritional molecules interact and how they behave in specific situations. Genetic analysis, metagenomic characterization, compositional analysis of foodstuffs, and the shift to digital health information have provided massive amounts of data that might be useful in tackling this problem. Machine learning and deep learning methods will be employed extensively in this research to blend complicated data frames and extract the crucial information capable of exposing and capturing the extremely delicate links between diet, gut microbiome, and overall wellbeing. Nutrition, wellbeing, and gut microorganisms are a few of the subjects covered in this field, which depends not only on databases and high-speed technology, but also on machine problem-solving capabilities, intangible assets, and laws. Computer vision, data mining, and analytics are all discussed extensively in this piece. We also point out limitations in existing methodologies and new situations that may arise, in the context of current scientific knowledge, in the decades to come. We further provide background on bioinformatics algorithms; recent developments may seem to herald a revolution in clinical research, pushing traditional techniques to the sidelines, but their true potential rests in their ability to work in conjunction with, rather than as a substitute for, traditional research hypotheses and procedures. When new metadata propositions are made by focusing on easily understandable frameworks, they will always need to be rigorously validated and questioned. Because of the huge datasets available, assumption analysis may be used to complement, rather than substitute for, more conventional concept-driven scientific investigation. Only by employing all of these approaches together will we increase the quality of evidence-based practice.