Abstract: A flood is a significantly damaging natural calamity that causes loss of life and property. Earlier work on the construction of flood prediction models intended to reduce risks, suggest policies, reduce mortality, and limit property damage caused by floods. The massive amount of data generated by social media platforms such as Twitter opens the door to flood analysis. Because of the real-time nature of Twitter data, some government agencies and authorities have used it to track natural catastrophe events in order to build a more rapid rescue strategy. However, because tweets are short, it is difficult to construct an accurate prediction model for detecting floods. Machine learning (ML) and deep learning (DL) approaches can be used to statistically develop flood prediction models. At the same time, the vast number of tweets necessitates the use of a big data analytics (BDA) tool for flood prediction. In this regard, this work provides an optimal deep learning-based flood forecasting model with big data analytics (ODLFF-BDA) based on Twitter data. The suggested ODLFF-BDA technique intends to anticipate the existence of floods using tweets in a big data setting. The ODLFF-BDA technique comprises data pre-processing to convert the input tweets into a usable format. In addition, a Bidirectional Encoder Representations from Transformers (BERT) model is used to generate emotive contextual embeddings from tweets. Furthermore, a gated recurrent unit (GRU) with a Multilayer Convolutional Neural Network (MLCNN) is used to extract local features and predict the flood. Finally, an Equilibrium Optimizer (EO) is used to fine-tune the hyperparameters of the GRU and MLCNN models in order to increase prediction performance. Memory usage is kept below 3.5 MB, lower than that of the compared algorithms. The ODLFF-BDA technique's performance was validated using a benchmark Kaggle dataset, and the findings showed that it significantly outperformed other recent approaches.
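As a rough illustration of the kind of pipeline this abstract describes, the following PyTorch sketch feeds BERT-style contextual embeddings (in real use, e.g., the last hidden states of a pretrained BERT encoder) into a GRU followed by a multilayer 1-D CNN head for binary flood classification. The class name `FloodClassifier`, the layer sizes, and the random stand-in embeddings are illustrative assumptions, not the ODLFF-BDA architecture or its EO-tuned hyperparameters.

```python
import torch
import torch.nn as nn

class FloodClassifier(nn.Module):
    """Sketch: GRU + multilayer 1-D CNN head over pre-computed BERT-style embeddings."""
    def __init__(self, emb_dim=768, hidden=128, n_filters=64):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Multilayer CNN extracting local n-gram features from the GRU outputs
        self.cnn = nn.Sequential(
            nn.Conv1d(2 * hidden, n_filters, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(n_filters, 1)        # flood / no-flood logit

    def forward(self, emb):                      # emb: (batch, seq_len, emb_dim)
        out, _ = self.gru(emb)                   # (batch, seq_len, 2*hidden)
        feat = self.cnn(out.transpose(1, 2))     # (batch, n_filters, 1)
        return self.fc(feat.squeeze(-1))         # (batch, 1)

# Random tensors stand in for BERT embeddings of a batch of 8 tweets, 32 tokens each
model = FloodClassifier()
probs = torch.sigmoid(model(torch.randn(8, 32, 768)))
```

In the paper's setup an Equilibrium Optimizer would search over hyperparameters such as `hidden` and `n_filters`; the sketch leaves them fixed.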
Funding: Supported by the 2020 Foshan Science and Technology Project (No. 2020001005356); the grant was received by Baoling Qin.
Abstract: Although the Internet of Things has been widely applied, problems remain in the use of cloud computing for digital smart medical Big Data collection, processing, analysis, and storage, especially the low efficiency of medical diagnosis. With the wide application of the Internet of Things and Big Data in the medical field, medical Big Data is increasing at a geometric rate, resulting in cloud service overload, insufficient storage, communication delay, and network congestion. In order to solve these medical and network problems, a medical big-data-oriented fog computing architecture and a BP algorithm application are proposed, and their structural advantages and characteristics are studied. This architecture enables the medical Big Data generated by medical edge devices and the existing data in the cloud service center to be calculated, compared, and analyzed at the fog node through the Internet of Things. The diagnosis process is designed to reduce business processing delay and improve the diagnosis effect. Considering the weak computing power of each edge device, the artificial-intelligence BP neural network algorithm is used in the core computing model of the medical diagnosis system to improve the system's computing power, enhance medical intelligence-aided decision-making, and improve clinical diagnosis and treatment efficiency. In the application process, combined with the characteristics of medical Big Data technology, through fog architecture design and Big Data technology integration, we study the processing and analysis of heterogeneous data in the medical diagnosis system in the context of the Internet of Things. The results are promising: the medical platform network is smooth, the data storage space is sufficient, data processing and analysis are fast, the diagnosis effect is remarkable, and the system is a good assistant to doctors' treatment. It not only effectively addresses low clinical diagnosis and treatment efficiency and quality, but also reduces the waiting time of patients, effectively eases the contradiction between doctors and patients, and improves medical service quality and management level.
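To make the role of the BP (backpropagation) neural network concrete, here is a minimal NumPy sketch of a two-layer BP network trained with gradient descent on synthetic stand-in features. The layer sizes, sigmoid activations, learning rate, and toy labels are assumptions for illustration, not the diagnosis system's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy BP network: n_in input features -> n_hid hidden units -> 1 diagnostic score
rng = np.random.default_rng(0)
n_in, n_hid, lr = 8, 16, 0.1
W1, b1 = rng.normal(0, 0.5, (n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(0, 0.5, (n_hid, 1)), np.zeros(1)

X = rng.random((200, n_in))                                   # stand-in edge-device features
y = (X.sum(axis=1, keepdims=True) > n_in / 2).astype(float)   # stand-in labels

for epoch in range(500):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)       # backpropagate the squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

print("training accuracy:", ((out > 0.5) == y).mean())
```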
Funding: Supported in part by the National Natural Science Foundation of China (61322110, 6141101115) and the Doctoral Fund of the Ministry of Education (201300051100013).
Abstract: Recently, the Internet has stimulated explosive progress in knowledge discovery from big-volume data resources, mining valuable hidden rules by computing. Simultaneously, wireless channel measurement data exhibit big-volume features, considering the massive antennas, huge bandwidth, and versatile application scenarios. This article first presents a comprehensive survey of channel measurement and modeling research for mobile communication, especially for the 5th Generation (5G) and beyond. Considering the progress of big data research, a cluster-nuclei based model is then proposed, which takes advantage of both stochastic and deterministic models. The novel model has low complexity with a limited number of cluster nuclei, while the cluster nuclei have a physical mapping to real propagation objects. By combining the channel property variation principles with respect to antenna size, frequency, mobility, and scenario mined from the channel data, the proposed model can be extended to versatile applications to support future mobile research.
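The cluster idea can be pictured with a toy tapped-delay-line channel in which the impulse response is a sum of contributions from a small number of clusters, each spawning rays around its own delay. The NumPy sketch below is a generic cluster-based simulation with assumed delay and power parameters; it is not the article's cluster-nuclei model, which additionally maps each nucleus to a physical propagation object learned from measurement data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters = 5                      # limited number of clusters (assumed)
rays_per_cluster = 20
cluster_delays = np.sort(rng.exponential(50e-9, n_clusters))   # seconds
cluster_powers = np.exp(-cluster_delays / 100e-9)              # exponential decay profile

def channel_impulse_response():
    """Toy cluster-based CIR: each cluster spawns rays with small delay offsets."""
    taps = []
    for d, p in zip(cluster_delays, cluster_powers):
        ray_delays = d + rng.exponential(5e-9, rays_per_cluster)
        ray_gains = (rng.normal(size=rays_per_cluster) +
                     1j * rng.normal(size=rays_per_cluster)) * np.sqrt(p / (2 * rays_per_cluster))
        taps.append((ray_delays, ray_gains))
    return taps

taps = channel_impulse_response()
total_power = sum(np.sum(np.abs(g) ** 2) for _, g in taps)
print(f"{n_clusters} clusters, total received power ~ {total_power:.3f}")
```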
Abstract: Facing the development of future 5G, emerging technologies such as the Internet of Things, big data, cloud computing, and artificial intelligence are driving explosive growth in data traffic. With radical changes in communication theory and implementation technologies, wireless communications and wireless networks have entered a new era. Among them, wireless big data (WBD) has tremendous value, and artificial intelligence (AI) offers unthinkable possibilities. However, in the big data development and artificial intelligence application communities, the lack of a sound theoretical foundation and mathematical methods is regarded as a real challenge that needs to be solved. Starting from the basic problem of wireless communication, the interrelationship of demand, environment, and ability, this paper investigates the concept and data model of WBD, wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples, to facilitate and open up more opportunities for WBD research and development. Such research is beneficial for creating new theoretical foundations and emerging technologies for future wireless communications.
Funding: This work was supported by the Deanship of Scientific Research at Qassim University.
Abstract: Today's world is a data-driven one, with data being produced in vast amounts as a result of the rapid growth of technology that permeates every aspect of our lives. New data processing techniques must be developed and refined over time to gain meaningful insights from this vast, continuous volume of data produced in various forms. Machine learning technologies provide promising solutions and potential methods for processing large quantities of data and gaining value from it. This study conducts a literature review on the application of machine learning techniques in big data processing. It provides a general overview of machine learning algorithms and techniques, a brief introduction to big data, and a discussion of related works that have used machine learning techniques in a variety of sectors to process large amounts of data. The study also discusses the challenges and issues associated with the use of machine learning for big data.
Funding: Supported by the Natural Science Foundation of Jilin Province, China [YDZJ202301ZYTS218]; the National Natural Science Foundation of China [42301430, 42222103, 42171379, U2243230, and 42101379]; the Youth Innovation Promotion Association of the Chinese Academy of Sciences [2017277 and 2021227]; and the Professional Association of the Alliance of International Science Organizations [ANSO-PA-2020-14].
Abstract: Climate change and human activities have reduced the area and degraded the functions and services of wetlands in China. To protect and restore wetlands, it is urgent to predict the spatial distribution of potential wetlands. In this study, the distribution of potential wetlands in China was simulated by integrating the advantages of Google Earth Engine with geographic big data and machine learning algorithms. Based on a potential wetland database with 46,000 samples and an indicator system of 30 hydrologic, soil, vegetation, and topographic factors, a simulation model was constructed by machine learning algorithms. The accuracy of the random forest model for simulating the distribution of potential wetlands in China was good, with an area under the receiver operating characteristic curve value of 0.851. The area of potential wetlands was 332,702 km², with 39.0% of potential wetlands in Northeast China. Geographic features were notable, and potential wetlands were mainly concentrated in areas with 400-600 mm precipitation, semi-hydric and hydric soils, meadow and marsh vegetation, altitude less than 700 m, and slope less than 3°. The results provide an important reference for wetland remote sensing mapping and a scientific basis for wetland management in China.
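A minimal scikit-learn sketch of the random-forest workflow described above (train on a table of indicator factors, evaluate with the area under the ROC curve) follows; the synthetic samples and the 30 placeholder columns stand in for the study's 46,000-sample database and its hydrologic, soil, vegetation, and topographic indicators.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((5000, 30))                        # 30 indicator factors (stand-in values)
y = (X[:, :5].mean(axis=1) > 0.5).astype(int)     # stand-in wetland / non-wetland labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"AUC on held-out samples: {auc:.3f}")      # the study reports 0.851 on real data
```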
Abstract: Data fusion is a multidisciplinary research area that involves different domains. It is used to attain minimum detection error probability and maximum reliability with the help of data retrieved from multiple healthcare sources. The generation of huge quantities of data from medical devices has resulted in big data, for which data fusion techniques become essential. Securing medical data is a crucial issue in today's exponentially paced computing world and can be achieved by Intrusion Detection Systems (IDS). Since a single modality is not adequate to attain a high detection rate, there is a need to merge diverse techniques using a decision-based multimodal fusion process. In this view, this research article presents a new multimodal fusion-based IDS to secure healthcare data using Spark. The proposed model involves a decision-based fusion model with different processes such as initialization, pre-processing, Feature Selection (FS), and multimodal classification for effective detection of intrusions. In the FS process, a chaotic Butterfly Optimization (BO) algorithm called CBOA is introduced. Though the classic BO algorithm offers effective exploration, it fails to achieve fast convergence. To improve the convergence rate, this work modifies the required parameters of the BO algorithm using chaos theory. Finally, to detect intrusions, a multimodal classifier is applied by incorporating three Deep Learning (DL)-based classification models. Besides, Hadoop MapReduce and Spark were utilized in this study to achieve faster computation of big data on a parallel computing platform. To validate the outcome of the presented model, a series of experiments was performed using the benchmark NSLKDDCup99 dataset repository. The proposed model demonstrated its effectiveness on the applied dataset by offering a maximum accuracy of 99.21%, precision of 98.93%, and detection rate of 99.59%. The results confirm the superiority of the proposed model.
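The chaotic modification can be pictured as replacing uniform random draws in the optimizer with a deterministic chaotic sequence such as the logistic map. The sketch below is a heavily simplified chaos-driven wrapper for binary feature selection built on scikit-learn; it is not the paper's CBOA (the butterfly fragrance model and the Spark/MapReduce layers are omitted), and the population size, mutation rate, and toy data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def logistic_map(x, mu=4.0):
    """Chaotic sequence in (0, 1) used in place of uniform random draws."""
    return mu * x * (1.0 - x)

def fitness(mask, X, y):
    """Cross-validated accuracy on the selected subset, with a small size penalty."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=500),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.mean()

# Simplified chaos-driven, butterfly-style search over binary feature masks (sketch)
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)
pop = rng.integers(0, 2, (10, X.shape[1]))
chaos = 0.7                                   # initial value of the chaotic map
best = max(pop, key=lambda m: fitness(m, X, y)).copy()

for it in range(15):
    for i, m in enumerate(pop):
        chaos = logistic_map(chaos)
        # Chaotic value switches between moving toward the global best and a random peer
        partner = best if chaos > 0.5 else pop[rng.integers(len(pop))]
        flip = rng.random(X.shape[1]) < 0.2    # copy a fraction of the partner's bits
        cand = np.where(flip, partner, m)
        if fitness(cand, X, y) > fitness(m, X, y):
            pop[i] = cand
    best = max(pop, key=lambda m: fitness(m, X, y)).copy()

print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best, X, y), 3))
```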
Funding: This research is funded by Fayoum University, Egypt.
Abstract: As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms that efficiently uses a data-driven approach and employs efficient information and communication technology (ICT) and cloud computing. As a result of the complicated architecture of cloud computing, the distinctive working of advanced metering infrastructures (AMI), and the use of sensitive data, it has become challenging to make the SG secure. Faults of the SG are categorized into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation of energy are TLs. NTLs are human-induced errors for malicious purposes, such as attacking sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology based on principles of computational intelligence as well as big data analysis to identify fraudulent customers based on their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant subset of feature data from a large, unsupervised public smart grid project dataset in London, UK, for theft detection. A subset of 26 out of 71 features is obtained with a classification accuracy of 96.6%, compared to studies conducted on small and limited datasets.
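A minimal sketch of a GA-SVM wrapper of the kind described: a small genetic algorithm evolves binary feature masks, and cross-validated SVM accuracy serves as the fitness. The synthetic 71-feature data, population size, and crossover/mutation rates are assumptions, not the London smart-meter dataset or the paper's exact GA settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=71, n_informative=15, random_state=0)

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, (20, X.shape[1]))           # population of binary feature masks
for gen in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep the fitter half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])            # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02         # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents] + children)

best = pop[np.argmax([fitness(m) for m in pop])]
print(f"selected {best.sum()} of {X.shape[1]} features, CV accuracy {fitness(best):.3f}")
```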
Abstract: The majority of spatial data reveal some degree of spatial dependence. The term "spatial dependence" refers to the tendency for phenomena to be more similar when they occur close together than when they occur far apart in space. This property is ignored in machine learning (ML) for spatial domains of application. Most classical machine learning algorithms are generally inappropriate unless modified in some way to account for it. In this study, we proposed an approach that aims to improve an ML model's ability to detect this dependence without incorporating any spatial features in the learning process. To detect the dependence while also improving performance, a hybrid model (HM) was used based on two representative algorithms, a radial basis function neural network (RBFNN) and a random forest (RF). In addition, cross-validation was used to make the model stable. Furthermore, global Moran's I and local Moran's I were used to capture the spatial dependence in the residuals. The results show that the HM performs significantly better, with an R² of 99.91%, compared to the RBFNN and RF, which achieve R² values of 74.22% and 82.26%, respectively. With lower errors, the HM achieved an average test error of 0.033% and a positive global Moran's I of 0.12. We concluded that as the R² value increases, the models become weaker in terms of capturing the dependence.
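Global Moran's I, used above to check the residuals for remaining spatial dependence, can be computed directly from its definition: I = (n / S0) · Σ_ij w_ij z_i z_j / Σ_i z_i², where z is the vector of de-meaned residuals and S0 is the sum of the spatial weights. The NumPy sketch below uses an inverse-distance weight matrix; the coordinates and residuals are synthetic placeholders, not the study's data.

```python
import numpy as np

def global_morans_i(values, coords):
    """Global Moran's I with inverse-distance spatial weights (zero diagonal)."""
    z = values - values.mean()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.where(d > 0, 1.0 / np.maximum(d, 1e-12), 0.0)   # inverse distance, w_ii = 0
    n, s0 = len(values), w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Synthetic example: residuals with a mild spatial trend over random 2-D locations
rng = np.random.default_rng(0)
coords = rng.random((200, 2))
residuals = 0.5 * coords[:, 0] + rng.normal(0, 0.3, 200)   # spatially patterned errors

print(f"global Moran's I of residuals: {global_morans_i(residuals, coords):.3f}")
# Values near 0 suggest little remaining spatial dependence; positive values
# (as with the 0.12 reported for the hybrid model) indicate residual clustering.
```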