This work presents an advanced and detailed analysis of the mechanisms of hepatitis B virus (HBV) propagation in an environment characterized by variability and stochasticity. Based on some biological features of the virus and the stated assumptions, the corresponding deterministic model is formulated, taking into consideration the effect of vaccination. This deterministic model is extended to a stochastic framework by considering a new form of disturbance which makes it possible to simulate strong and significant fluctuations. The long-term behaviors of the virus are predicted by using stochastic differential equations with second-order multiplicative α-stable jumps. By developing the assumptions and employing novel theoretical tools, the threshold parameter responsible for ergodicity (persistence) and extinction is provided. The theoretical results of the current study are validated by numerical simulations, and parameter estimation is also performed. Moreover, we obtain the following new and interesting findings: (a) in each class, the average time depends on the value of α; (b) the second-order noise has an inverse effect on the spread of the virus; (c) the shapes of the population densities at the stationary level change quickly at certain values of α. These three conclusions can provide a solid research base for further investigation in the field of biological and ecological modeling.
Amid the landscape of Cloud Computing (CC), the Cloud Datacenter (DC) stands as a conglomerate of physical servers whose performance can be hindered by bottlenecks within the realm of proliferating CC services. A linchpin in CC's performance, the Cloud Service Broker (CSB), orchestrates DC selection. Failure to route user requests to suitable DCs turns the CSB itself into a bottleneck, endangering service quality. To tackle this, deploying an efficient CSB policy becomes imperative, optimizing DC selection to meet stringent Quality-of-Service (QoS) demands. Although numerous CSB policies exist, their implementation grapples with challenges such as cost and availability. This article undertakes a holistic review of diverse CSB policies while surveying the predicaments confronted by current policies. The foremost objective is to pinpoint research gaps and remedies to invigorate future policy development. Additionally, it clarifies the various DC selection methodologies employed in CC, benefiting practitioners and researchers alike. Employing synthetic analysis, the article systematically assesses and compares a wide range of DC selection techniques. These analytical insights equip decision-makers with a pragmatic framework for discerning the technique apt for their needs. In summation, this article underscores the paramount importance of adept CSB policies in DC selection and the imperative role of efficient CSB policies in optimizing CC performance. By emphasizing the significance of these policies and their modeling implications, the article contributes to both the general modeling discourse and its practical applications in the CC domain.
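As an illustrative sketch (not a policy from the article), a minimal CSB routing rule can be expressed as a weighted cost over datacenter metrics; the metric names, weights, and DC statistics below are hypothetical.

```python
# Hypothetical Cloud Service Broker policy: route a request to the datacenter
# with the lowest weighted cost of network latency and current load.

def select_datacenter(datacenters, w_latency=0.7, w_load=0.3):
    """Return the name of the datacenter with the lowest weighted cost.

    `datacenters` maps a DC name to a dict with 'latency_ms' and 'load'
    (fraction of capacity in use, 0.0-1.0).
    """
    def cost(dc):
        stats = datacenters[dc]
        return w_latency * stats["latency_ms"] + w_load * stats["load"] * 100
    return min(datacenters, key=cost)

dcs = {
    "dc-eu": {"latency_ms": 40, "load": 0.90},
    "dc-us": {"latency_ms": 55, "load": 0.20},
    "dc-ap": {"latency_ms": 120, "load": 0.10},
}
print(select_datacenter(dcs))  # dc-us: moderate latency, low load
```

Tuning the weights shifts the policy between latency-first and load-balancing behavior, which is exactly the trade-off a CSB policy must encode.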
This article delves into the analysis of the performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. These comprehensive studies result in a definitive overview of the difficulties faced and the areas requiring further improvement and focus.
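As a rough sketch of the classification step (not the article's pipeline), a linear SVM can be trained by Pegasos-style stochastic sub-gradient descent on image-derived features; the two "redness"/"smoke texture" features and all values below are made up for illustration.

```python
# Minimal linear SVM via Pegasos-style stochastic sub-gradient descent on
# hypothetical 2-D fire/no-fire feature vectors. Real detection would use
# richer features extracted from the images, and typically a kernel SVM.
import random

def train_linear_svm(data, labels, lam=0.1, epochs=500, seed=0):
    """Return weights w and bias b for sign(w.x + b) classification."""
    rng = random.Random(seed)
    dim = len(data[0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = labels[i] * (sum(wj * xj for wj, xj in zip(w, data[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1:  # hinge-loss sub-gradient step
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, data[i])]
                b += eta * labels[i]
    return w, b

# fire (+1) vs no-fire (-1) toy feature vectors
X = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]
print(preds)
```

On this well-separated toy set the learned separator recovers the labels; the high-dimensional difficulties the article studies arise precisely when the feature space is much larger than this.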
The joint European Space Agency and Chinese Academy of Sciences Solar wind Magnetosphere Ionosphere Link Explorer (SMILE) mission will explore global dynamics of the magnetosphere under varying solar wind and interplanetary magnetic field conditions, and simultaneously monitor the auroral response of the Northern Hemisphere ionosphere. Combining these large-scale responses with medium- and fine-scale measurements at a variety of cadences by additional ground-based and space-based instruments will enable a much greater scientific impact beyond the original goals of the SMILE mission. Here, we describe current community efforts to prepare for SMILE, and the benefits and context that various experiments which have explicitly expressed support for SMILE can offer. A dedicated group of international scientists representing many different experiment types and geographical locations, the Ground-based and Additional Science Working Group, is facilitating these efforts. Preparations include constructing an online SMILE Data Fusion Facility, the discussion of particular or special modes for experiments such as coherent and incoherent scatter radar, and the consideration of particular observing strategies and spacecraft conjunctions. We anticipate growing interest and community engagement with the SMILE mission, and we welcome novel ideas and insights from the solar-terrestrial community.
In trying to explain why Hong Kong, China ranks highest in life expectancy in the world, we review what various experts have hypothesized, and how data science methods may be used to provide more evidence-based conclusions. As more data become available, we find some data analysis studies were too simplistic, while others were too overwhelming, in answering this challenging question. We find the approach that analyzes life-expectancy-related data (mortality causes and rates for different cohorts) inspiring, and use this approach to study a carefully selected set of targets for comparison. In discussing the factors that matter, we argue that it is more reasonable to try to identify a set of factors that together explain the phenomenon.
Developing precise and fast methods for short circuit detection is crucial for preventing or mitigating the risk of safety issues in lithium-ion batteries (LIBs). In this paper, we developed a Convolutional Neural Network (CNN) based model that can quickly and precisely predict the short circuit resistance of LIB cells under various working conditions. Cycling tests of cells with an external short circuit (ESC) were performed to build the database and generate the training/testing samples. The samples are sequences of voltage, current, charging capacity, charging energy, total charging capacity, and total charging energy, with a length of 120 s at a frequency of 1 Hz, paired with their corresponding short circuit resistances. A large database with ~6×10^5 samples was generated, covering various short circuit resistances (47–470 Ω), current loading modes (constant current-constant voltage (CC-CV) and drive cycle), and electrochemical states (cycle numbers from 1 to 300). Results show that the average relative absolute error over five random sample splits is 6.75% ± 2.8%. Further parametric analysis indicates that the estimation accuracy benefits from appropriate model setup: the optimized input sequence length (~120 s), feature selection (at least one total capacity-related variable), and rational model design using multiple layers with different kernel sizes. This work highlights the capabilities of machine learning algorithms and data-driven methodologies in real-time safety risk prediction for batteries.
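The sample construction described above — 120 s windows of six 1 Hz signals, each paired with a resistance label — can be sketched as follows; the signal names and placeholder values are assumptions, not the article's data.

```python
# Illustrative windowing of synchronized 1 Hz cell measurements into
# fixed-length (window, label) training samples, as the abstract describes.

def make_samples(signals, resistance, window=120, stride=120):
    """Slice synchronized 1 Hz signal lists into (window, label) samples.

    `signals` is a dict of equal-length lists (voltage, current, ...).
    Each sample is a dict of `window`-long slices plus the resistance label.
    """
    length = min(len(v) for v in signals.values())
    samples = []
    for start in range(0, length - window + 1, stride):
        sample = {name: v[start:start + window] for name, v in signals.items()}
        samples.append((sample, resistance))
    return samples

# 600 s of fake measurements -> five non-overlapping 120 s samples
fake = {name: [float(i) for i in range(600)]
        for name in ("voltage", "current", "charge_capacity",
                     "charge_energy", "total_capacity", "total_energy")}
samples = make_samples(fake, resistance=100.0)
print(len(samples), len(samples[0][0]["voltage"]))  # 5 120
```

A smaller `stride` would produce overlapping windows and more samples from the same test, a common way such databases reach the ~6×10^5 scale.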
With the rapid evolution of Internet technology, fog computing has taken a major role in managing large amounts of data. The major concerns in this domain are security and privacy. Therefore, attaining a reliable level of confidentiality in the fog computing environment is a pivotal task. Among the different types of data stored in the fog, 3D point and mesh fog data have become increasingly popular in recent days due to the growth of 3D modelling and 3D printing technologies. Hence, in this research, we propose a novel scheme for preserving the privacy of 3D point and mesh fog data. Chaotic Cat map-based data encryption is a recently trending research area due to its unique properties like pseudo-randomness, deterministic nature, sensitivity to initial conditions, ergodicity, etc. To boost encryption efficiency significantly, in this work, we propose a novel Chaotic Cat map. The sequence generated by this map is used to transform the coordinates of the fog data. The improved range of the proposed map is depicted using bifurcation analysis. The quality of the proposed Chaotic Cat map is also analyzed using metrics like the Lyapunov exponent and approximate entropy. We also demonstrate the performance of the proposed encryption framework against attacks like brute-force and statistical attacks. The experimental results clearly show that the proposed framework produces the best results compared to previous works in the literature.
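For orientation, the classical Arnold cat map — the base map that "novel Chaotic Cat map" variants generalize — scrambles coordinates on a discrete torus; the sketch below shows that idea on 2-D grid points (the article's proposed map itself is not reproduced here, and the grid size and points are arbitrary).

```python
# Classical Arnold cat map on an n x n discrete torus: iterating it permutes
# (scrambles) integer coordinates, the basic idea behind cat-map-based
# coordinate encryption. The map is invertible, so scrambling is reversible.

def cat_map(x, y, n):
    """One iteration of the cat map on an n x n discrete torus."""
    return (x + y) % n, (x + 2 * y) % n

def scramble(points, n, rounds):
    """Apply `rounds` cat-map iterations to each (x, y) grid point."""
    out = []
    for x, y in points:
        for _ in range(rounds):
            x, y = cat_map(x, y, n)
        out.append((x, y))
    return out

pts = [(1, 2), (3, 4), (10, 20)]
enc = scramble(pts, 101, 5)
print(enc)
```

Decryption applies the inverse map, (x, y) → (2x − y mod n, y − x mod n), the same number of rounds; in practice the round count and map parameters act as part of the key.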
Data assimilation (DA) and uncertainty quantification (UQ) are extensively used in analysing and reducing error propagation in high-dimensional spatial-temporal dynamics. Typical applications span from computational fluid dynamics (CFD) to geoscience and climate systems. Recently, much effort has been devoted to combining DA, UQ and machine learning (ML) techniques. These research efforts seek to address some critical challenges in high-dimensional dynamical systems, including but not limited to dynamical system identification, reduced-order surrogate modelling, error covariance specification and model error correction. A large number of developed techniques and methodologies exhibit broad applicability across numerous domains, resulting in the necessity for a comprehensive guide. This paper provides the first overview of state-of-the-art research in this interdisciplinary field, covering a wide range of applications. This review is aimed at ML scientists who attempt to apply DA and UQ techniques to improve the accuracy and interpretability of their models, but also at DA and UQ experts who intend to integrate cutting-edge ML approaches into their systems. Therefore, this article has a special focus on how ML methods can overcome the existing limits of DA and UQ, and vice versa. Some exciting perspectives of this rapidly developing research field are also discussed. Index Terms—Data assimilation (DA), deep learning, machine learning (ML), reduced-order modelling, uncertainty quantification (UQ).
Health data and cutting-edge technologies empower medicine and improve healthcare. This has become even more true during the COVID-19 pandemic. Through coronavirus data sharing and worldwide collaboration, the speed of vaccine development for COVID-19 has been unprecedented. Digital and data technologies were quickly adopted during the pandemic, showing how those technologies can be harnessed to enhance public health and healthcare. A wide range of digital data sources are being utilized and visually presented to enhance the epidemiological surveillance of COVID-19. Digital contact tracing mobile apps have been adopted by many countries to control community transmission. Deep learning has been utilized to achieve various solutions to COVID-19 disruption, including outbreak prediction and virus spread tracking.
Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-difference (DID) is increasingly used in the science of science and bibliometrics studies. The DID method outputs unbiased estimates on the condition that several hypotheses hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science, and of potential ways to improve the accuracy of the DID method. Design/methodology/approach: First, we reviewed the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to strengthen the necessary assumptions before conducting DID regression and improve the accuracy of estimation, we introduced some matching techniques serving as a pre-selection step for DID design, by matching control individuals who are equivalent to the treated ones on observational variables before the intervention. Lastly, we performed a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors. Findings: We introduced the procedures to conduct a DID estimation and demonstrated the effectiveness of using matching methods to improve the results. As a case study, we found that there are no significant increases in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations: This study ignored the rigorous mathematical deduction parts of DID, focusing instead on the practical parts. Practical implications: This work gives experimental practice and potential guidelines for using the DID method in science of science and bibliometrics studies. Originality/value: This study gains insights into the usage of econometric tools in the science of science.
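The estimator underlying the DID designs discussed above reduces, in the basic 2x2 case, to a difference of pre/post mean changes between treated and control groups; the citation numbers below are made up for the example.

```python
# Basic 2x2 difference-in-differences estimator:
# effect = (treated post-pre change) - (control post-pre change).

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Return the DID estimate from four lists of outcome observations."""
    def mean(xs):
        return sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Hypothetical yearly citation counts before/after a prize year.
effect = did_estimate(
    treat_pre=[10, 12, 11], treat_post=[20, 22, 21],
    ctrl_pre=[9, 11, 10], ctrl_post=[15, 17, 16],
)
print(effect)  # (21 - 11) - (16 - 10) = 4.0
```

The estimate is unbiased only under the common trend assumption the abstract emphasizes: absent treatment, both groups would have changed by the same amount.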
Because stress has such a powerful impact on human health, we must be able to identify it automatically in our everyday lives. Human activity recognition (HAR) systems use data from several kinds of sensors to recognize and evaluate human actions automatically. Using the multimodal dataset DEAP (Database for Emotion Analysis using Physiological Signals), this paper presents a deep learning (DL) technique for effectively detecting human stress. Combining vision-based and sensor-based approaches to recognizing human stress will help us increase the efficiency of current stress recognition systems and predict probable actions before outcomes become fatal. Based on visual and EEG (electroencephalogram) data, this research aims to enhance performance and extract the dominant characteristics of stress detection. For the stress identification test, we utilized the DEAP dataset, which includes video and EEG data. We also demonstrate that combining video and EEG characteristics may increase overall performance, with the suggested stochastic features providing the most accurate results. In the first step, a CNN (Convolutional Neural Network) extracts feature vectors from video frames and EEG data. Feature-level (FL) fusion then combines the features extracted from the video and EEG data. We use XGBoost as our classifier model to predict stress. The stress recognition accuracy of the proposed method is compared to existing methods: Decision Tree (DT), Random Forest (RF), AdaBoost, Linear Discriminant Analysis (LDA), and K-Nearest Neighbour (KNN). Compared to these state-of-the-art approaches, we found that the suggested DL methodology combining multimodal and heterogeneous inputs may improve stress identification.
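The feature-level fusion step described above can be sketched very simply — per-sample video and EEG feature vectors are concatenated into one vector before classification; the CNN extractor and XGBoost classifier are omitted, and the vectors below are made up.

```python
# Feature-level (FL) fusion sketch: concatenate per-sample video and EEG
# feature vectors into single fused vectors for a downstream classifier.

def feature_level_fusion(video_features, eeg_features):
    """Concatenate per-sample video and EEG feature vectors."""
    assert len(video_features) == len(eeg_features), "one vector pair per sample"
    return [v + e for v, e in zip(video_features, eeg_features)]

video = [[0.1, 0.4, 0.3], [0.9, 0.2, 0.5]]   # 2 samples x 3 video features
eeg = [[0.7, 0.6], [0.1, 0.8]]               # 2 samples x 2 EEG features
fused = feature_level_fusion(video, eeg)
print(fused)  # [[0.1, 0.4, 0.3, 0.7, 0.6], [0.9, 0.2, 0.5, 0.1, 0.8]]
```

The alternative, decision-level fusion, would instead combine the outputs of separate per-modality classifiers; FL fusion lets one classifier learn cross-modal interactions.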
Improving population health by creating more equitable health systems is a major focus of health policy and planning today. However, before we can achieve equity in health, we must first begin by leveraging all we have learned, and are continuing to discover, about the many social, structural, and environmental determinants of health. We must fully consider the conditions in which people are born, grow, learn, work, play, and age. The study of social determinants of health has made tremendous strides in recent decades. At the same time, we have seen huge advances in how health data are collected, analyzed, and used to inform action in the health sector. It is time to merge these two fields, to harness the best from both and to improve decision-making to accelerate evidence-based action toward greater health equity.
Clustering high-dimensional data is challenging, as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time-fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
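The time-fading function mentioned above is commonly an exponential decay of the form f(dt) = 2^(−λ·dt), so older points contribute less weight to a microcluster; the decay rate and ages below are arbitrary example values.

```python
# Exponential time-fading weight used by many stream clustering algorithms:
# a point that arrived `age` time units ago contributes 2^(-decay * age).

def fading_weight(age, decay=0.1):
    """Weight of a point that arrived `age` time units ago."""
    return 2 ** (-decay * age)

# A microcluster's total weight is the sum of its points' faded weights.
ages = [0, 5, 10, 50]
weights = [fading_weight(a) for a in ages]
print([round(w, 3) for w in weights])  # [1.0, 0.707, 0.5, 0.031]
```

With decay = 0.1 the half-life is 10 time units; larger decay values make the clustering adapt faster to concept drift at the cost of forgetting stable structure sooner.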
Virtual Machines are the core of cloud computing and are utilized to obtain its benefits. Other essential features include portability, recovery after failure, and, most importantly, providing the core mechanism for load balancing. Several studies have reported on enhancing load-balancing systems employing stochastic or biogenetic optimization methods. This paper examines the underlying issues with load balancing and the limitations of present load-balancing genetic optimization approaches, which are criticized for using higher-order probability distributions and more complicated solution search spaces, and for adding factors to improve decision-making skills. Thus, this paper explores the possibility of summarizing load characteristics. Second, this study offers an improved prediction technique for pheromone level prediction over other typical genetic optimization methods during load balancing. It also uses web-based third-party cloud service providers to test and validate the principles presented in this study. The approach also reduces VM migrations, time complexity, and service level agreement violations compared to other parallel standard approaches.
This study aims to apply a ResNet-18 convolutional neural network (CNN) and XGBoost to preoperative computed tomography (CT) images and clinical data for distinguishing Xp11.2 translocation renal cell carcinoma (Xp11.2 tRCC) from common subtypes of renal cell carcinoma (RCC), in order to provide patients with individualized treatment plans. Data from 45 patients with Xp11.2 tRCC from January 2007 to December 2021 are collected. Clear cell RCC (ccRCC), papillary RCC (pRCC), or chromophobe RCC (chRCC) can be detected from each patient. CT images are acquired in the following three phases: unenhanced, corticomedullary, and nephrographic. A unified framework is proposed for the classification of renal masses. In this framework, the ResNet-18 CNN is employed to classify renal cancers from CT images, while XGBoost is adopted for the clinical data. Experiments demonstrate that, if applying ResNet-18 CNN or XGBoost singly, the latter outperforms the former, while the framework integrating both technologies performs similarly to or better than urologists. In particular, the possibility of misclassifying Xp11.2 tRCC, pRCC, and chRCC as ccRCC by the proposed framework is much lower than for urologists.
The amount of 3D data stored and transmitted in the Internet of Medical Things (IoMT) is increasing, making the protection of these medical data increasingly important. However, there is relatively little research on 3D data watermarking. Moreover, due to the particularity of medical data, strict data quality must be maintained while protecting data security. To solve this problem, in the field of medical volume data, we propose a robust watermarking algorithm based on the Polar Cosine Transform and the 3D Discrete Cosine Transform (PCT and 3D-DCT). Each slice of the volume data is transformed by PCT to obtain a feature row vector, and the reshaped three-dimensional feature matrix is then transformed by 3D-DCT. Based on the contour information of the volume data and the detail information of the inner slices, the visual feature vector is obtained by applying a perceptual hash. In addition, the watermark is encrypted by a multi-sensitive initial value Sine and Piecewise linear chaotic Mapping (SPM) system and embedded as a zero watermark, with the key stored with a third party. Under the same experimental conditions, when the volume data is rotated by 80 degrees, cut by 25% along the Z axis, or JPEG-compressed at 1% quality, the Normalized Correlation coefficient (NC) of the extracted watermark is 0.80, 0.89, and 1.00 respectively, significantly higher than the comparison algorithms.
With the advancements in the era of artificial intelligence, blockchain, cloud computing, and big data, there is a need for secure, decentralized medical record storage and retrieval systems. While cloud storage solves storage issues, it is challenging to realize secure sharing of records over the network. The Medi-block record in the healthcare system has brought a new digitalization method for patients' medical records. This technology provides a symmetrical process between the hospital and doctors when patients urgently need to go to a different or nearby hospital. It enables electronic medical records to be available with the correct authentication and restricts access to medical data retrieval. The Medi-block record is a consumer-centered healthcare data system that brings reliable and transparent datasets for the medical record. This study presents an extensive review of proposed solutions aiming to protect the privacy and integrity of medical data by securing data sharing for Medi-block records. It also proposes a comprehensive investigation of recent advances in different methods of securing data sharing, such as using Blockchain technology, Access Control, Privacy-Preserving, Proxy Re-Encryption, and Service-On-Chain approaches. Finally, we highlight the open issues and identify the challenges regarding secure data sharing for Medi-block records in healthcare systems.
Recently, a massive quantity of data is being produced from a distinct number of sources, and the amount of data created daily on the Internet has crossed two exabytes. Clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns that exist in it. Density-based clustering techniques have gained significant attention owing to the fact that they help to effectively recognize complex patterns in spatial datasets. Big data clustering is not a trivial process, owing to the increasing quantity of data, which can be addressed by the use of the MapReduce tool. With this motivation, this paper presents an efficient MapReduce based hybrid density-based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the MapReduce tool for handling big data. In addition, the MR-HDBCC technique involves three distinct processes, namely pre-processing, clustering, and classification. The proposed model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which is capable of detecting random shapes and diverse clusters with noisy data. For improving the performance of the DBSCAN technique, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed for exploring the search space and determining the optimal parameters for density-based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The experimental validation of the proposed MR-HDBCC technique takes place using benchmark datasets, and the simulation outcomes demonstrate the promising performance of the proposed model in terms of different measures.
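For reference, the core DBSCAN procedure that MR-HDBCC builds on can be sketched in a few lines (the MapReduce partitioning, CSO parameter search, and BGRNN classifier are not reproduced); the eps/min_pts values and toy points are arbitrary.

```python
# Minimal DBSCAN: points with >= min_pts neighbours within eps are core
# points; clusters grow from core points, and unreachable points are noise.

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors(j))
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

The eps and min_pts values are exactly the parameters the abstract's CSO search optimizes; poor choices merge clusters or flag everything as noise.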
Prediction of machine failure is challenging, as the dataset is often imbalanced with a low failure rate. The common approach to handle classification involving imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or the Synthetic Minority Oversampling Technique (SMOTE). This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the Oil and Gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) 'non-failure' and 528 (3%) 'failure' data. The three independent variables used to predict machine failure were a pressure indicator, a flow indicator, and a level indicator. The accuracy of the classifiers is very high and close to 100%, but the sensitivity of all classifiers using the original dataset was close to zero. The performance of the three classifiers was then evaluated for data with different imbalance rates (10% to 50%) generated from the original data using SMOTE, SMOTE-Support Vector Machine (SMOTE-SVM) and SMOTE-Edited Nearest Neighbour (SMOTE-ENN). The classifiers were evaluated based on the improvement in sensitivity and F-measure. Results showed that the sensitivity of all classifiers increases as the imbalance rate increases. SVM with a radial basis function (RBF) kernel has the highest sensitivity when the data is balanced (50:50) using SMOTE (test sensitivity = 0.5686, test F-measure = 0.6927), compared to Naïve Bayes (0.4033, 0.6218) and Logistic Regression (0.4194, 0.621). Overall, the Gaussian Naïve Bayes model consistently improves sensitivity and F-measure as the imbalance ratio increases, but its sensitivity remains below 50%. The classifiers performed better when data was balanced using SMOTE-SVM compared to SMOTE and SMOTE-ENN.
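The core SMOTE idea used above — synthesizing minority samples by interpolating between a minority point and one of its minority nearest neighbours — can be sketched as follows; this is a simplified illustration with toy 2-D data and a fixed seed, not the full algorithm or its SVM/ENN variants.

```python
# Simplified SMOTE: each synthetic sample lies on the line segment between a
# random minority point and one of its k nearest minority neighbours.
import random

def smote(minority, n_synthetic, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        nn = sorted((p for p in minority if p != x),
                    key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        neighbor = rng.choice(nn)
        gap = rng.random()  # random position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=4)
print(len(new_points))  # 4 synthetic minority samples
```

Because each synthetic point is a convex combination of two minority points, SMOTE densifies the minority region rather than duplicating records, which is why it raises sensitivity where plain oversampling often does not.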
Big data is a concept that deals with large or complex data sets by using data analysis tools (e.g., data mining, machine learning) to systematically analyze information extracted from several sources. Big data has attracted wide attention from academia, for example in supporting patients and health professionals by improving the accuracy of decision-making, diagnosis and disease prediction. This research aimed to perform a Bibliometric Performance and Network Analysis (BPNA), supported by a Scoping Review (SR), to depict the strategic themes, thematic evolution structure, and main challenges and opportunities related to the concept of big data applied in the healthcare sector. With this goal in mind, 4,857 documents from the Web of Science covering the period between 2009 and June 2020 were analyzed with the support of the SciMAT software. The bibliometric performance analysis showed the number of publications and citations over time, scientific productivity, and the geographic distribution of publications and research fields. The strategic diagram yielded 20 clusters and their relative importance in terms of centrality and density. The thematic evolution structure presented the most important themes and how they changed over time. Lastly, we present the main challenges and future opportunities of big data in healthcare.
Funding: Supported by the NSFC (12201557) and the Foundation of Zhejiang Provincial Education Department, China (Y202249921).
Abstract: This work presents an advanced and detailed analysis of the mechanisms of hepatitis B virus (HBV) propagation in an environment characterized by variability and stochasticity. Based on some biological features of the virus and the stated assumptions, the corresponding deterministic model is formulated, which takes into consideration the effect of vaccination. This deterministic model is extended to a stochastic framework by considering a new form of disturbance which makes it possible to simulate strong and significant fluctuations. The long-term behaviors of the virus are predicted by using stochastic differential equations with second-order multiplicative α-stable jumps. By developing the assumptions and employing novel theoretical tools, the threshold parameter responsible for ergodicity (persistence) and extinction is provided. The theoretical results of the current study are validated by numerical simulations, and parameter estimation is also performed. Moreover, we obtain the following new and interesting findings: (a) in each class, the average time depends on the value of α; (b) the second-order noise has an inverse effect on the spread of the virus; (c) the shapes of the population densities at the stationary level change quickly at certain values of α. These conclusions can provide a solid research base for further investigation in the field of biological and ecological modeling.
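The long-term behavior described above is typically explored numerically with an Euler-Maruyama scheme. The following is a minimal sketch, not the paper's HBV model: a hypothetical logistic SDE with Gaussian diffusion and compound-Poisson jumps standing in for the α-stable jump term, with all parameter names and values invented for illustration.

```python
import numpy as np

def simulate_sde(x0=0.5, r=1.0, K=1.0, sigma=0.2,
                 jump_rate=0.5, jump_scale=0.1,
                 T=10.0, n=1000, seed=0):
    """Euler-Maruyama for a logistic SDE with multiplicative noise and
    compound-Poisson jumps (a simplified stand-in for alpha-stable jumps)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = r * x[i] * (1 - x[i] / K)
        diffusion = sigma * x[i] * rng.normal(0.0, np.sqrt(dt))
        n_jumps = rng.poisson(jump_rate * dt)          # jumps in this step
        jumps = x[i] * rng.normal(0.0, jump_scale, n_jumps).sum()
        # keep the population density non-negative
        x[i + 1] = max(x[i] + drift * dt + diffusion + jumps, 0.0)
    return x

path = simulate_sde()
```

In practice one would replace the Gaussian jump sizes with draws from an α-stable law (e.g., via the Chambers-Mallows-Stuck method) to reproduce the heavy-tailed fluctuations the paper studies.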
Abstract: Amid the landscape of Cloud Computing (CC), the Cloud Datacenter (DC) stands as a conglomerate of physical servers whose performance can be hindered by bottlenecks within the realm of proliferating CC services. A linchpin in CC's performance, the Cloud Service Broker (CSB), orchestrates DC selection. Failure to route user requests to suitable DCs turns the CSB itself into a bottleneck, endangering service quality. To tackle this, deploying an efficient CSB policy becomes imperative, optimizing DC selection to meet stringent Quality-of-Service (QoS) demands. Among the numerous CSB policies, implementation grapples with challenges such as cost and availability. This article undertakes a holistic review of diverse CSB policies and concurrently surveys the predicaments confronting current policies. The foremost objective is to pinpoint research gaps and remedies to invigorate future policy development. Additionally, it clarifies the various DC selection methodologies employed in CC, enriching practitioners and researchers alike. Employing synthetic analysis, the article systematically assesses and compares the myriad DC selection techniques, equipping decision-makers with a pragmatic framework to discern the technique apt for their needs. In summation, this discourse underscores the paramount importance of adept CSB policies in DC selection and their role in optimizing CC performance. By emphasizing the significance of these policies and their modeling implications, the article contributes to both the general modeling discourse and its practical applications in the CC domain.
Abstract: This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.
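The fire/non-fire classification pipeline described above can be sketched in a few lines with scikit-learn. This is a hedged illustration, not the authors' system: the "image features" here are synthetic mean-RGB values with invented cluster centers, standing in for features extracted from real image patches.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in features: mean (R, G, B) per image patch.
# Hypothetical assumption: "fire" patches skew red/orange, "non-fire" green.
fire = rng.normal([200, 90, 40], 20, size=(200, 3))
no_fire = rng.normal([60, 140, 70], 20, size=(200, 3))
X = np.vstack([fire, no_fire]) / 255.0          # scale to [0, 1]
y = np.array([1] * 200 + [0] * 200)             # 1 = fire, 0 = no fire

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

A real system would replace the synthetic colour features with descriptors extracted during the preprocessing and feature-extraction steps the article discusses.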
Funding: Supported by Royal Society grant DHFR1211068; funded by UKSA; STFC; STFC grant ST/M001083/1; STFC grant ST/W00089X/1; NERC grant NE/W003309/1 (E3D); NERC grant NE/V000748/1; NERC grants NE/V015133/1 and NE/R016038/1 (BAS magnetometers) and grants NE/R01700X/1 and NE/R015848/1 (EISCAT); NERC grant NE/T000937/1; NSFC grants 42174208 and 41821003; the Research Council of Norway grant 223252; PRODEX arrangement 4000123238 from the European Space Agency; support of the AUTUMN East-West magnetometer network by the Canadian Space Agency; NASA's Heliophysics U.S. Participating Investigator Program; grant NSF AGS 2027210; grant Dnr: 2020-00106 from the Swedish National Space Agency; and the German Research Foundation (DFG) under number KR 4375/2-1 within SPP "Dynamic Earth".
Abstract: The joint European Space Agency and Chinese Academy of Sciences Solar wind Magnetosphere Ionosphere Link Explorer (SMILE) mission will explore global dynamics of the magnetosphere under varying solar wind and interplanetary magnetic field conditions, and simultaneously monitor the auroral response of the Northern Hemisphere ionosphere. Combining these large-scale responses with medium- and fine-scale measurements at a variety of cadences by additional ground-based and space-based instruments will enable a much greater scientific impact beyond the original goals of the SMILE mission. Here, we describe current community efforts to prepare for SMILE, and the benefits and context various experiments that have explicitly expressed support for SMILE can offer. A dedicated group of international scientists representing many different experiment types and geographical locations, the Ground-based and Additional Science Working Group, is facilitating these efforts. Preparations include constructing an online SMILE Data Fusion Facility, the discussion of particular or special modes for experiments such as coherent and incoherent scatter radar, and the consideration of particular observing strategies and spacecraft conjunctions. We anticipate growing interest and community engagement with the SMILE mission, and we welcome novel ideas and insights from the solar-terrestrial community.
Funding: Supported by funding (No. UGC/IDS(R)11/21) from the Hong Kong SAR Government.
Abstract: In trying to explain why Hong Kong, China ranks highest in life expectancy in the world, we review what various experts are hypothesizing and how data science methods may be used to provide more evidence-based conclusions. As more data become available, we find some data analysis studies were too simplistic, while others were too overwhelming, in answering this challenging question. We find the approach that analyzes life-expectancy-related data (mortality causes and rates for different cohorts) inspiring, and use this approach to study a carefully selected set of targets for comparison. In discussing the factors that matter, we argue that it is more reasonable to try to identify a set of factors that together explain the phenomenon.
Funding: Supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under the Advanced Manufacturing Office, award number DE-EE0009111.
Abstract: Developing precise and fast methods for short-circuit detection is crucial for preventing or mitigating safety issues of lithium-ion batteries (LIBs). In this paper, we developed a Convolutional Neural Network (CNN) based model that can quickly and precisely predict the short-circuit resistance of LIB cells under various working conditions. Cycling tests of cells with an external short circuit (ESC) were conducted to build the database and generate the training/testing samples. The samples are sequences of voltage, current, charging capacity, charging energy, total charging capacity, and total charging energy, with a length of 120 s and a frequency of 1 Hz, together with their corresponding short-circuit resistances. A large database of ~6×10^5 samples was generated, covering various short-circuit resistances (47-470 Ω), current loading modes (constant current-constant voltage (CC-CV) and drive cycle), and electrochemical states (cycle numbers from 1 to 300). Results show that the average relative absolute error over five random sample splits is 6.75% ± 2.8%. Further parametric analysis indicates that the estimation accuracy benefits from appropriate model setups: an optimized input sequence length (~120 s), feature selection (at least one total capacity-related variable), and a rational model design using multiple layers with different kernel sizes. This work highlights the capabilities of machine learning algorithms and data-driven methodologies in real-time safety risk prediction for batteries.
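The reported headline number, average relative absolute error, is a simple metric worth making concrete. The sketch below shows one plausible reading of that metric; the resistance values fed in are hypothetical examples within the paper's 47-470 Ω range, not data from the study.

```python
import numpy as np

def mean_relative_absolute_error(r_true, r_pred):
    """Average relative absolute error |r_pred - r_true| / r_true,
    the form of the metric the study reports (6.75% +/- 2.8%)."""
    r_true = np.asarray(r_true, dtype=float)
    r_pred = np.asarray(r_pred, dtype=float)
    return float(np.mean(np.abs(r_pred - r_true) / r_true))

# Hypothetical true vs. predicted short-circuit resistances (ohms)
err = mean_relative_absolute_error([47, 100, 470], [50, 95, 480])
```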
Funding: This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R151), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: With the rapid evolution of Internet technology, fog computing has taken a major role in managing large amounts of data. The major concerns in this domain are security and privacy. Therefore, attaining a reliable level of confidentiality in the fog computing environment is a pivotal task. Among the different types of data stored in the fog, 3D point and mesh fog data have become increasingly popular in recent days due to the growth of 3D modelling and 3D printing technologies. Hence, in this research, we propose a novel scheme for preserving the privacy of 3D point and mesh fog data. Chaotic Cat map-based data encryption is a recently trending research area due to its unique properties such as pseudo-randomness, deterministic nature, sensitivity to initial conditions, and ergodicity. To boost encryption efficiency significantly, in this work we propose a novel Chaotic Cat map. The sequence generated by this map is used to transform the coordinates of the fog data. The improved range of the proposed map is depicted using bifurcation analysis. The quality of the proposed Chaotic Cat map is also analyzed using metrics such as the Lyapunov exponent and approximate entropy. We also demonstrate the performance of the proposed encryption framework against attacks such as brute-force and statistical attacks. The experimental results clearly show that the proposed framework produces the best results compared to previous works in the literature.
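For context on what "transforming coordinates with a Cat map" means, here is the classical (generalized) Arnold cat map on an integer grid, not the paper's novel variant. Because the map's matrix has determinant 1, each iteration is a bijection of the grid, so the scrambling is exactly invertible.

```python
import numpy as np

def arnold_cat(points, N, a=1, b=1):
    """One iteration of the generalized Arnold cat map on integer grid
    coordinates: [x', y'] = [[1, a], [b, a*b + 1]] [x, y] mod N.
    The matrix has determinant 1, so the map permutes the N x N grid."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack(((x + a * y) % N,
                            (b * x + (a * b + 1) * y) % N))

N = 8
grid = np.array([(x, y) for x in range(N) for y in range(N)])
scrambled = arnold_cat(grid, N)
```

The paper's proposed map modifies this construction to widen the chaotic range (as shown by its bifurcation analysis); the classical form above is only the baseline.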
Funding: The authors acknowledge the support of the Leverhulme Centre for Wildfires, Environment and Society through the Leverhulme Trust (RC-2018-023). Sibo Cheng, César Quilodran-Casas, and Rossella Arcucci acknowledge the support of the PREMIERE project (EP/T000414/1); the EPSRC grant PURIFY (EP/V000756/1); the Fundamental Research Funds for the Central Universities; the SASIP project (353) funded by Schmidt Futures, a philanthropic initiative that seeks to improve societal outcomes through the development of emerging science and technologies; the DFG Heisenberg Programme Award (JA 1077/4-1); the National Natural Science Foundation of China (61976120); and the Natural Science Key Foundation of Jiangsu Education Department (21KJA510004).
Abstract: Data assimilation (DA) and uncertainty quantification (UQ) are extensively used in analysing and reducing error propagation in high-dimensional spatial-temporal dynamics. Typical applications span from computational fluid dynamics (CFD) to geoscience and climate systems. Recently, much effort has been devoted to combining DA, UQ and machine learning (ML) techniques. These research efforts seek to address some critical challenges in high-dimensional dynamical systems, including but not limited to dynamical system identification, reduced-order surrogate modelling, error covariance specification and model error correction. A large number of the developed techniques and methodologies exhibit broad applicability across numerous domains, creating the need for a comprehensive guide. This paper provides the first overview of state-of-the-art research in this interdisciplinary field, covering a wide range of applications. The review is aimed at ML scientists who seek to apply DA and UQ techniques to improve the accuracy and interpretability of their models, but also at DA and UQ experts who intend to integrate cutting-edge ML approaches into their systems. Therefore, this article has a special focus on how ML methods can overcome the existing limits of DA and UQ, and vice versa. Some exciting perspectives of this rapidly developing research field are also discussed. Index Terms: Data assimilation (DA), deep learning, machine learning (ML), reduced-order modelling, uncertainty quantification (UQ).
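At its core, a DA analysis step blends a model forecast with an observation, weighting each by its error variance. The scalar Kalman update below is a minimal textbook illustration of that idea, far simpler than the high-dimensional ensemble and variational methods the review covers; all numbers are made up.

```python
def kalman_analysis(x_f, var_f, y_obs, var_o):
    """Scalar data-assimilation analysis step: blend a model forecast x_f
    (error variance var_f) with an observation y_obs (error variance var_o)."""
    K = var_f / (var_f + var_o)      # Kalman gain: trust in the observation
    x_a = x_f + K * (y_obs - x_f)    # analysis state, pulled toward y_obs
    var_a = (1 - K) * var_f          # analysis variance, always reduced
    return x_a, var_a

# Hypothetical forecast of 10.0 (variance 4) corrected by an observation
x_a, var_a = kalman_analysis(x_f=10.0, var_f=4.0, y_obs=12.0, var_o=1.0)
```

Note how the analysis variance is smaller than either input variance; this uncertainty reduction is precisely what UQ tracks through the assimilation cycle.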
Abstract: Health data and cutting-edge technologies empower medicine and improve healthcare. This has become even more true during the COVID-19 pandemic. Through coronavirus data sharing and worldwide collaboration, the speed of vaccine development for COVID-19 has been unprecedented. Digital and data technologies were quickly adopted during the pandemic, showing how those technologies can be harnessed to enhance public health and healthcare. A wide range of digital data sources are being utilized and visually presented to enhance the epidemiological surveillance of COVID-19. Digital contact-tracing mobile apps have been adopted by many countries to control community transmission. Deep learning has been utilized to achieve various solutions for COVID-19 disruption, including outbreak prediction and virus spread tracking.
Funding: This work was supported by grants from the National Natural Science Foundation of China (Nos. 62006109 and 12031005).
Abstract: Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-differences (DID) is increasingly used in science of science and bibliometrics studies. The DID method yields unbiased estimates on condition that several hypotheses hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science, and the potential ways to improve the accuracy of the DID method. Design/methodology/approach: First, we review the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to strengthen the necessary assumptions before conducting DID regression and to improve the accuracy of estimation, we introduce some matching techniques serving as a pre-selection step for DID design, matching control individuals who are equivalent to the treated ones on observed variables before the intervention. Lastly, we perform a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors. Findings: We introduce the procedures to conduct a DID estimation and demonstrate the effectiveness of using matching to improve the results. As a case study, we find that there are no significant increases in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations: This study ignores the rigorous mathematical deductions of DID and focuses on the practical parts. Practical implications: This work gives experimental practice and potential guidelines for using the DID method in science of science and bibliometrics studies. Originality/value: This study gains insights into the usage of econometric tools in the science of science.
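The canonical 2x2 DID estimator compares the before/after change in the treated group against the same change in the control group. The sketch below computes that estimate on invented citation counts, purely to fix ideas; it is not the paper's regression-based procedure, which also involves matching and panel structure.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences estimate:
    (treated post-pre change) minus (control post-pre change)."""
    def mean(xs):
        return sum(xs) / len(xs)
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))

# Hypothetical yearly citation counts before/after a prize year:
# laureates (treated) vs. their prizewinning-work coauthors (control)
effect = did_estimate(treated_pre=[10, 12], treated_post=[20, 22],
                      control_pre=[9, 11], control_post=[15, 17])
```

Here the treated group gains 10 citations and the control gains 6, so the estimated treatment effect is 4; the common-trend assumption is what licenses attributing that difference to the intervention.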
Abstract: Because stress has such a powerful impact on human health, we must be able to identify it automatically in our everyday lives. Human activity recognition (HAR) systems use data from several kinds of sensors to recognize and evaluate human actions automatically. Using the multimodal dataset DEAP (Database for Emotion Analysis using Physiological Signals), this paper presents a deep learning (DL) technique for effectively detecting human stress. Combining vision-based and sensor-based approaches for recognizing human stress can increase the efficiency of current stress recognition systems and predict probable actions before they become harmful. Based on visual and EEG (electroencephalogram) data, this research aims to enhance performance and extract the dominant characteristics of stress detection. For the stress identification test, we utilized the DEAP dataset, which includes video and EEG data. We also demonstrate that combining video and EEG characteristics may increase overall performance, with the suggested stochastic features providing the most accurate results. In the first step, a CNN (Convolutional Neural Network) extracts feature vectors from video frames and EEG data. Feature-level (FL) fusion then combines the features extracted from the video and EEG data. We use XGBoost as our classifier model to predict stress and put it into action. The stress recognition accuracy of the proposed method is compared to existing methods: Decision Tree (DT), Random Forest (RF), AdaBoost, Linear Discriminant Analysis (LDA), and K-Nearest Neighbors (KNN). When we compared our technique to existing state-of-the-art approaches, we found that the suggested DL methodology combining multimodal and heterogeneous inputs may improve stress identification.
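The feature-level fusion step described above amounts to joining the two modality vectors before classification. One plausible sketch, with the per-modality L2 normalization being my own assumption (the paper does not specify its normalization), is:

```python
import numpy as np

def feature_level_fusion(video_feats, eeg_feats):
    """Feature-level (FL) fusion: L2-normalize each modality's feature
    vector so neither dominates by scale, then concatenate them into one
    joint vector for the downstream classifier (e.g., XGBoost)."""
    def l2norm(v):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2norm(video_feats), l2norm(eeg_feats)])

# Hypothetical CNN outputs: 128-dim video features, 32-dim EEG features
fused = feature_level_fusion(np.ones(128), np.ones(32))
```

The fused 160-dimensional vector would then be fed to the boosted-tree classifier; the dimensions here are illustrative, not taken from the paper.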
Abstract: Improving population health by creating more equitable health systems is a major focus of health policy and planning today. However, before we can achieve equity in health, we must first begin by leveraging all we have learned, and are continuing to discover, about the many social, structural, and environmental determinants of health. We must fully consider the conditions in which people are born, grow, learn, work, play, and age. The study of social determinants of health has made tremendous strides in recent decades. At the same time, we have seen huge advances in how health data are collected, analyzed, and used to inform action in the health sector. It is time to merge these two fields, to harness the best from both and to improve decision-making to accelerate evidence-based action toward greater health equity.
Abstract: Clustering high-dimensional data is challenging, as increasing dimensionality inflates the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review of subspace clustering algorithms for high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focused on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters from outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time-fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
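The time-fading function mentioned in the findings is commonly the exponential form w = 2^(-λ·age) used by stream-clustering algorithms such as DenStream-style methods. A minimal sketch (the decay rate is an arbitrary illustrative choice):

```python
def fading_weight(t_now, t_point, decay=0.1):
    """Exponential time-fading weight w = 2^(-decay * age): stream
    clustering algorithms use this to down-weight stale points so the
    synopsis structure adapts to the evolving data distribution."""
    age = t_now - t_point
    return 2.0 ** (-decay * age)

w_fresh = fading_weight(t_now=100, t_point=100)   # age 0: full weight
w_stale = fading_weight(t_now=100, t_point=50)    # age 50: nearly faded
```

Microclusters whose accumulated faded weight drops below a threshold are pruned, which is how these algorithms bound memory on an unbounded stream.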
Abstract: Virtual machines are the core of cloud computing and are utilized to obtain the benefits of cloud computing. Other essential features include portability, recovery after failure, and, most importantly, forming the core mechanism for load balancing. Several studies have reported enhancing load-balancing systems employing stochastic or biogenetic optimization methods. This paper examines the underlying issues with load balancing and the limitations of present genetic load-balancing optimization approaches. They are criticized for using higher-order probability distributions and more complicated solution search spaces, and for adding factors to improve decision-making skills. Thus, this paper explores the possibility of summarizing load characteristics. Second, this study offers an improved technique for pheromone-level prediction over other typical genetic optimization methods during load balancing. It also uses web-based third-party cloud service providers to test and validate the principles provided in this study. The approach reduces VM migrations, time complexity, and service-level-agreement violations compared to other parallel standard approaches.
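For readers unfamiliar with pheromone levels in this context, the standard ant-colony update rule is: evaporate the existing trail, then add the deposits of the ants that used it. The sketch below is that textbook rule only, a stand-in for the paper's improved predictor; all values are hypothetical.

```python
def update_pheromone(tau, rho, contributions):
    """Standard ant-colony pheromone update on one route (e.g., one VM
    placement choice): evaporate the trail by rate rho, then deposit the
    contributions of the ants that selected this route."""
    return (1.0 - rho) * tau + sum(contributions)

# Hypothetical trail level 2.0, evaporation rate 0.5, two ant deposits
tau_new = update_pheromone(tau=2.0, rho=0.5, contributions=[0.3, 0.2])
```

A load balancer built on this idea routes new VMs toward placements with high pheromone; the paper's contribution is predicting these levels better than plain genetic/ACO baselines do.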
Funding: Supported by the Beijing Ronghe Medical Development Foundation.
Abstract: This study aims to apply a ResNet-18 convolutional neural network (CNN) and XGBoost to preoperative computed tomography (CT) images and clinical data to distinguish Xp11.2 translocation renal cell carcinoma (Xp11.2 tRCC) from common subtypes of renal cell carcinoma (RCC), in order to provide patients with individualized treatment plans. Data from 45 patients with Xp11.2 tRCC from January 2007 to December 2021 were collected. Clear cell RCC (ccRCC), papillary RCC (pRCC), or chromophobe RCC (chRCC) can be detected from each patient. CT images were acquired in three phases: unenhanced, corticomedullary, and nephrographic. A unified framework is proposed for the classification of renal masses. In this framework, the ResNet-18 CNN is employed to classify renal cancers from CT images, while XGBoost is adopted for the clinical data. Experiments demonstrate that, if ResNet-18 CNN or XGBoost is applied singly, the latter outperforms the former, while the framework integrating both technologies performs similarly to or better than urologists. In particular, the probability of misclassifying Xp11.2 tRCC, pRCC, and chRCC as ccRCC with the proposed framework is much lower than for urologists.
Funding: Supported in part by the Natural Science Foundation of China under Grant 62063004, the Key Research Project of Hainan Province under Grant ZDYF2021SHFZ093, the Hainan Provincial Natural Science Foundation of China under Grants 2019RC018 and 619QN246, and postdoctoral research funding from Zhejiang Province under Grant ZJ2021028.
Abstract: The amount of 3D data stored and transmitted in the Internet of Medical Things (IoMT) is increasing, making the protection of these medical data increasingly prominent. However, there has been relatively little research on 3D data watermarking. Moreover, due to the particularity of medical data, strict data quality must be maintained while protecting data security. To solve this problem, in the field of medical volume data, we propose a robust watermarking algorithm based on the Polar Cosine Transform and the 3D Discrete Cosine Transform (PCT and 3D-DCT). Each slice of the volume data is transformed by the PCT to obtain a feature row vector, and the reshaped three-dimensional feature matrix is then transformed by the 3D-DCT. Based on the contour information of the volume data and the detail information of the inner slices, the visual feature vector is obtained by applying a perceptual hash. In addition, the watermark is encrypted by a multi-sensitive initial value Sine and Piecewise linear chaotic Mapping (SPM) system and embedded as a zero watermark. The key is stored with a third party. Under the same experimental conditions, when the volume data is rotated by 80 degrees, cut by 25% along the Z axis, or JPEG-compressed at 1% quality, the Normalized Correlation Coefficient (NC) of the extracted watermark is 0.80, 0.89, and 1.00, respectively, significantly higher than the comparison algorithms.
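The NC values reported above come from the standard normalized correlation between the original and extracted watermark sequences. A minimal sketch of that metric, with toy bit sequences (not the paper's watermarks):

```python
import numpy as np

def normalized_correlation(w_orig, w_ext):
    """Normalized Correlation Coefficient (NC) between an original and an
    extracted watermark sequence: 1.0 means a perfect extraction."""
    a = np.asarray(w_orig, dtype=float)
    b = np.asarray(w_ext, dtype=float)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

# A perfectly extracted toy watermark gives NC = 1.0
nc_same = normalized_correlation([1, 0, 1, 1], [1, 0, 1, 1])
```

An attack that flips or erases bits pulls the NC below 1, which is how robustness under rotation, cropping, and JPEG compression is scored.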
Abstract: With advancements in the era of artificial intelligence, blockchain, cloud computing, and big data, there is a need for secure, decentralized medical record storage and retrieval systems. While cloud storage solves storage issues, it is challenging to realize secure sharing of records over the network. The Medi-block record in the healthcare system has brought a new digitalization method for patients' medical records. This centralized technology provides a symmetrical process between the hospital and doctors when patients urgently need to go to a different or nearby hospital. It enables electronic medical records to be available with the correct authentication and restricts access to medical data retrieval. The Medi-block record is a consumer-centered healthcare data system that brings reliable and transparent datasets for the medical record. This study presents an extensive review of proposed solutions aiming to protect the privacy and integrity of medical data by securing data sharing for Medi-block records. It also proposes a comprehensive investigation of recent advances in different methods of securing data sharing, such as using Blockchain technology, Access Control, Privacy-Preserving, Proxy Re-Encryption, and Service-On-Chain approaches. Finally, we highlight the open issues and identify the challenges regarding secure data sharing for Medi-block records in healthcare systems.
Funding: Supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI21C1831), and by the Soonchunhyang University Research Fund.
Abstract: Recently, a massive quantity of data is being produced from a distinct number of sources, and the size of the data created daily on the Internet has crossed two exabytes. At the same time, clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns that exist in it. Density-based clustering techniques have gained significant attention owing to the fact that they help to effectively recognize complex patterns in spatial datasets. Big data clustering is a non-trivial process owing to the increasing quantity of data, which can be addressed by the use of the MapReduce tool. With this motivation, this paper presents an efficient MapReduce-based hybrid density-based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the MapReduce tool for handling big data. In addition, the MR-HDBCC technique involves three distinct processes, namely pre-processing, clustering, and classification. The proposed model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which is capable of detecting random shapes and diverse clusters in noisy data. For improving the performance of the DBSCAN technique, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed to explore the search space and determine the optimal parameters for density-based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The experimental validation of the proposed MR-HDBCC technique takes place using a benchmark dataset, and the simulation outcomes demonstrate the promising performance of the proposed model in terms of different measures.
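The DBSCAN building block at the heart of MR-HDBCC can be demonstrated on a small synthetic dataset with scikit-learn (a single-machine sketch, not the paper's MapReduce deployment; the eps/min_samples values here are illustrative choices, which is exactly the parameter-tuning problem the paper hands to CSO):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus one far-away outlier
blob1 = rng.normal([0, 0], 0.1, size=(30, 2))
blob2 = rng.normal([5, 5], 0.1, size=(30, 2))
noise = np.array([[20.0, 20.0]])
X = np.vstack([blob1, blob2, noise])

# DBSCAN labels dense regions as clusters and isolated points as -1 (noise)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

DBSCAN finds the two blobs and flags the isolated point as noise without being told the number of clusters, which is the property that makes it attractive for arbitrarily shaped spatial patterns.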
Funding: Supported under research Grant (PO Number: 920138936) from the Institute of Technology PETRONAS Sdn Bhd, 32610 Bandar Seri Iskandar, Perak, Malaysia.
Abstract: Prediction of machine failure is challenging, as the dataset is often imbalanced with a low failure rate. The common approach to handle classification involving imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or the Synthetic Minority Oversampling Technique (SMOTE). This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the oil and gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) 'non-failure' and 528 (3%) 'failure' records. The three independent variables used to predict machine failure were a pressure indicator, a flow indicator, and a level indicator. The accuracy of the classifiers is very high and close to 100%, but the sensitivity of all classifiers using the original dataset was close to zero. The performance of the three classifiers was then evaluated for data with different imbalance rates (10% to 50%) generated from the original data using SMOTE, SMOTE-Support Vector Machine (SMOTE-SVM), and SMOTE-Edited Nearest Neighbour (SMOTE-ENN). The classifiers were evaluated based on the improvement in sensitivity and F-measure. Results showed that the sensitivity of all classifiers increases as the imbalance rate increases. SVM with a radial basis function (RBF) kernel has the highest sensitivity when the data is balanced (50:50) using SMOTE (Sensitivity_test = 0.5686, F_test = 0.6927), compared to Naïve Bayes (Sensitivity_test = 0.4033, F_test = 0.6218) and Logistic Regression (Sensitivity_test = 0.4194, F_test = 0.621). Overall, the Gaussian Naïve Bayes model consistently improves sensitivity and F-measure as the imbalance ratio increases, but its sensitivity stays below 50%. The classifiers performed better when the data was balanced using SMOTE-SVM compared to SMOTE and SMOTE-ENN.
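The abstract's two evaluation criteria, sensitivity and F-measure, explain why near-100% accuracy can coexist with near-zero sensitivity on 97:3 data. The sketch below computes both from confusion-matrix counts; the counts used are hypothetical, not the study's results.

```python
def sensitivity_f_measure(tp, fp, fn):
    """Sensitivity (recall on the minority 'failure' class) and F-measure.
    On 97:3 data a classifier can reach ~97% accuracy by predicting
    'non-failure' always, yet score tp = 0 and hence sensitivity = 0."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return recall, f

# Hypothetical counts on an imbalanced test split:
# 40 failures caught, 60 missed, 10 false alarms
sens, f1 = sensitivity_f_measure(tp=40, fp=10, fn=60)
```

This is why the study ranks classifiers by sensitivity and F-measure rather than accuracy once the data is resampled.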
Funding: Financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil (CAPES), Finance Code 001, and by the Spanish Ministry of Science and Innovation under grant PID2019-105381 GA-100 (iScience).
Abstract: Big data is a concept that deals with large or complex data sets by using data analysis tools (e.g., data mining, machine learning) to systematically analyze information extracted from several sources. Big data has attracted wide attention from academia, for example, in supporting patients and health professionals by improving the accuracy of decision-making, diagnosis, and disease prediction. This research aimed to perform a Bibliometric Performance and Network Analysis (BPNA) supported by a Scoping Review (SR) to depict the strategic themes, the thematic evolution structure, and the main challenges and opportunities related to the concept of big data applied in the healthcare sector. With this goal in mind, 4,857 documents from the Web of Science covering the period from 2009 to June 2020 were analyzed with the support of the SciMAT software. The bibliometric performance analysis showed the number of publications and citations over time, scientific productivity, and the geographic distribution of publications and research fields. The strategic diagram yielded 20 clusters and their relative importance in terms of centrality and density. The thematic evolution structure presented the most important themes and how they changed over time. Lastly, we present the main challenges and future opportunities of big data in healthcare.