With the rapid development and widespread application of Big Data technology, the supply chain management of agricultural products enterprises is facing unprecedented reform and challenges. This study takes the perspe...With the rapid development and widespread application of Big Data technology, the supply chain management of agricultural products enterprises is facing unprecedented reform and challenges. This study takes the perspective of Big Data technology and collects relevant information on the application of supply chain management in 100 agricultural product enterprises through a survey questionnaire. The study found that the use of Big Data can effectively improve the accuracy of demand forecasting, inventory management efficiency, optimize logistics costs, improve supplier management efficiency, enhance the overall level of supply chain management of enterprises, and propose innovative strategies for supply chain management of agricultural products enterprises based on this. Big Data technology brings a new solution for agricultural products enterprises to enhance their supply chain management level, making the supply chain smarter and more efficient.展开更多
When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to ...When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.展开更多
Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challen...Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics,limiting their overall effectiveness.This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers(SLCs)and evaluates their performance in data-driven decision-making.The evaluation uses various metrics,with a particular focus on the Harmonic Mean Score(F-1 score)on an imbalanced real-world bank target marketing dataset.The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve,while Extreme Gradient Boosting(XGBoost)outperforms other traditional classifiers in terms of F-1 score.Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost,delivering superior results across all metrics,particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique.The study concludes several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets.These factors include the importance of selecting appropriate datasets for training and testing,choosing the right classifiers,employing effective techniques for processing and handling imbalanced datasets,and identifying suitable metrics for performance evaluation.Additionally,factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.展开更多
Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interp...Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.展开更多
The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces ...The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update(BDTMCDIncreUpdate),which combines distributed computing,storage technology,and incremental update techniques to provide an efficient and effective means for clustering analysis.Firstly,the original dataset is divided into multiple subblocks,and distributed computing resources are utilized to process the sub-blocks in parallel,enhancing efficiency.Then,initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results.When new data arrives,incremental update technology is employed to update the core tensor and factor matrix,ensuring that the clustering model can adapt to changes in data.Finally,by combining the updated core tensor and factor matrix with historical computational results,refined clustering results are obtained,achieving real-time adaptation to dynamic data.Through experimental simulation on the Aminer dataset,the BDTMCDIncreUpdate method has demonstrated outstanding performance in terms of accuracy(ACC)and normalized mutual information(NMI)metrics,achieving an accuracy rate of 90%and an NMI score of 0.85,which outperforms existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios.Therefore,the BDTMCDIncreUpdate method offers an innovative solution to the field of big data analysis,integrating distributed computing,incremental updates,and tensor-based multi-clustering techniques.It not only improves the efficiency and scalability in processing large-scale high-dimensional datasets but also has been validated for its effectiveness and accuracy through experiments.This method shows great potential in real-world applications where dynamic data growth is common,and it is of significant importance for advancing the development of data analysis technology.展开更多
In recent years,deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment.However,training deep learning-based classif...In recent years,deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment.However,training deep learning-based classifiers on large signal datasets with redundant samples requires significant memory and high costs.This paper proposes a support databased core-set selection method(SD)for signal recognition,aiming to screen a representative subset that approximates the large signal dataset.Specifically,this subset can be identified by employing the labeled information during the early stages of model training,as some training samples are labeled as supporting data frequently.This support data is crucial for model training and can be found using a border sample selector.Simulation results demonstrate that the SD method minimizes the impact on model recognition performance while reducing the dataset size,and outperforms five other state-of-the-art core-set selection methods when the fraction of training sample kept is less than or equal to 0.3 on the RML2016.04C dataset or 0.5 on the RML22 dataset.The SD method is particularly helpful for signal recognition tasks with limited memory and computing resources.展开更多
Network updates have become increasingly prevalent since the broad adoption of software-defined networks(SDNs)in data centers.Modern TCP designs,including cutting-edge TCP variants DCTCP,CUBIC,and BBR,however,are not ...Network updates have become increasingly prevalent since the broad adoption of software-defined networks(SDNs)in data centers.Modern TCP designs,including cutting-edge TCP variants DCTCP,CUBIC,and BBR,however,are not resilient to network updates that provoke flow rerouting.In this paper,we first demonstrate that popular TCP implementations perform inadequately in the presence of frequent and inconsistent network updates,because inconsistent and frequent network updates result in out-of-order packets and packet drops induced via transitory congestion and lead to serious performance deterioration.We look into the causes and propose a network update-friendly TCP(NUFTCP),which is an extension of the DCTCP variant,as a solution.Simulations are used to assess the proposed NUFTCP.Our findings reveal that NUFTCP can more effectively manage the problems of out-of-order packets and packet drops triggered in network updates,and it outperforms DCTCP considerably.展开更多
Differences in the imaging subgroups of cerebral small vessel disease(CSVD)need to be further explored.First,we use propensity score matching to obtain balanced datasets.Then random forest(RF)is adopted to classify th...Differences in the imaging subgroups of cerebral small vessel disease(CSVD)need to be further explored.First,we use propensity score matching to obtain balanced datasets.Then random forest(RF)is adopted to classify the subgroups compared with support vector machine(SVM)and extreme gradient boosting(XGBoost),and to select the features.The top 10 important features are included in the stepwise logistic regression,and the odds ratio(OR)and 95%confidence interval(CI)are obtained.There are 41290 adult inpatient records diagnosed with CSVD.Accuracy and area under curve(AUC)of RF are close to 0.7,which performs best in classification compared to SVM and XGBoost.OR and 95%CI of hematocrit for white matter lesions(WMLs),lacunes,microbleeds,atrophy,and enlarged perivascular space(EPVS)are 0.9875(0.9857−0.9893),0.9728(0.9705−0.9752),0.9782(0.9740−0.9824),1.0093(1.0081−1.0106),and 0.9716(0.9597−0.9832).OR and 95%CI of red cell distribution width for WMLs,lacunes,atrophy,and EPVS are 0.9600(0.9538−0.9662),0.9630(0.9559−0.9702),1.0751(1.0686−1.0817),and 0.9304(0.8864−0.9755).OR and 95%CI of platelet distribution width for WMLs,lacunes,and microbleeds are 1.1796(1.1636−1.1958),1.1663(1.1476−1.1853),and 1.0416(1.0152−1.0687).This study proposes a new analytical framework to select important clinical markers for CSVD with machine learning based on a common data model,which has low cost,fast speed,large sample size,and continuous data sources.展开更多
Various approaches have been proposed to minimize the upper-level systematic biases in global numerical weather prediction(NWP)models by using satellite upper-air sounding channels as anchors.However,since the China M...Various approaches have been proposed to minimize the upper-level systematic biases in global numerical weather prediction(NWP)models by using satellite upper-air sounding channels as anchors.However,since the China Meteorological Administration Global Forecast System(CMA-GFS)has a model top near 0.1 hPa(60 km),the upper-level temperature bias may exceed 4 K near 1 hPa and further extend to 5 hPa.In this study,channels 12–14 of the Advanced Microwave Sounding Unit A(AMSU-A)onboard five satellites of NOAA and METOP,whose weighting function peaks range from 10 to 2 hPa are all used as anchor observations in CMA-GFS.It is shown that the new“Anchor”approach can effectively reduce the biases near the model top and their downward propagation in three-month assimilation cycles.The bias growth rate of simulated upper-level channel observations is reduced to±0.001 K d^(–1),compared to–0.03 K d^(–1)derived from the current dynamic correction scheme.The relatively stable bias significantly improves the upper-level analysis field and leads to better global medium-range forecasts up to 10 days with significant reductions in the temperature and geopotential forecast error above 10 hPa.展开更多
Big data analytics has been widely adopted by large companies to achieve measurable benefits including increased profitability,customer demand forecasting,cheaper development of products,and improved stock control.Sma...Big data analytics has been widely adopted by large companies to achieve measurable benefits including increased profitability,customer demand forecasting,cheaper development of products,and improved stock control.Small and medium sized enterprises(SMEs)are the backbone of the global economy,comprising of 90%of businesses worldwide.However,only 10%SMEs have adopted big data analytics despite the competitive advantage they could achieve.Previous research has analysed the barriers to adoption and a strategic framework has been developed to help SMEs adopt big data analytics.The framework was converted into a scoring tool which has been applied to multiple case studies of SMEs in the UK.This paper documents the process of evaluating the framework based on the structured feedback from a focus group composed of experienced practitioners.The results of the evaluation are presented with a discussion on the results,and the paper concludes with recommendations to improve the scoring tool based on the proposed framework.The research demonstrates that this positioning tool is beneficial for SMEs to achieve competitive advantages by increasing the application of business intelligence and big data analytics.展开更多
This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technol...This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technology,focusing on analyzing the current application status of computer science and technology in big data,including data storage,data processing,and data analysis.Then,it proposes development strategies for big data processing.Computer science and technology play a vital role in big data processing by providing strong technical support.展开更多
That the world is a global village is no longer news through the tremendous advancement in the Information Communication Technology (ICT). The metamorphosis of the human data storage and analysis from analogue through...That the world is a global village is no longer news through the tremendous advancement in the Information Communication Technology (ICT). The metamorphosis of the human data storage and analysis from analogue through the jaguars-loom mainframe computer to the present modern high power processing computers with sextillion bytes storage capacity has prompted discussion of Big Data concept as a tool in managing hitherto all human challenges of complex human system multiplier effects. The supply chain management (SCM) that deals with spatial service delivery that must be safe, efficient, reliable, cheap, transparent, and foreseeable to meet customers’ needs cannot but employ bid data tools in its operation. This study employs secondary data online to review the importance of big data in supply chain management and the levels of adoption in Nigeria. The study revealed that the application of big data tools in SCM and other industrial sectors is synonymous to human and national development. It is therefore recommended that both private and governmental bodies should key into e-transactions for easy data assemblage and analysis for profitable forecasting and policy formation.展开更多
Hainan is a major tourist province.It is urgent to promote the transformation and upgrading of Hainan’s tourism industry from a traditional service industry to a modern service industry by means of informatization.Sm...Hainan is a major tourist province.It is urgent to promote the transformation and upgrading of Hainan’s tourism industry from a traditional service industry to a modern service industry by means of informatization.Smart tourism is a brand-new tourism form and operation mode of tourism transformation and upgrading.Integrating big data technology will make smart tourism more accurate in three aspects:tourism management,tourism service,and tourism marketing,and further enhance the satisfaction of the tourism experience.This paper studies the development status of smart tourism in Hainan,deeply summarizes its existing problems and causes,and puts forward the development strategy of smart tourism in Hainan to promote the healthy development of the tourism industry in Hainan.展开更多
Driven by the wave of big data,the traditional financial accounting model faces an urgent need for transformation,as it struggles to adapt to the complex requirements of modern enterprise management.This paper aims to...Driven by the wave of big data,the traditional financial accounting model faces an urgent need for transformation,as it struggles to adapt to the complex requirements of modern enterprise management.This paper aims to explore the feasible path for transitioning enterprise financial accounting to management accounting in the context of big data.It first analyzes the limitations of financial accounting in the era of big data,then highlights the necessity of transitioning to management accounting.Following this,the paper outlines the various challenges that may arise during this transition and,based on the analysis,proposes a series of corresponding transition strategies.These strategies aim to provide theoretical support and practical guidance for enterprises seeking a smooth transition from financial accounting to management accounting.展开更多
Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,whi...Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,which interfere with the crack detection performance.Till to the present,there still lacks efficient algorithm models and training datasets to deal with the interference brought by the shadows.To fill in the gap,we made several contributions as follows.First,we proposed a new pavement shadow and crack dataset,which contains a variety of shadow and pavement pixel size combinations.It also covers all common cracks(linear cracks and network cracks),placing higher demands on crack detection methods.Second,we designed a two-step shadow-removal-oriented crack detection approach:SROCD,which improves the performance of the algorithm by first removing the shadow and then detecting it.In addition to shadows,the method can cope with other noise disturbances.Third,we explored the mechanism of how shadows affect crack detection.Based on this mechanism,we propose a data augmentation method based on the difference in brightness values,which can adapt to brightness changes caused by seasonal and weather changes.Finally,we introduced a residual feature augmentation algorithm to detect small cracks that can predict sudden disasters,and the algorithm improves the performance of the model overall.We compare our method with the state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset,and the experimental results demonstrate the superiority of our method.展开更多
Progress in cloud computing makes group data sharing in outsourced storage a reality.People join in group and share data with each other,making team work more convenient.This new application scenario also faces data s...Progress in cloud computing makes group data sharing in outsourced storage a reality.People join in group and share data with each other,making team work more convenient.This new application scenario also faces data security threats,even more complex.When a user quit its group,remaining data block signatures must be re-signed to ensure security.Some researchers noticed this problem and proposed a few works to relieve computing overhead on user side.However,considering the privacy and security need of group auditing,there still lacks a comprehensive solution to implement secure group user revocation,supporting identity privacy preserving and collusion attack resistance.Aiming at this target,we construct a concrete scheme based on ring signature and smart contracts.We introduce linkable ring signature to build a kind of novel meta data for integrity proof enabling anonymous verification.And the new meta data supports secure revocation.Meanwhile,smart contracts are using for resisting possible collusion attack and malicious re-signing computation.Under the combined effectiveness of both signature method and blockchain smart contracts,our proposal supports reliable user revocation and signature re-signing,without revealing any user identity in the whole process.Security and performance analysis compared with previous works prove that the proposed scheme is feasible and efficient.展开更多
In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learni...In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.展开更多
An important factor in the course of daily medical diagnosis and treatment is understanding patients’ emotional states by the caregiver physicians. However, patients usually avoid speaking out their emotions when exp...An important factor in the course of daily medical diagnosis and treatment is understanding patients’ emotional states by the caregiver physicians. However, patients usually avoid speaking out their emotions when expressing their somatic symptoms and complaints to their non-psychiatrist doctor. On the other hand, clinicians usually lack the required expertise(or time) and have a deficit in mining various verbal and non-verbal emotional signals of the patients. As a result, in many cases, there is an emotion recognition barrier between the clinician and the patients making all patients seem the same except for their different somatic symptoms. In particular, we aim to identify and combine three major disciplines(psychology, linguistics, and data science) approaches for detecting emotions from verbal communication and propose an integrated solution for emotion recognition support. Such a platform may give emotional guides and indices to the clinician based on verbal communication at the consultation time.展开更多
With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapi...With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT.展开更多
文摘With the rapid development and widespread application of Big Data technology, the supply chain management of agricultural products enterprises is facing unprecedented reform and challenges. This study takes the perspective of Big Data technology and collects relevant information on the application of supply chain management in 100 agricultural product enterprises through a survey questionnaire. The study found that the use of Big Data can effectively improve the accuracy of demand forecasting, inventory management efficiency, optimize logistics costs, improve supplier management efficiency, enhance the overall level of supply chain management of enterprises, and propose innovative strategies for supply chain management of agricultural products enterprises based on this. Big Data technology brings a new solution for agricultural products enterprises to enhance their supply chain management level, making the supply chain smarter and more efficient.
基金supported by the Yunnan Major Scientific and Technological Projects(Grant No.202302AD080001)the National Natural Science Foundation,China(No.52065033).
文摘When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.
基金support from the Cyber Technology Institute(CTI)at the School of Computer Science and Informatics,De Montfort University,United Kingdom,along with financial assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher’s office through publication fund E15216.
文摘Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics,limiting their overall effectiveness.This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers(SLCs)and evaluates their performance in data-driven decision-making.The evaluation uses various metrics,with a particular focus on the Harmonic Mean Score(F-1 score)on an imbalanced real-world bank target marketing dataset.The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve,while Extreme Gradient Boosting(XGBoost)outperforms other traditional classifiers in terms of F-1 score.Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost,delivering superior results across all metrics,particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique.The study concludes several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets.These factors include the importance of selecting appropriate datasets for training and testing,choosing the right classifiers,employing effective techniques for processing and handling imbalanced datasets,and identifying suitable metrics for performance evaluation.Additionally,factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
基金supported by NOAA JTTI award via Grant #NA21OAR4590165, NOAA GOESR Program funding via Grant #NA16OAR4320115provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA-University of Oklahoma Cooperative Agreement #NA11OAR4320072, U.S. Department of Commercesupported by the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce via Grant #NA18NWS4680063。
文摘Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.
基金sponsored by the National Natural Science Foundation of China(Nos.61972208,62102194 and 62102196)National Natural Science Foundation of China(Youth Project)(No.62302237)+3 种基金Six Talent Peaks Project of Jiangsu Province(No.RJFW-111),China Postdoctoral Science Foundation Project(No.2018M640509)Postgraduate Research and Practice Innovation Program of Jiangsu Province(Nos.KYCX22_1019,KYCX23_1087,KYCX22_1027,KYCX23_1087,SJCX24_0339 and SJCX24_0346)Innovative Training Program for College Students of Nanjing University of Posts and Telecommunications(No.XZD2019116)Nanjing University of Posts and Telecommunications College Students Innovation Training Program(Nos.XZD2019116,XYB2019331).
文摘The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update(BDTMCDIncreUpdate),which combines distributed computing,storage technology,and incremental update techniques to provide an efficient and effective means for clustering analysis.Firstly,the original dataset is divided into multiple subblocks,and distributed computing resources are utilized to process the sub-blocks in parallel,enhancing efficiency.Then,initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results.When new data arrives,incremental update technology is employed to update the core tensor and factor matrix,ensuring that the clustering model can adapt to changes in data.Finally,by combining the updated core tensor and factor matrix with historical computational results,refined clustering results are obtained,achieving real-time adaptation to dynamic data.Through experimental simulation on the Aminer dataset,the BDTMCDIncreUpdate method has demonstrated outstanding performance in terms of accuracy(ACC)and normalized mutual information(NMI)metrics,achieving an accuracy rate of 90%and an NMI score of 0.85,which outperforms existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios.Therefore,the BDTMCDIncreUpdate method offers an innovative solution to the field of big data analysis,integrating distributed computing,incremental updates,and tensor-based multi-clustering techniques.It not only improves the efficiency and scalability in processing large-scale high-dimensional datasets but also has been validated for its effectiveness and accuracy through experiments.This method shows great potential in real-world applications where dynamic data growth is common,and it is of significant importance for advancing the development of data analysis technology.
基金supported by National Natural Science Foundation of China(62371098)Natural Science Foundation of Sichuan Province(2023NSFSC1422)+1 种基金National Key Research and Development Program of China(2021YFB2900404)Central Universities of South west Minzu University(ZYN2022032).
文摘In recent years,deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment.However,training deep learning-based classifiers on large signal datasets with redundant samples requires significant memory and high costs.This paper proposes a support databased core-set selection method(SD)for signal recognition,aiming to screen a representative subset that approximates the large signal dataset.Specifically,this subset can be identified by employing the labeled information during the early stages of model training,as some training samples are labeled as supporting data frequently.This support data is crucial for model training and can be found using a border sample selector.Simulation results demonstrate that the SD method minimizes the impact on model recognition performance while reducing the dataset size,and outperforms five other state-of-the-art core-set selection methods when the fraction of training sample kept is less than or equal to 0.3 on the RML2016.04C dataset or 0.5 on the RML22 dataset.The SD method is particularly helpful for signal recognition tasks with limited memory and computing resources.
基金supportted by the King Khalid University through the Large Group Project(No.RGP.2/312/44).
文摘Network updates have become increasingly prevalent since the broad adoption of software-defined networks(SDNs)in data centers.Modern TCP designs,including cutting-edge TCP variants DCTCP,CUBIC,and BBR,however,are not resilient to network updates that provoke flow rerouting.In this paper,we first demonstrate that popular TCP implementations perform inadequately in the presence of frequent and inconsistent network updates,because inconsistent and frequent network updates result in out-of-order packets and packet drops induced via transitory congestion and lead to serious performance deterioration.We look into the causes and propose a network update-friendly TCP(NUFTCP),which is an extension of the DCTCP variant,as a solution.Simulations are used to assess the proposed NUFTCP.Our findings reveal that NUFTCP can more effectively manage the problems of out-of-order packets and packet drops triggered in network updates,and it outperforms DCTCP considerably.
基金supported by the National Natural Science Foundation of China(Nos.72204169 and 81825007)Beijing Outstanding Young Scientist Program(No.BJJWZYJH01201910025030)+5 种基金Capital’s Funds for Health Improvement and Research(No.2022-2-2045)National Key R&D Program of China(Nos.2022YFF15015002022YFF1501501,2022YFF1501502,2022YFF1501503,2022YFF1501504,and 2022YFF1501505)Youth Beijing Scholar Program(No.010)Beijing Laboratory of Oral Health(No.PXM2021_014226_000041)Beijing Talent Project-Class A:Innovation and Development(No.2018A12)National Ten-Thousand Talent PlanLeadership of Scientific and Technological Innovation,and National Key R&D Program of China(Nos.2017YFC1307900 and 2017YFC1307905).
文摘Differences in the imaging subgroups of cerebral small vessel disease(CSVD)need to be further explored.First,we use propensity score matching to obtain balanced datasets.Then random forest(RF)is adopted to classify the subgroups compared with support vector machine(SVM)and extreme gradient boosting(XGBoost),and to select the features.The top 10 important features are included in the stepwise logistic regression,and the odds ratio(OR)and 95%confidence interval(CI)are obtained.There are 41290 adult inpatient records diagnosed with CSVD.Accuracy and area under curve(AUC)of RF are close to 0.7,which performs best in classification compared to SVM and XGBoost.OR and 95%CI of hematocrit for white matter lesions(WMLs),lacunes,microbleeds,atrophy,and enlarged perivascular space(EPVS)are 0.9875(0.9857−0.9893),0.9728(0.9705−0.9752),0.9782(0.9740−0.9824),1.0093(1.0081−1.0106),and 0.9716(0.9597−0.9832).OR and 95%CI of red cell distribution width for WMLs,lacunes,atrophy,and EPVS are 0.9600(0.9538−0.9662),0.9630(0.9559−0.9702),1.0751(1.0686−1.0817),and 0.9304(0.8864−0.9755).OR and 95%CI of platelet distribution width for WMLs,lacunes,and microbleeds are 1.1796(1.1636−1.1958),1.1663(1.1476−1.1853),and 1.0416(1.0152−1.0687).This study proposes a new analytical framework to select important clinical markers for CSVD with machine learning based on a common data model,which has low cost,fast speed,large sample size,and continuous data sources.
基金supported by the Hunan Provincial Natural Science Foundation of China(Grant No.2021JC0009)the Natural Science Foundation of China(Grant Nos.U2142212 and 42105136)。
文摘Various approaches have been proposed to minimize the upper-level systematic biases in global numerical weather prediction(NWP)models by using satellite upper-air sounding channels as anchors.However,since the China Meteorological Administration Global Forecast System(CMA-GFS)has a model top near 0.1 hPa(60 km),the upper-level temperature bias may exceed 4 K near 1 hPa and further extend to 5 hPa.In this study,channels 12–14 of the Advanced Microwave Sounding Unit A(AMSU-A)onboard five satellites of NOAA and METOP,whose weighting function peaks range from 10 to 2 hPa are all used as anchor observations in CMA-GFS.It is shown that the new“Anchor”approach can effectively reduce the biases near the model top and their downward propagation in three-month assimilation cycles.The bias growth rate of simulated upper-level channel observations is reduced to±0.001 K d^(–1),compared to–0.03 K d^(–1)derived from the current dynamic correction scheme.The relatively stable bias significantly improves the upper-level analysis field and leads to better global medium-range forecasts up to 10 days with significant reductions in the temperature and geopotential forecast error above 10 hPa.
文摘Big data analytics has been widely adopted by large companies to achieve measurable benefits including increased profitability,customer demand forecasting,cheaper development of products,and improved stock control.Small and medium sized enterprises(SMEs)are the backbone of the global economy,comprising of 90%of businesses worldwide.However,only 10%SMEs have adopted big data analytics despite the competitive advantage they could achieve.Previous research has analysed the barriers to adoption and a strategic framework has been developed to help SMEs adopt big data analytics.The framework was converted into a scoring tool which has been applied to multiple case studies of SMEs in the UK.This paper documents the process of evaluating the framework based on the structured feedback from a focus group composed of experienced practitioners.The results of the evaluation are presented with a discussion on the results,and the paper concludes with recommendations to improve the scoring tool based on the proposed framework.The research demonstrates that this positioning tool is beneficial for SMEs to achieve competitive advantages by increasing the application of business intelligence and big data analytics.
文摘This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technology,focusing on analyzing the current application status of computer science and technology in big data,including data storage,data processing,and data analysis.Then,it proposes development strategies for big data processing.Computer science and technology play a vital role in big data processing by providing strong technical support.
文摘That the world is a global village is no longer news through the tremendous advancement in the Information Communication Technology (ICT). The metamorphosis of the human data storage and analysis from analogue through the jaguars-loom mainframe computer to the present modern high power processing computers with sextillion bytes storage capacity has prompted discussion of Big Data concept as a tool in managing hitherto all human challenges of complex human system multiplier effects. The supply chain management (SCM) that deals with spatial service delivery that must be safe, efficient, reliable, cheap, transparent, and foreseeable to meet customers’ needs cannot but employ bid data tools in its operation. This study employs secondary data online to review the importance of big data in supply chain management and the levels of adoption in Nigeria. The study revealed that the application of big data tools in SCM and other industrial sectors is synonymous to human and national development. It is therefore recommended that both private and governmental bodies should key into e-transactions for easy data assemblage and analysis for profitable forecasting and policy formation.
文摘Hainan is a major tourist province.It is urgent to promote the transformation and upgrading of Hainan’s tourism industry from a traditional service industry to a modern service industry by means of informatization.Smart tourism is a brand-new tourism form and operation mode of tourism transformation and upgrading.Integrating big data technology will make smart tourism more accurate in three aspects:tourism management,tourism service,and tourism marketing,and further enhance the satisfaction of the tourism experience.This paper studies the development status of smart tourism in Hainan,deeply summarizes its existing problems and causes,and puts forward the development strategy of smart tourism in Hainan to promote the healthy development of the tourism industry in Hainan.
文摘Driven by the wave of big data,the traditional financial accounting model faces an urgent need for transformation,as it struggles to adapt to the complex requirements of modern enterprise management.This paper aims to explore the feasible path for transitioning enterprise financial accounting to management accounting in the context of big data.It first analyzes the limitations of financial accounting in the era of big data,then highlights the necessity of transitioning to management accounting.Following this,the paper outlines the various challenges that may arise during this transition and,based on the analysis,proposes a series of corresponding transition strategies.These strategies aim to provide theoretical support and practical guidance for enterprises seeking a smooth transition from financial accounting to management accounting.
基金supported in part by the 14th Five-Year Project of Ministry of Science and Technology of China(2021YFD2000304)Fundamental Research Funds for the Central Universities(531118010509)Natural Science Foundation of Hunan Province,China(2021JJ40114)。
文摘Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,which interfere with the crack detection performance.Till to the present,there still lacks efficient algorithm models and training datasets to deal with the interference brought by the shadows.To fill in the gap,we made several contributions as follows.First,we proposed a new pavement shadow and crack dataset,which contains a variety of shadow and pavement pixel size combinations.It also covers all common cracks(linear cracks and network cracks),placing higher demands on crack detection methods.Second,we designed a two-step shadow-removal-oriented crack detection approach:SROCD,which improves the performance of the algorithm by first removing the shadow and then detecting it.In addition to shadows,the method can cope with other noise disturbances.Third,we explored the mechanism of how shadows affect crack detection.Based on this mechanism,we propose a data augmentation method based on the difference in brightness values,which can adapt to brightness changes caused by seasonal and weather changes.Finally,we introduced a residual feature augmentation algorithm to detect small cracks that can predict sudden disasters,and the algorithm improves the performance of the model overall.We compare our method with the state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset,and the experimental results demonstrate the superiority of our method.
基金The work is supported by the National Key Research and Development Program of China(No.2018YFC1604002)the National Natural Science Foundation of China(No.U1836204,No.U1936208,No.U1936216,No.62002197).
文摘Progress in cloud computing makes group data sharing in outsourced storage a reality.People join in group and share data with each other,making team work more convenient.This new application scenario also faces data security threats,even more complex.When a user quit its group,remaining data block signatures must be re-signed to ensure security.Some researchers noticed this problem and proposed a few works to relieve computing overhead on user side.However,considering the privacy and security need of group auditing,there still lacks a comprehensive solution to implement secure group user revocation,supporting identity privacy preserving and collusion attack resistance.Aiming at this target,we construct a concrete scheme based on ring signature and smart contracts.We introduce linkable ring signature to build a kind of novel meta data for integrity proof enabling anonymous verification.And the new meta data supports secure revocation.Meanwhile,smart contracts are using for resisting possible collusion attack and malicious re-signing computation.Under the combined effectiveness of both signature method and blockchain smart contracts,our proposal supports reliable user revocation and signature re-signing,without revealing any user identity in the whole process.Security and performance analysis compared with previous works prove that the proposed scheme is feasible and efficient.
基金supported in part by the National Key Research and Development Program of China(2021YFB1714700)the National Natural Science Foundation of China(62022061,6192100028)。
文摘In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.
文摘An important factor in the course of daily medical diagnosis and treatment is understanding patients’ emotional states by the caregiver physicians. However, patients usually avoid speaking out their emotions when expressing their somatic symptoms and complaints to their non-psychiatrist doctor. On the other hand, clinicians usually lack the required expertise(or time) and have a deficit in mining various verbal and non-verbal emotional signals of the patients. As a result, in many cases, there is an emotion recognition barrier between the clinician and the patients making all patients seem the same except for their different somatic symptoms. In particular, we aim to identify and combine three major disciplines(psychology, linguistics, and data science) approaches for detecting emotions from verbal communication and propose an integrated solution for emotion recognition support. Such a platform may give emotional guides and indices to the clinician based on verbal communication at the consultation time.
基金supported by China’s National Natural Science Foundation(Nos.62072249,62072056)This work is also funded by the National Science Foundation of Hunan Province(2020JJ2029).
文摘With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT.