The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of their dominant tree species, Picea crassifolia (Qinghai spruce), has decreased dramatically in the past decades due to climate change and human activity, which may have impaired their ecological functions. Reasonable reforestation is the key measure for restoring these functions. Many previous efforts have predicted the potential distribution of Picea crassifolia, providing guidance for regional reforestation policy; however, all of them were performed at low spatial resolution, ignoring the naturally patchy distribution of the species. Here, we modeled the distribution of Picea crassifolia with species distribution models at high spatial resolutions. For many models, the area under the receiver operating characteristic curve (AUC) exceeded 0.9, indicating excellent precision. The AUC of models at 30 m was higher than that of models at 90 m, and the modeled current potential distribution at 30 m aligned more closely with the actual distribution, demonstrating that finer data resolution improves model performance. Moreover, at 90 m resolution, annual precipitation (Bio12) had the greatest influence on the distribution of Picea crassifolia, whereas at 30 m aspect became the most important variable, indicating the crucial role of finer topographic data in modeling species with patchy distributions. The current distribution of Picea crassifolia is concentrated in the northern and central parts of the study area, and this pattern will be maintained under future scenarios, although some habitat loss in the central parts and gains in the eastern regions are expected owing to increasing temperatures and precipitation. Our findings can guide protection and restoration strategies for the Qilian Mountains, which would benefit the regional ecological balance.
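The AUC figures quoted above come from ranking presence records against absence records. As a minimal sketch (illustrative labels and scores, not the paper's data), AUC can be computed directly from the Mann-Whitney formulation:

```python
# Sketch: AUC as the probability that a randomly chosen presence site
# scores higher than a randomly chosen absence site. Ties count half.

def auc(labels, scores):
    """labels: 1 = presence, 0 = absence; scores: model suitability."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(auc(labels, scores), 3))  # 8 of 9 presence/absence pairs ranked correctly
```

An AUC above 0.9, as reported for many of the models, means over 90% of such pairs are ranked correctly.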
The security of Federated Learning (FL) and Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy model usability by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Because existing data sanitization methods are hard to apply to real-time applications owing to their tedious processes and heavy computation, we propose a new supervised batch detection method for poison that can quickly sanitize the training dataset before local model training. We design a training dataset generation method that helps to enhance accuracy and uses data complexity features to train a detection model, which is then used in an efficient batch hierarchical detection process. Our model accumulates knowledge about poison, which can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as other online or offline scenarios.
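As a rough illustration of batch sanitization before local training (not the paper's detector, which is trained on data complexity features), one can drop samples inconsistent with per-class statistics learned from a trusted reference set. The threshold, features, and toy data below are illustrative:

```python
# Sketch: learn per-class centroids from a small trusted reference set,
# then drop incoming samples far from the centroid of their claimed class.

def fit_centroids(reference):
    """reference: trusted (features, label) pairs."""
    by_label = {}
    for x, y in reference:
        by_label.setdefault(y, []).append(x)
    return {y: [sum(p[i] for p in xs) / len(xs) for i in range(len(xs[0]))]
            for y, xs in by_label.items()}

def sanitize(batch, centroids, threshold=2.0):
    """Return only the samples consistent with their claimed class."""
    keep = []
    for x, y in batch:
        dist = sum((a - b) ** 2 for a, b in zip(x, centroids[y])) ** 0.5
        if dist <= threshold:
            keep.append((x, y))
    return keep

reference = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([5.0, 5.0], 1), ([5.1, 4.9], 1)]
batch = reference + [([5.0, 5.0], 0)]   # last sample has a flipped label
cents = fit_centroids(reference)
print(len(sanitize(batch, cents)))      # the flipped-label sample is dropped
```

The point of the sketch is the workflow, screening a whole batch cheaply before training, rather than the specific distance rule.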
Due to the restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services urgently need onboard distributed processing (OBDP). In existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs with highly dynamic nodes and long-distance links cannot provide these conditions, which makes the performance of OBDP hard to measure directly. To bridge this gap, a multidimensional simulation platform is indispensable: one that can simulate the network environment of LMCNs and run BDAs in it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We elaborate the architecture and mechanisms of the simulation platform and take Starlink and Hadoop as realistic examples for simulation. The results indicate that LMCNs exhibit dynamic end-to-end latency that fluctuates periodically with constellation movement. Compared with ground data center networks (GDCNs), LMCNs degrade computing and storage job throughput, which can be alleviated by using erasure codes and data flow scheduling among worker nodes.
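The periodic latency fluctuation reported above can be illustrated with a toy geometric model. This is only a sketch: a single ground-to-satellite link at an assumed 550 km altitude, whereas real LMCN paths add inter-satellite hops and visibility constraints:

```python
import math

# Sketch: propagation delay from a fixed ground point to a satellite on
# a circular orbit, sampled over one orbital revolution. The delay rises
# and falls periodically as the satellite moves around its orbit.

R_E = 6371.0          # mean Earth radius, km
C = 299_792.458       # speed of light, km/s

def slant_range_km(alt_km, phase_rad):
    """Distance from a ground point to the satellite when the satellite
    is phase_rad along its orbit from the overhead position
    (law of cosines between the two position vectors)."""
    r = R_E + alt_km
    return math.sqrt(R_E**2 + r**2 - 2 * R_E * r * math.cos(phase_rad))

delays_ms = [1000 * slant_range_km(550, math.radians(a)) / C
             for a in range(0, 360, 30)]
print(round(min(delays_ms), 2), round(max(delays_ms), 2))
```

Even this crude model shows delay swinging between roughly 2 ms (overhead) and tens of milliseconds, which is the periodic end-to-end fluctuation the simulations above measure at network scale.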
Multimodal sentiment analysis uses multimodal data such as text, facial expressions, and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels may be introduced. Recent research suggests that deep neural networks (DNNs) overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of multimodal data features for label correction and network training. In addition, a multiple meta-learner (label corrector) strategy is proposed to enhance label correction and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
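The two-layer fusion idea can be caricatured as fusing modality pairs first and then fusing the pairwise results. Everything here is an illustrative stand-in: the dimensions, the elementwise-average "fusion", and the modality vectors; the paper's network learns its fusion layers instead:

```python
# Sketch of two-stage ("two-layer") late fusion of three modality
# feature vectors: fuse pairs first, then fuse the pairwise features.

def fuse(u, v):
    """Elementwise average as a stand-in for a learned fusion layer."""
    return [(a + b) / 2 for a, b in zip(u, v)]

text  = [1.0, 0.0, 0.0]
audio = [0.0, 1.0, 0.0]
video = [0.0, 0.0, 1.0]

# layer 1: fuse modality pairs; layer 2: fold the pairwise features together
layer1 = [fuse(text, audio), fuse(audio, video), fuse(text, video)]
fused = layer1[0]
for f in layer1[1:]:
    fused = fuse(fused, f)
print(fused)
```

The resulting vector mixes all three modalities, which is what makes the fused feature useful both for training and for judging whether a label looks wrong.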
Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on association rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong association rules. Lift (degree of promotion) and chi-square tests were used to verify the rationality of the strong association rules output by the model. In this study, the correlation between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the association rules was obtained. These results can provide an effective basis for formulating a distribution network planning scheme.
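The rule metrics involved, support, confidence, and lift (degree of promotion), can be sketched on illustrative feeder records. The transactions below are made up ("H" = heavy load, "T" = high transformer utilization index), not the paper's data:

```python
# Sketch: support / confidence / lift for a candidate association rule
# T => H mined from operating records. A lift above 1 means the
# antecedent raises the probability of the consequent.

transactions = [
    {"H", "T"}, {"H", "T"}, {"H"}, {"T"}, {"H", "T"}, {"L"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(lhs, rhs):
    conf = support(lhs | rhs) / support(lhs)
    lift = conf / support(rhs)
    return conf, lift

conf, lift = rule_metrics({"T"}, {"H"})
print(round(conf, 2), round(lift, 3))
```

Apriori's job is to enumerate only the itemsets whose support clears a threshold; the confidence and lift checks above are then applied to the surviving rules, with chi-square as a further independence test.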
Big Data (BD), which is simply a collection of huge amounts of data, has been used extensively in fields such as finance, industry, business, and medicine. However, processing a massive amount of data is complicated and time-consuming. Thus, to design a distribution-preserving framework for BD, a novel methodology is proposed that combines Manhattan Distance-centered Partition Around Medoids (MD-PAM) with a Conjugate Gradient Artificial Neural Network (CG-ANN), proceeding through several steps to reduce the complexity of BD. First, the data are preprocessed: data repetition is mitigated using the map-reduce function, missing data are handled by substituting or ignoring the missing values, and the data are then transformed into a normalized form. Next, to enhance classification performance, the data's dimensionality is reduced using Gaussian Kernel Fisher Discriminant Analysis (GK-FDA). The processed data are then converted to a structured format and submitted to the partitioning phase, where MD-PAM partitions and groups them into clusters. Finally, CG-ANN classifies the data so that the needed data can be retrieved effortlessly by the user. To compare the outcomes of CG-ANN with prevailing methodologies, the openly accessible NSL-KDD datasets are used. The experimental results show that the proposed CG-ANN achieves efficient results with reduced computation cost and outperforms existing systems in terms of accuracy, sensitivity, and specificity.
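The partitioning step can be sketched as plain k-medoids (PAM) under Manhattan distance. This is a toy illustration on made-up one-dimensional points, with none of the preceding map-reduce or GK-FDA stages:

```python
# Sketch: k-medoids with Manhattan distance. Unlike k-means, each
# cluster center (medoid) is an actual data point, chosen to minimise
# the total in-cluster Manhattan cost.

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def pam(points, k, iters=10):
    medoids = points[:k]            # naive initialisation; PAM builds are fancier
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in medoids]
        for p in points:            # assign each point to its nearest medoid
            i = min(range(len(medoids)),
                    key=lambda j: manhattan(p, medoids[j]))
            clusters[i].append(p)
        # swap step: each medoid becomes the member minimising in-cluster cost
        medoids = [min(c, key=lambda m: sum(manhattan(m, q) for q in c))
                   for c in clusters if c]
    return medoids, clusters

pts = [(1.0,), (1.2,), (0.8,), (8.0,), (8.3,), (7.9,)]
medoids, clusters = pam(pts, 2)
print(sorted(medoids), sorted(len(c) for c in clusters))
```

Using medoids rather than means is what makes the method robust on data where an averaged "center" would be meaningless or distorted by outliers.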
The fitting of lifetime distributions to real-life data has been studied in various fields of research, and increasingly complex data from real-world scenarios will continue to emerge. Many researchers have therefore made commendable efforts to develop new lifetime distributions that can fit such complex data. In this paper, we use the KM-transformation technique to increase the flexibility of the power Lindley distribution, resulting in the Kavya-Manoharan Power Lindley (KMPL) distribution. We study the mathematical treatment of the KMPL distribution in detail and adapt the widely used method of maximum likelihood to estimate its unknown parameters. We carry out a Monte Carlo simulation study to investigate the performance of the maximum likelihood estimates (MLEs) of the parameters of the KMPL distribution. To demonstrate the effectiveness of the KMPL distribution for data fitting, we use a real dataset comprising the waiting times of 100 bank customers. We compare the KMPL distribution with other extensions of the power Lindley distribution. Based on several statistical model selection criteria, the summary results of the analysis favor the KMPL distribution. We further investigate the density fit and probability-probability (p-p) plots to validate the superiority of the KMPL distribution over the competing distributions for fitting the waiting time dataset.
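For reference, the construction described above can be written out. This is a sketch assuming the standard Kavya-Manoharan (KM) transform and the usual power Lindley CDF; the paper's exact parameterization may differ:

```latex
% power Lindley baseline CDF (shape \alpha > 0, scale \beta > 0)
F_{\mathrm{PL}}(x) = 1-\left(1+\frac{\beta x^{\alpha}}{\beta+1}\right)e^{-\beta x^{\alpha}},
\qquad x>0,
% KM transformation of an arbitrary baseline CDF F
F_{\mathrm{KM}}(x) = \frac{e}{e-1}\left[1-e^{-F(x)}\right],
% hence the KMPL CDF
F_{\mathrm{KMPL}}(x) = \frac{e}{e-1}
\left[1-\exp\!\left\{-1+\left(1+\frac{\beta x^{\alpha}}{\beta+1}\right)
e^{-\beta x^{\alpha}}\right\}\right].
```

Note the KM transform adds flexibility without adding a parameter, which is why the comparison above is against other power Lindley extensions under parsimony-aware selection criteria.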
Operation control of power systems has become challenging with the increase in the scale and complexity of power distribution systems and the extensive access of renewable energy. Improving the capabilities of data-driven operation management, intelligent analysis, and mining is therefore urgently required. To explore similar regularities in historical operating sections of the power distribution system and help the power grid systematically obtain high-value historical operation and maintenance experience and knowledge, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of the operating data of the power distribution system, a technical framework for neural information retrieval is established. Combined with the natural graph characteristics of the power distribution system, a unified graph data structure and a data fusion method covering data access, data complementation, and multi-source data are constructed. Furthermore, a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed. The neural information retrieval model is trained and tested using the generated set of graph node feature representation vectors, and verified on operating sections of the power distribution system of a provincial grid area. The results show that the proposed method achieves high accuracy in the similarity matching of historical operating characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
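The retrieval step ultimately reduces to similarity ranking over the learned embedding vectors. A minimal sketch with illustrative vectors and plain cosine similarity (the model above learns its embeddings and uses attention-based matching instead):

```python
import math

# Sketch: rank stored operating-section embeddings by cosine similarity
# to a query embedding and return the best match.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

sections = {              # illustrative historical operating sections
    "sec_A": [0.9, 0.1, 0.0],
    "sec_B": [0.1, 0.9, 0.1],
    "sec_C": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]   # embedding of the current operating section
ranked = sorted(sections, key=lambda s: cosine(query, sections[s]),
                reverse=True)
print(ranked[0])          # most similar historical section
```

The value of the learned graph embeddings is precisely that sections with similar operating regularities land close together under such a similarity measure.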
Distribution networks are important public infrastructure necessary for people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements for distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation, realizing real-time, accurate access to distribution network disaster information. A case study shows that the proposed approach performs accurate and rapid assessment of cross-sectional risk and that the minimum average annual outage time in the ring network can be reduced to 3 h/a. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Recent studies have pointed out the potential of the odd Fréchet family (or class) of continuous distributions in fitting data of all kinds. In this article, we propose an extension of this family through the so-called "Topp-Leone strategy", aiming to improve its overall flexibility by adding a shape parameter. The main objective is to offer original distributions with modifiable properties, from which adaptive and pliant statistical models can be derived. For the new family, these aspects are illustrated by means of comprehensive mathematical and numerical results. In particular, we emphasize a special distribution with three parameters based on the exponential distribution. The related model is shown to be well suited to fitting various lifetime data, more or less heterogeneous. Among all the possible applications, we consider two datasets of current interest linked to the COVID-19 pandemic, concerning daily cases confirmed and recovered in Pakistan from March 24 to April 28, 2020. As a result of our analyses, the proposed model gives the best fitting results in comparison to serious challengers, including the former odd Fréchet model.
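One plausible form of the construction, written out for orientation only: it assumes the usual odd Fréchet generator and the standard Topp-Leone generator with added shape parameter b applied on top of it, so the paper's exact parameterization may differ:

```latex
% odd Fréchet family built on a baseline CDF G (shape \theta > 0)
F_{\mathrm{OFr}}(x) = \exp\!\left\{-\left(\frac{1-G(x)}{G(x)}\right)^{\theta}\right\},
% Topp-Leone generator with the added shape parameter b > 0
F_{\mathrm{TL}}(u) = \left[1-(1-u)^{2}\right]^{b}, \qquad u \in (0,1),
% composition: Topp-Leone applied to the odd Fréchet CDF
F(x) = \left[1-\bigl(1-F_{\mathrm{OFr}}(x)\bigr)^{2}\right]^{b}.
```

Taking the baseline G exponential, $G(x) = 1-e^{-\lambda x}$, gives a three-parameter $(b, \theta, \lambda)$ special case of the kind highlighted above.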
To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed for the publish/subscribe mode. In the LBDD method, subscribers are involved in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, with the publisher acting as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven, and the LBDD guarantees that the average out-going degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. The results show that the LBDD method helps balance the task load between the publisher and subscribers and outperforms the point-to-point approach.
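One way to realize an MD5-ordered dissemination tree with bounded out-degree is sketched below. This is an illustration of the idea, not necessarily the paper's exact bucket construction; node names are made up:

```python
import hashlib

# Sketch: order subscribers by the MD5 digest of their IDs and connect
# them as a complete binary tree under the publisher, so every node
# forwards data to at most two children instead of the publisher
# sending point-to-point to everyone.

def dissemination_tree(publisher, subscribers):
    ordered = sorted(subscribers,
                     key=lambda s: hashlib.md5(s.encode()).hexdigest())
    nodes = [publisher] + ordered
    children = {n: [] for n in nodes}
    for i in range(1, len(nodes)):        # heap-style parent: (i - 1) // 2
        children[nodes[(i - 1) // 2]].append(nodes[i])
    return children

tree = dissemination_tree("pub", ["sub1", "sub2", "sub3", "sub4", "sub5"])
print(tree["pub"])                        # the publisher feeds only two nodes
```

Because every node has at most two children, the publisher's load stays constant while subscribers share the forwarding work, which is the load-shaping effect the experiments measure.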
Data Distribution Service (DDS) is a reliable real-time data communication middleware standard oriented to distributed environments based on the publish/subscribe model, and it has been widely applied in many fields. However, existing research on DDS security techniques remains limited, while publish/subscribe systems face multiple security threats in practical applications. To establish a flexible and reliable security mechanism that ensures the security of published and subscribed information, a data-centric access control scheme is proposed. Building on attribute-based encryption, the access tree structure is optimized, and an attribute trust mechanism is added for the publish/subscribe environment. Published and subscribed information is then encrypted and matched by formulating attribute conjunctions and authorization policies, and a DDS access control model is established to control information interaction within the publish/subscribe system, realizing secure data distribution. Experiments verify that the scheme can counter several security threats to DDS and guarantee the confidentiality of published and subscribed information, while also enforcing access control over specific information; moreover, publishers and subscribers do not need to share keys, reducing key management overhead.
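The access-decision logic can be sketched as conjunctive attribute matching. The scheme above enforces this cryptographically through attribute-based encryption; this snippet only illustrates policy evaluation, with made-up attribute names:

```python
# Sketch: a subscriber's attribute set must satisfy the conjunctive
# policy attached to a published topic before the data is released
# (in the real scheme, before the ciphertext becomes decryptable).

def satisfies(conjunctive_policy, attributes):
    """True iff every attribute required by the policy is held."""
    return conjunctive_policy <= attributes

policy = {"dept:dispatch", "role:operator"}        # illustrative policy
print(satisfies(policy, {"dept:dispatch", "role:operator", "zone:north"}))
print(satisfies(policy, {"dept:dispatch", "zone:north"}))
```

Tying the decision to attributes rather than identities is what lets publishers and subscribers avoid sharing pairwise keys.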
The existence of three well-defined tongue-shaped zones of swell dominance, termed 'swell pools', in the Pacific, the Atlantic, and the Indian Oceans was reported by Chen et al. (2002) using satellite data. In this paper, ECMWF re-analysis wind wave data, including wind speed, significant wave height, and averaged wave period and direction, are applied to verify the existence of these swell pools. Swell indices calculated from wave height, wave age, and the correlation coefficient are used to identify swell events; among these, the wave age swell index can be most appropriately related to physical processes. Based on the ECMWF data, the swell pools in the Pacific and Atlantic Oceans are confirmed, but the expected swell pool in the Indian Ocean is not pronounced. The seasonal variations of global and hemispherical swell indices are investigated, and the argument that swells in the pools originate mostly from the winter hemisphere is supported by the seasonal variation of the averaged wave direction. The northward bending of the swell pools in the Pacific and Atlantic Oceans in summer is not revealed by the ECMWF data. The swell pool in the Indian Ocean and the summer northward bending of the swell pools in the Pacific and Atlantic Oceans need to be further verified with other datasets.
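The wave-age criterion can be sketched as follows. It uses the deep-water phase speed c_p = gT/(2π); the 1.2 cutoff is a commonly used convention, and the sample period/wind values are illustrative rather than taken from the paper:

```python
import math

# Sketch: wave-age swell criterion. Waves travelling faster than the
# local wind (wave age c_p / U10 above ~1.2) are no longer being forced
# by that wind and are flagged as swell.

G = 9.81  # gravitational acceleration, m/s^2

def wave_age(period_s, wind_speed_ms):
    c_p = G * period_s / (2 * math.pi)   # deep-water phase speed, m/s
    return c_p / wind_speed_ms

def is_swell(period_s, wind_speed_ms, cutoff=1.2):
    return wave_age(period_s, wind_speed_ms) > cutoff

print(is_swell(12.0, 6.0), is_swell(5.0, 12.0))  # long swell vs. young wind sea
```

Because this index compares wave speed directly with the forcing wind, it maps onto the physics of wave growth more cleanly than height-based indices, which is the point made above.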
Net Primary Productivity (NPP) is one of the important biophysical variables of vegetation activity, and it plays an important role in studying the global carbon cycle, the carbon sources and sinks of ecosystems, and the spatial and temporal distribution of CO2. Remote sensing provides a broad view quickly, timely, and multi-temporally, which makes it an attractive and powerful tool for studying ecosystem primary productivity at scales ranging from local to global. This paper uses Moderate Resolution Imaging Spectroradiometer (MODIS) data to estimate and analyze the spatial and temporal distribution of NPP in the northern Hebei Province in 2001 based on the Carnegie-Ames-Stanford Approach (CASA) model. The spatial distributions of the Absorbed Photosynthetically Active Radiation (APAR) of vegetation and of light use efficiency in three geographical subregions, namely the Bashang Plateau Region, the Basin Region in northwestern Hebei, and the Yanshan Mountainous Region in northern Hebei, were analyzed, and the total NPP spatial distribution of the study area in 2001 was discussed. Based on the 16-day MODIS Fraction of Photosynthetically Active Radiation absorbed by vegetation (FPAR) product, 16-day composite NPP dynamics were calculated using the CASA model, and the seasonal dynamics of vegetation NPP in the three subregions were also analyzed. The results reveal that the total NPP of the study area in 2001 was 25.1877 × 10^6 gC/(m^2·a), and NPP in 2001 ranged from 2 to 608 gC/(m^2·a), with an average of 337.516 gC/(m^2·a). NPP of the study area in 2001 accumulated mainly from May to September (DOY 129-272), high NPP values appeared from June to August (DOY 177-204), and the maximum NPP appeared from late July to mid-August (DOY 209-224).
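At its core, CASA computes NPP as absorbed radiation times light use efficiency, NPP = APAR × ε with APAR = PAR × FPAR. A minimal sketch with illustrative values for one 16-day composite (real CASA also modulates ε by temperature and water stress scalars):

```python
# Sketch of the core CASA relation for one pixel and one compositing
# period: absorbed PAR scaled by light use efficiency gives carbon fixed.

def npp(par_mj_m2, fpar, epsilon_gc_mj):
    """par: incident PAR (MJ/m^2), fpar: fraction absorbed (0-1),
    epsilon: light use efficiency (gC/MJ). Returns NPP in gC/m^2."""
    apar = par_mj_m2 * fpar          # APAR = PAR * FPAR
    return apar * epsilon_gc_mj      # NPP = APAR * epsilon

print(npp(120.0, 0.6, 0.5))          # APAR = 72 MJ/m^2, so NPP = 36 gC/m^2
```

Summing such 16-day composites over the year is what yields the seasonal accumulation pattern described above.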
In this article, we highlight a new three-parameter heavy-tailed lifetime distribution, called the extended Lomax distribution, that aims to extend the modeling possibilities of the Lomax distribution. The considered distribution naturally appears as the distribution of a transformation of a random variable following the log-weighted power distribution, recently introduced for percentage or proportion data analysis purposes. As a result, its cumulative distribution function has the same functional basis as that of the Lomax distribution, but with a novel special logarithmic term depending on several parameters. The modulation of this logarithmic term reveals new types of asymmetrical shapes, implying a modeling horizon beyond that of the Lomax distribution. In the first part, we examine several of its mathematical properties, such as the shapes of the related probability density and hazard rate functions; stochastic comparisons; manageable expansions for various moments; and quantile properties. In particular, based on the quantile function, various actuarial measures are discussed. In the second part, the distribution's applicability is investigated using the maximum likelihood estimation method. The behavior of the obtained parameter estimates is validated by a simulation study. Insurance claim data are analyzed, and we show that the proposed distribution outperforms eight well-known distributions, including the Lomax distribution and several extended Lomax distributions. In addition, we demonstrate that it gives preferable inferences relative to these competitor distributions in terms of risk measures.
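For context, the baseline Lomax quantities whose functional basis the extension builds on are the standard forms (the extended distribution then attaches the additional logarithmic term to the CDF):

```latex
% Lomax (Pareto type II) with shape \alpha > 0 and scale \lambda > 0
F_{\mathrm{Lomax}}(x) = 1-\left(1+\frac{x}{\lambda}\right)^{-\alpha}, \qquad
f_{\mathrm{Lomax}}(x) = \frac{\alpha}{\lambda}\left(1+\frac{x}{\lambda}\right)^{-(\alpha+1)},
\qquad x>0,
% hazard rate: decreasing in x, a signature heavy-tail feature
h_{\mathrm{Lomax}}(x) = \frac{f_{\mathrm{Lomax}}(x)}{1-F_{\mathrm{Lomax}}(x)}
= \frac{\alpha}{\lambda+x}.
```

The polynomial tail of $F_{\mathrm{Lomax}}$ is what makes the family attractive for insurance claims; the logarithmic modulation described above perturbs this basis to reach asymmetrical shapes the plain Lomax cannot.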
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. Medical practitioners may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristic-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that a large number of uninteresting rules are formed during negative association rule mining, making it a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method, based on the Tabu-genetic optimization paradigm, for addressing the problem of negative association rules in healthcare data mining. Tabu search is advantageous because it removes a huge number of unnecessary rules and itemsets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in decreasing side effects and data distortions, as measured by the hiding failure indicator.
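Negative-rule metrics follow directly from positive supports, via supp(A ⇒ ¬B) = supp(A) − supp(A ∪ B). A minimal sketch with illustrative drug/diagnosis records (made-up item names, not a real dataset):

```python
# Sketch: support and confidence of the negative association rule
# {drugX} => not {dzA}, i.e. patients on drugX tend NOT to show dzA.

transactions = [
    {"drugX", "dzA"}, {"drugX"}, {"drugX"}, {"dzA"}, {"drugY", "dzA"},
]

def supp(items):
    return sum(items <= t for t in transactions) / len(transactions)

def negative_rule(a, b):
    s = supp(a) - supp(a | b)   # supp(A => not-B) = supp(A) - supp(A u B)
    conf = s / supp(a)          # conf(A => not-B)
    return s, conf

s, conf = negative_rule({"drugX"}, {"dzA"})
print(round(s, 3), round(conf, 3))
```

The combinatorial explosion mentioned above comes from the fact that every itemset pair spawns several negative variants (A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B), which is why pruning strategies such as tabu search pay off.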
A new method of establishing a rolling load distribution model was developed using online intelligent information-processing technology for plate rolling. The model combines a knowledge model and a mathematical model, starting from knowledge discovery in databases (KDD) and data mining (DM), and realizes online maintenance and optimization of the load model. The effectiveness of this new method was verified by offline simulation and online application.
基金supported by the National Natural Science Foundation of China(No.42071057).
文摘The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of its dominant tree species, Picea crassifolia (Qinghai spruce), has decreased dramatically in the past decades due to climate change and human activity, which may have influenced its ecological functions. To restore its ecological functions, reasonable reforestation is the key measure. Many previous efforts have predicted the potential distribution of Picea crassifolia, which provides guidance on regional reforestation policy. However, all of them were performed at low spatial resolution, thus ignoring the natural characteristics of the patchy distribution of Picea crassifolia. Here, we modeled the distribution of Picea crassifolia with species distribution models at high spatial resolutions. For many models, the area under the receiver operating characteristic curve (AUC) is larger than 0.9, suggesting their excellent precision. The AUC of models at 30 m is higher than that of models at 90 m, and the current potential distribution of Picea crassifolia is more closely aligned with its actual distribution at 30 m, demonstrating that finer data resolution improves model performance. Besides, for models at 90 m resolution, annual precipitation (Bio12) played the paramount influence on the distribution of Picea crassifolia, while the aspect became the most important one at 30 m, indicating the crucial role of finer topographic data in modeling species with patchy distribution. The current distribution of Picea crassifolia was concentrated in the northern and central parts of the study area, and this pattern will be maintained under future scenarios, although some habitat loss in the central parts and gain in the eastern regions is expected owing to increasing temperatures and precipitation. 
Our findings can guide protective and restoration strategies for the Qilian Mountains, which would benefit regional ecological balance.
基金supported in part by the“Pioneer”and“Leading Goose”R&D Program of Zhejiang(Grant No.2022C03174)the National Natural Science Foundation of China(No.92067103)+4 种基金the Key Research and Development Program of Shaanxi,China(No.2021ZDLGY06-02)the Natural Science Foundation of Shaanxi Province(No.2019ZDLGY12-02)the Shaanxi Innovation Team Project(No.2018TD-007)the Xi'an Science and technology Innovation Plan(No.201809168CX9JC10)the Fundamental Research Funds for the Central Universities(No.YJS2212)and National 111 Program of China B16037.
文摘The security of Federated Learning(FL)/Distributed Machine Learning(DML)is gravely threatened by data poisoning attacks,which destroy the usability of the model by contaminating training samples,so such attacks are called causative availability indiscriminate attacks.Facing the problem that existing data sanitization methods are hard to apply to real-time applications due to their tedious process and heavy computations,we propose a new supervised batch detection method for poison,which can fleetly sanitize the training dataset before the local model training.We design a training dataset generation method that helps to enhance accuracy and uses data complexity features to train a detection model,which will be used in an efficient batch hierarchical detection process.Our model stockpiles knowledge about poison,which can be expanded by retraining to adapt to new attacks.Being neither attack-specific nor scenario-specific,our method is applicable to FL/DML or other online or offline scenarios.
基金supported by National Natural Sciences Foundation of China(No.62271165,62027802,62201307)the Guangdong Basic and Applied Basic Research Foundation(No.2023A1515030297)+2 种基金the Shenzhen Science and Technology Program ZDSYS20210623091808025Stable Support Plan Program GXWD20231129102638002the Major Key Project of PCL(No.PCL2024A01)。
文摘Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes.
基金supported by STI 2030-Major Projects 2021ZD0200400National Natural Science Foundation of China(62276233 and 62072405)Key Research Project of Zhejiang Province(2023C01048).
文摘Multimodal sentiment analysis utilizes multimodal data such as text,facial expressions and voice to detect people’s attitudes.With the advent of distributed data collection and annotation,we can easily obtain and share such multimodal data.However,due to professional discrepancies among annotators and lax quality control,noisy labels might be introduced.Recent research suggests that deep neural networks(DNNs)will overfit noisy labels,leading to the poor performance of the DNNs.To address this challenging problem,we present a Multimodal Robust Meta Learning framework(MRML)for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously.Specifically,we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training.Besides,a multiple meta-learner(label corrector)strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels.We conducted experiments on three popular multimodal datasets to verify the superiority of ourmethod by comparing it with four baselines.
Funding: Supported by the Science and Technology Project of China Southern Power Grid (GZHKJXM20210043-080041KK52210002).
Abstract: Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. Lift (degree of promotion) and chi-square tests were used to verify the rationality of the strong correlation rules output by the model. In this study, the correlation relationship between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
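The support/confidence/lift metrics at the core of Apriori-style rule mining can be sketched on toy feeder transactions; the index names (e.g. 'heavy_load') are illustrative stand-ins, not the paper's actual electrical characteristic indices:

```python
# Toy "feeder" transactions; each set holds the characteristic indices
# observed together on one feeder. Names are hypothetical.
transactions = [
    {'heavy_load', 'high_growth', 'old_equipment'},
    {'heavy_load', 'high_growth'},
    {'high_growth', 'old_equipment'},
    {'heavy_load', 'high_growth', 'dense_area'},
]

def support(itemset):
    # fraction of transactions containing the whole itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    # the "degree of promotion": > 1 indicates positive correlation
    return confidence(antecedent, consequent) / support(consequent)

ante, cons = {'dense_area'}, {'heavy_load'}
print(round(confidence(ante, cons), 3), round(lift(ante, cons), 3))
```

A rule is kept as a "strong correlation rule" only when its support, confidence, and lift all pass the chosen thresholds.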
Abstract: Big Data (BD), a collection of huge amounts of data, has been utilized extensively in fields such as financial dealing, industry, business, and medicine. However, processing a massive amount of data is highly complicated and time-consuming. Thus, to design a Distribution Preserving Framework for BD, a novel methodology is proposed utilizing Manhattan Distance-centered Partition Around Medoid (MD-PAM) along with a Conjugate Gradient Artificial Neural Network (CG-ANN), which undergoes various steps to reduce the complications of BD. Firstly, the data are processed in the pre-processing phase by mitigating data repetition using the map-reduce function; subsequently, missing data are handled by substituting or ignoring the missed values. After that, the data are transformed into a normalized form. Next, to enhance classification performance, the data's dimensionality is reduced by employing Gaussian Kernel Fisher Discriminant Analysis (GK-FDA). Afterwards, the processed data are submitted to the partitioning phase after being transformed into a structured format. In the partitioning phase, the data are partitioned and grouped into clusters by the MD-PAM. Lastly, the data are classified in the classification phase by the CG-ANN so that the needed data can be effortlessly retrieved by the user. To compare the outcomes of the CG-ANN with prevailing methodologies, the openly accessible NSL-KDD datasets are utilized. The experimental outcomes show that the proposed CG-ANN yields efficient results at a reduced computation cost and outperforms existing systems in terms of accuracy, sensitivity, and specificity.
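The MD-PAM partitioning step can be sketched as plain PAM (Partition Around Medoids) with a Manhattan metric; this is a minimal illustration on toy 2-D points, not the paper's implementation:

```python
# Minimal PAM with Manhattan distance ("MD-PAM" step), on toy points.
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids):
    # index of the nearest medoid for each point
    return [min(range(len(medoids)), key=lambda m: manhattan(p, medoids[m]))
            for p in points]

def pam(points, k, iters=10):
    medoids = list(points[:k])
    for _ in range(iters):
        labels = assign(points, medoids)
        for m in range(k):
            cluster = [p for p, l in zip(points, labels) if l == m]
            if cluster:
                # new medoid minimises total Manhattan distance in its cluster
                medoids[m] = min(cluster,
                                 key=lambda c: sum(manhattan(c, p) for p in cluster))
    return medoids, assign(points, medoids)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = pam(pts, 2)
print(sorted(medoids))
```

Unlike k-means, PAM restricts cluster centers to actual data points, which pairs naturally with non-Euclidean metrics such as Manhattan distance.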
Abstract: The fitting of lifetime distributions to real-life data has been studied in various fields of research. With the theory of evolution still applicable, more complex data from real-world scenarios will continue to emerge. Despite this, many researchers have made commendable efforts to develop new lifetime distributions that can fit this complex data. In this paper, we utilized the KM-transformation technique to increase the flexibility of the power Lindley distribution, resulting in the Kavya-Manoharan Power Lindley (KMPL) distribution. We study the mathematical treatment of the KMPL distribution in detail and adapt the widely used method of maximum likelihood to estimate the unknown parameters of the KMPL distribution. We carry out a Monte Carlo simulation study to investigate the performance of the Maximum Likelihood Estimates (MLEs) of the parameters of the KMPL distribution. To demonstrate the effectiveness of the KMPL distribution for data fitting, we use a real dataset comprising the waiting times of 100 bank customers. We compare the KMPL distribution with other models that are extensions of the power Lindley distribution. Based on some statistical model selection criteria, the summary results of the analysis were in favor of the KMPL distribution. We further investigate the density fit and probability-probability (p-p) plots to validate the superiority of the KMPL distribution over the competing distributions for fitting the waiting time dataset.
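A hedged sketch of the construction, assuming the Kavya-Manoharan transform takes the usual form G(x) = e/(e-1) · (1 - exp(-F(x))) applied to the power Lindley CDF; the parameter names (alpha, theta) are illustrative and should be checked against the paper's parameterisation:

```python
import math

def power_lindley_cdf(x, alpha, theta):
    # Power Lindley CDF: F(x) = 1 - (1 + theta*x^alpha/(theta+1)) * exp(-theta*x^alpha)
    t = theta * x ** alpha
    return 1.0 - (1.0 + t / (theta + 1.0)) * math.exp(-t)

def kmpl_cdf(x, alpha, theta):
    # KM transform of the baseline CDF (assumed form, see lead-in)
    e = math.e
    return (e / (e - 1.0)) * (1.0 - math.exp(-power_lindley_cdf(x, alpha, theta)))

# sanity checks: a valid CDF must stay in [0, 1] and be non-decreasing
xs = [0.1 * i for i in range(1, 51)]
ys = [kmpl_cdf(x, 1.5, 0.8) for x in xs]
assert all(0.0 <= y <= 1.0 for y in ys)
assert all(a <= b for a, b in zip(ys, ys[1:]))
print(round(kmpl_cdf(2.0, 1.5, 0.8), 4))
```

The transform adds no new parameter, so the extra flexibility comes entirely from reshaping the baseline CDF.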
Funding: Supported by the National Key R&D Program of China (2020YFB0905900).
Abstract: Operation control of power systems has become challenging with the increase in the scale and complexity of power distribution systems and the extensive access of renewable energy. Therefore, improving the ability of data-driven operation management, intelligent analysis, and mining is urgently required. To investigate and explore similar regularities in historical operating sections of the power distribution system, and to help the power grid systematically obtain high-value historical operation and maintenance experience and knowledge, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of the operating data of the power distribution system, a technical framework for neural information retrieval is established. Combined with the natural graph characteristics of the power distribution system, a unified graph data structure and a data fusion method covering data access, data completion, and multi-source data are constructed. Further, a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed. The neural information retrieval model is trained and tested using the generated set of graph node feature representation vectors. The model is verified on operating sections of the power distribution system of a provincial grid area. The results show that the proposed method achieves high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
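The retrieval step, once node embeddings are learned, reduces to nearest-neighbour search in the embedding space. A minimal sketch using cosine similarity; the section names and vectors are toy stand-ins for learned representations, not the paper's data:

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings of historical operating sections.
sections = {
    'section_A': [0.9, 0.1, 0.2],
    'section_B': [0.1, 0.8, 0.3],
    'section_C': [0.85, 0.15, 0.25],
}
query = [0.88, 0.12, 0.22]  # embedding of the current operating section

# retrieve the most similar historical section
best = max(sections, key=lambda k: cosine(query, sections[k]))
print(best)
```

In practice the attention mechanism shapes how the node embeddings are produced; the matching stage itself stays this simple.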
Abstract: Distribution networks are important public infrastructure necessary for people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supply needed for daily life. Therefore, considering the requirements of distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation using data-driven theory. Furthermore, the paper realizes real-time, accurate access to distribution network disaster information. A case study shows that the proposed approach performs accurate and rapid assessment of cross-sectional risk and that the minimum average annual outage time in the ring network can be reduced to 3 h/a. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Funding: This work was funded by the Deanship of Scientific Research (DSR), King AbdulAziz University, Jeddah, under grant No. G:550-247-1441.
Abstract: Recent studies have pointed out the potential of the odd Fréchet family (or class) of continuous distributions for fitting data of all kinds. In this article, we propose an extension of this family through the so-called "Topp-Leone strategy", aiming to improve its overall flexibility by adding a shape parameter. The main objective is to offer original distributions with modifiable properties, from which adaptive and pliant statistical models can be derived. For the new family, these aspects are illustrated by means of comprehensive mathematical and numerical results. In particular, we emphasize a special distribution with three parameters based on the exponential distribution. The related model is shown to be well suited to fitting various lifetime data, more or less heterogeneous. Among all the possible applications, we consider two data sets of current interest linked to the COVID-19 pandemic: daily confirmed and recovered cases in Pakistan from March 24 to April 28, 2020. As a result of our analyses, the proposed model gives the best fitting results in comparison with serious challengers, including the former odd Fréchet model.
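A sketch of the general idea, under the assumption that the "Topp-Leone strategy" follows the standard Topp-Leone-G construction F(x) = [1 - (1 - G(x))^2]^beta, which adds one shape parameter beta to any baseline CDF G. A plain exponential baseline stands in here for the paper's odd Fréchet baseline:

```python
import math

def topp_leone_g(G, beta):
    # Topp-Leone-G transform of a baseline CDF G (assumed form, see lead-in)
    return lambda x: (1.0 - (1.0 - G(x)) ** 2) ** beta

exp_cdf = lambda x: 1.0 - math.exp(-x)  # illustrative baseline, rate 1
F = topp_leone_g(exp_cdf, beta=2.0)

# sanity checks: the result must still be a valid CDF
xs = [0.2 * i for i in range(1, 40)]
ys = [F(x) for x in xs]
assert all(0.0 <= y <= 1.0 for y in ys)
assert ys == sorted(ys)  # non-decreasing
print(round(F(1.0), 4))
```

Because the transform composes with any baseline CDF, the same wrapper yields the whole new family by swapping in each odd Fréchet member as G.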
Funding: The National Key Basic Research Program of China (973 Program).
Abstract: To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed for the publish/subscribe mode. In the LBDD method, subscribers are involved in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, where the publisher acts as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. The proposed LBDD guarantees that the average out-going degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. Experimental results show that the LBDD method balances the task load between the publisher and subscribers and outperforms the point-to-point approach.
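One plausible reading of the construction, sketched under stated assumptions: subscribers are ordered by their MD5 digests and linked into a binary-heap-shaped tree rooted at the publisher, so every node forwards to at most two children. The exact bucket and path rules of the paper are not reproduced here:

```python
import hashlib

def build_tree(publisher, subscribers):
    # order subscribers deterministically by MD5 digest (assumption)
    ordered = sorted(subscribers,
                     key=lambda s: hashlib.md5(s.encode()).hexdigest())
    nodes = [publisher] + ordered
    children = {n: [] for n in nodes}
    for i in range(1, len(nodes)):
        # heap-style parent link gives every node at most two children
        children[nodes[(i - 1) // 2]].append(nodes[i])
    return children

tree = build_tree('pub', ['sub1', 'sub2', 'sub3', 'sub4', 'sub5'])
assert all(len(c) <= 2 for c in tree.values())   # bounded fan-out
assert sum(len(c) for c in tree.values()) == 5   # each subscriber reached once
print(len(tree['pub']))
```

A complete binary tree of this shape has average out-degree approaching 2 away from the leaves, matching the degree property the abstract states.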
Abstract: Data Distribution Service (DDS) is a reliable real-time data communication middleware standard oriented to distributed environments based on the publish/subscribe model, and it has been widely applied in many fields. However, existing research on DDS security techniques is limited, while in practical applications publish/subscribe systems face multiple security threats. To establish a flexible and reliable security mechanism that ensures the security of published and subscribed information, a data-centric access control scheme is proposed. On the basis of attribute-based encryption, the access tree structure is optimized, and an attribute trust mechanism is added for the publish/subscribe environment. Published and subscribed information is then encrypted and matched by formulating attribute conjunctions and authorization policies, and a DDS access control model is established to control the interaction of information within the publish/subscribe system, achieving secure data distribution. Experiments verify that the scheme can counter several security threats faced by DDS and guarantee the confidentiality of published and subscribed information, and it also enables the system to control access to specific information; moreover, publishers and subscribers do not need to share keys, which reduces key management overhead.
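The access-tree part of such a scheme can be illustrated independently of the cryptography: interior nodes are threshold gates and leaves are attributes, and a subscriber's attribute set satisfies the policy when the root gate evaluates true. A hedged sketch with hypothetical attribute names, modeling policy matching only (not the pairing-based encryption itself):

```python
# Access tree: a leaf is an attribute string; an interior node is a
# (threshold, children) pair, i.e. a k-of-n gate. AND = n-of-n, OR = 1-of-n.
def satisfies(node, attrs):
    if isinstance(node, str):                  # leaf: a single attribute
        return node in attrs
    threshold, children = node
    return sum(satisfies(c, attrs) for c in children) >= threshold

# Hypothetical policy: (role=analyst AND dept=grid) OR clearance=admin
policy = (1, [(2, ['role:analyst', 'dept:grid']), 'clearance:admin'])

assert satisfies(policy, {'role:analyst', 'dept:grid'})
assert satisfies(policy, {'clearance:admin'})
assert not satisfies(policy, {'role:analyst'})
print('policy checks passed')
```

In CP-ABE-style schemes, decryption succeeds exactly when this predicate holds, so publishers never need to share keys with individual subscribers.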
Funding: The National Natural Science Foundation of China (Nos. 40830959 and 40921004) and the Ministry of Science and Technology of China (No. 2011BAC03B01).
Abstract: The existence of three well-defined tongue-shaped zones of swell dominance, termed "swell pools", in the Pacific, the Atlantic, and the Indian Oceans was reported by Chen et al. (2002) using satellite data. In this paper, the ECMWF re-analysis wind wave data, including wind speed, significant wave height, averaged wave period, and direction, are applied to verify the existence of these swell pools. Swell indices calculated from wave height, wave age, and a correlation coefficient are used to identify swell events. The wave-age swell index can be more directly related to physical processes than the other two swell indices. Based on the ECMWF data, the swell pools in the Pacific and the Atlantic Oceans are confirmed, but the expected swell pool in the Indian Ocean is not pronounced. The seasonal variations of global and hemispheric swell indices are investigated, and the argument that swells in the pools originate mostly from the winter hemisphere is supported by the seasonal variation of the averaged wave direction. The northward bending of the swell pools in the Pacific and the Atlantic Oceans in summer is not revealed by the ECMWF data. The swell pool in the Indian Ocean and the summer northward bending of the swell pools in the Pacific and the Atlantic Oceans need to be further verified with other datasets.
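A wave-age swell index can be sketched as follows: the deep-water phase speed follows from the wave period as c_p = g·T/(2π), and waves whose phase speed exceeds the local wind speed travel faster than the wind that could have generated them, so they are classed as swell. The 1.2 threshold below is a conventional choice, not necessarily the paper's exact criterion:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def wave_age(period_s, wind_speed_ms):
    # deep-water phase speed from the dispersion relation: c_p = g*T/(2*pi)
    c_p = G * period_s / (2.0 * math.pi)
    return c_p / wind_speed_ms

def is_swell(period_s, wind_speed_ms, threshold=1.2):
    # wave age above the threshold -> waves outrun the local wind -> swell
    return wave_age(period_s, wind_speed_ms) > threshold

# 10 s waves under an 8 m/s wind: old, fast waves -> swell
print(round(wave_age(10.0, 8.0), 2), is_swell(10.0, 8.0))
```

Unlike a pure wave-height index, this criterion ties swell identification directly to the wind-wave generation physics, which is why the abstract calls it the most physically interpretable of the three indices.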
Funding: Under the auspices of the National Natural Science Foundation of China (No. 40571117), the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KZCX3-SW-338), and the Research Foundation of the State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing Applications, Chinese Academy of Sciences (KQ060006).
Abstract: Net Primary Productivity (NPP) is one of the important biophysical variables of vegetation activity, and it plays an important role in studying the global carbon cycle, the carbon sources and sinks of ecosystems, and the spatial and temporal distribution of CO2. Remote sensing provides a broad view quickly, timely, and multi-temporally, which makes it an attractive and powerful tool for studying ecosystem primary productivity at scales ranging from local to global. This paper uses Moderate Resolution Imaging Spectroradiometer (MODIS) data to estimate and analyze the spatial and temporal distribution of NPP in the northern Hebei Province in 2001 based on the Carnegie-Ames-Stanford Approach (CASA) model. The spatial distributions of the Absorbed Photosynthetically Active Radiation (APAR) of vegetation and of light use efficiency in three geographical subregions, namely the Bashang Plateau Region, the Basin Region in the northwestern Hebei Province, and the Yanshan Mountainous Region in the northern Hebei Province, were analyzed, and the total NPP spatial distribution of the study area in 2001 was discussed. Based on the 16-day MODIS Fraction of Photosynthetically Active Radiation absorbed by vegetation (FPAR) product, 16-day composite NPP dynamics were calculated using the CASA model, and the seasonal dynamics of vegetation NPP in the three subregions were also analyzed. Results reveal that the total NPP of the study area in 2001 was 25.1877 × 10^6 gC/(m^2·a); NPP in 2001 ranged from 2 to 608 gC/(m^2·a), with an average of 337.516 gC/(m^2·a). NPP of the study area in 2001 accumulated mainly from May to September (DOY 129-272), high NPP values appeared from June to August (DOY 177-204), and the maximum NPP appeared from late July to mid-August (DOY 209-224).
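The core CASA relation used above can be sketched as NPP = APAR × ε, with APAR = SOL × FPAR × 0.5 (roughly half of total solar radiation is photosynthetically active). The input values below are illustrative per-16-day figures, not values from the study:

```python
def casa_npp(sol_mj_m2, fpar, eps_gc_mj):
    """CASA core relation: NPP from solar radiation, FPAR and light use efficiency.

    sol_mj_m2 : total solar radiation over the period, MJ/m^2
    fpar      : fraction of PAR absorbed by vegetation (0..1), from MODIS
    eps_gc_mj : light use efficiency, gC per MJ of APAR
    """
    apar = sol_mj_m2 * fpar * 0.5  # absorbed PAR, MJ/m^2
    return apar * eps_gc_mj        # NPP, gC/m^2

# e.g. 300 MJ/m^2 solar radiation, FPAR 0.6, light use efficiency 0.5 gC/MJ
print(casa_npp(300.0, 0.6, 0.5))  # -> 45.0
```

Summing such 16-day composites over DOY 129-272 reproduces the May-September accumulation pattern the abstract describes.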
Funding: Funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under Grant No. KEP-PhD:21-130-1443.
Abstract: In this article, we highlight a new three-parameter heavy-tailed lifetime distribution that aims to extend the modeling possibilities of the Lomax distribution. It is called the extended Lomax distribution. The considered distribution naturally appears as the distribution of a transformation of a random variable following the log-weighted power distribution, recently introduced for percentage or proportion data analysis purposes. As a result, its cumulative distribution function has the same functional basis as that of the Lomax distribution, but with a novel special logarithmic term depending on several parameters. The modulation of this logarithmic term reveals new types of asymmetrical shapes, implying a modeling horizon beyond that of the Lomax distribution. In the first part, we examine several of its mathematical properties, such as the shapes of the related probability and hazard rate functions; stochastic comparisons; manageable expansions for various moments; and quantile properties. In particular, based on the quantile function, various actuarial measures are discussed. In the second part, the distribution's applicability is investigated using the maximum likelihood estimation method. The behavior of the obtained parameter estimates is validated by a simulation study. Insurance claim data are analyzed. We show that the proposed distribution outperforms eight well-known distributions, including the Lomax distribution and several extended Lomax distributions. In addition, we demonstrate that it gives preferable inferences over these competitor distributions in terms of risk measures.
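For reference, a sketch of the baseline Lomax distribution that the extension builds on, with CDF F(x) = 1 - (1 + x/λ)^(-α); the paper's extra logarithmic term is not reproduced here because its exact form is not given in the abstract:

```python
def lomax_cdf(x, alpha, lam):
    # Lomax CDF: heavy right tail governed by the shape parameter alpha
    return 1.0 - (1.0 + x / lam) ** (-alpha)

def lomax_quantile(p, alpha, lam):
    # inverse CDF: x = lam * ((1 - p)^(-1/alpha) - 1), basis for
    # quantile-driven actuarial measures such as VaR
    return lam * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

q = lomax_quantile(0.5, 2.0, 1.0)           # median for alpha=2, lam=1
assert abs(lomax_cdf(q, 2.0, 1.0) - 0.5) < 1e-9  # round trip
print(round(q, 4))
```

Having a closed-form quantile function is what makes the quantile-based actuarial measures mentioned in the abstract tractable.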
Abstract: While using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristic-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that a large number of uninteresting rules are formed during negative association rule mining, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and itemsets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the hiding failure indicator.
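The metrics of a negative association rule X → ¬Y follow directly from positive supports: supp(X ∧ ¬Y) = supp(X) - supp(X ∪ Y). A minimal sketch with toy disease/drug codes (hypothetical items, not real healthcare data):

```python
# Toy transactions: items are hypothetical disease (d*) / drug (m*) codes.
transactions = [
    {'d1', 'm1'}, {'d1', 'm2'}, {'d1', 'm2'}, {'d2', 'm1'}, {'d1'},
]

def supp(items):
    # fraction of transactions containing all the items
    return sum(items <= t for t in transactions) / len(transactions)

def neg_rule(x, y):
    # support and confidence of the negative rule X -> not-Y,
    # using supp(X and not Y) = supp(X) - supp(X union Y)
    s = supp(x) - supp(x | y)
    return s, s / supp(x)

support_val, confidence_val = neg_rule({'d1'}, {'m1'})
print(round(support_val, 2), round(confidence_val, 2))
```

Because every itemset not co-occurring generates candidate negative rules, the candidate space explodes; this is the pruning problem the Tabu-genetic search targets.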
Abstract: A new method of establishing a rolling load distribution model was developed using online intelligent information-processing technology for plate rolling. The model combines a knowledge model and a mathematical model, starting from knowledge discovery in databases (KDD) and data mining (DM). Online maintenance and optimization of the load model are realized. The effectiveness of this new method was verified by offline simulation and online application.