Funding: Funded by the National Natural Science Foundation of China under Grant 62273022.
Abstract: Due to the high inherent uncertainty of renewable energy, probabilistic day-ahead wind power forecasting is crucial for modeling and controlling the uncertainty of renewable-energy smart grids in smart cities. However, the accuracy and reliability of high-resolution day-ahead wind power forecasting are constrained by unreliable local weather prediction and incomplete power generation data. This article proposes a physics-informed artificial intelligence (AI) surrogate method that augments the incomplete dataset and quantifies its uncertainty to improve wind power forecasting performance. The incomplete dataset, built from numerical weather prediction data, historical wind power generation, and weather factor data, is augmented with generative adversarial networks. The enriched data is then fed into a multiple-AI-surrogate model constructed from two extreme learning machine networks to train the wind power forecasting model. In this way, the forecasting models' accuracy and generalization ability are improved by mining the implicit physics information in the incomplete dataset. An incomplete dataset gathered from a wind farm in North China, containing only 15 days of weather and wind power generation data with missing points caused by occasional shutdowns, is used to verify the proposed method's performance. Compared with other probabilistic forecasting methods, the proposed method shows better accuracy and probabilistic performance on the same incomplete dataset, which highlights its potential for more flexible and sensitive maintenance of smart grids in smart cities.
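The extreme learning machine (ELM) networks mentioned above follow a very simple recipe: hidden-layer weights are drawn at random and fixed, and only the output weights are solved by regularized least squares. Below is a minimal, self-contained sketch of that idea in Python; the array shapes, the tanh activation, and the ridge term `reg` are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def elm_fit(X, y, n_hidden=64, reg=1e-3, seed=0):
    """Fit a basic extreme learning machine regressor.

    Hidden weights/biases are random and fixed; only the output weights
    beta are estimated, via ridge-regularized least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage with synthetic features standing in for weather inputs -> power.
X = np.random.rand(200, 5)
y = np.sin(X.sum(axis=1))
W, b, beta = elm_fit(X, y)
print(elm_predict(X[:3], W, b, beta))
```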
Abstract: With the rapid development of the economy, the scale of the power grid is expanding. The amount of power equipment that constitutes the grid has become very large, which makes the state data of power equipment grow explosively. These multi-source heterogeneous data have inherent differences, which lead to data variation during transmission and preservation and thus produce the bad information of incomplete data. Research on data integrity has therefore become an urgent task. This paper builds on the random-chance characteristics and the spatio-temporal differences of the system. According to the characteristics and sources of the massive data generated by power equipment, a fuzzy mining model of power equipment data is established, and the data is divided into numerical and non-numerical categories. The text data of power equipment defects is taken as the mining material, and an array-based Apriori algorithm is then used for deep mining; the strong association rules in the incomplete data of power equipment are obtained and analyzed. Judging from the trend of the NRMSE metric and the classification accuracy, most of the filling methods combined with the two frameworks in this method show a relatively stable filling trend and do not fluctuate greatly as the missing rate grows. The experimental results show that the proposed algorithm model can effectively improve the filling effect of existing filling methods on most datasets; as the missing rate increases, the model's improvement over the existing filling methods remains higher than 4.3%. Through the incomplete-data clustering technology studied in this paper, a more innovative state assessment of smart grid reliability operation is carried out, which has good research value and reference significance.
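The association-rule mining step relies on the classic Apriori procedure: count candidate itemsets, keep the frequent ones, and join them to form the next level of candidates. The compact sketch below illustrates that generic procedure (not the paper's array-based variant), and the defect "transactions" are made-up examples.

```python
from itertools import combinations

def apriori(transactions, min_support=0.3):
    """Return frequent itemsets (frozensets) with their support."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}      # candidate 1-itemsets
    frequent, k = {}, 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets to generate (k+1)-itemset candidates.
        keys = list(level)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

defects = [("overheat", "oil leak"), ("overheat", "partial discharge"),
           ("overheat", "oil leak", "partial discharge"), ("oil leak",)]
print(apriori(defects, min_support=0.5))
```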
Funding: Supported by the Sustentation Program of National Ministries and Commissions of China (Grant No. 203020102).
Abstract: Data obtained from accelerated life testing (ALT) with two or more failure modes, commonly referred to as competing failure modes, are often incomplete. The incompleteness is mainly due to censoring, as well as masking, in which the failure time is observed but its corresponding failure mode is not identified because identifying the failure mode may be expensive or very difficult owing to a lack of appropriate diagnostics. A method is proposed for analyzing incomplete data from constant-stress ALT with competing failure modes. It is assumed that the failure modes have s-independent latent lifetimes and that the log lifetime of each failure mode can be written as a linear function of stress. The parameters of the model are estimated with the expectation maximization (EM) algorithm applied to the incomplete data. Simulation studies are performed to check model validity and investigate the properties of the estimates. For further validation, the method is also illustrated by an example, which shows the process of analyzing incomplete ALT data from an insulation system. By accounting for the incompleteness of the data in modeling and using the EM algorithm for estimation, the method becomes more flexible for ALT analysis.
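The assumed structure can be written compactly: each failure mode has a latent lifetime whose log is linear in stress, only the minimum is observed, and the observation may be censored or have its failure mode masked. The location-scale error term below is an illustrative placeholder; the paper's specific distributional assumption may differ.

```latex
\log T_{ij} = \beta_{0j} + \beta_{1j}\, s_i + \sigma_j\, \varepsilon_{ij},
\qquad j = 1,\dots,J \quad (\text{s-independent latent lifetimes}),
\\[4pt]
T_i = \min_{j} T_{ij}, \qquad
\text{observed: } \min(T_i, c_i) \ \text{ and, when not masked, } \arg\min_{j} T_{ij},
```

where \(s_i\) is the (possibly transformed) stress level of unit \(i\) and \(c_i\) a censoring time.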
Funding: Supported in part by the National Natural Science Foundation of China (51975075) and the Chongqing Technology Innovation and Application Program (cstc2018jszx-cyzd X0183).
Abstract: Energy consumption prediction of a CNC machining process is important for energy efficiency optimization strategies. To improve generalization ability, more and more parameters are acquired for energy prediction modeling, yet the data collected from workshops may be incomplete because of misoperation, unstable network connections, frequent transfers, and so on. This work proposes a framework for energy modeling based on incomplete data to address this issue. First, some necessary preliminary operations are applied to the incomplete data sets. Then, missing values are estimated to generate a new complete data set based on generative adversarial imputation nets (GAIN). Next, the gene expression programming (GEP) algorithm is used to train the energy model on the generated data sets. Finally, the predictive accuracy of the obtained model is tested. Computational experiments are designed to investigate the performance of the proposed framework at different rates of missing data. Experimental results demonstrate that even when the missing-data rate increases to 30%, the proposed framework can still make efficient predictions, with a corresponding RMSE and MAE of 0.903 kJ and 0.739 kJ, respectively.
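As a rough illustration of the impute-then-train workflow described above — not the paper's GAIN/GEP implementation — the sketch below substitutes scikit-learn's KNNImputer for GAIN and a gradient-boosting regressor for GEP; the feature layout, the synthetic energy target, and the 30% missing rate are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((500, 6))                                   # stand-in process parameters
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)  # stand-in energy (kJ)

# Inject roughly 30% missing values to mimic incomplete workshop data.
mask = rng.random(X.shape) < 0.3
X_missing = np.where(mask, np.nan, X)

X_tr, X_te, y_tr, y_te = train_test_split(X_missing, y, random_state=0)

imputer = KNNImputer(n_neighbors=5).fit(X_tr)        # stand-in for GAIN
model = GradientBoostingRegressor(random_state=0)    # stand-in for GEP
model.fit(imputer.transform(X_tr), y_tr)

pred = model.predict(imputer.transform(X_te))
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("MAE :", mean_absolute_error(y_te, pred))
```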
Funding: Supported by the National Natural Science Foundation of China (61433001) and the Tsinghua University Initiative Scientific Research Program.
Abstract: In modern industrial processes, timely detection and diagnosis of process abnormalities are critical for monitoring process operations. Various fault detection and diagnosis (FDD) methods have been proposed and implemented, but their performance can be drastically affected by the incomplete or missing data commonly present in real industrial scenarios. This paper presents a new FDD approach based on an incomplete-data imputation technique for process fault recognition. It employs a modified stacked autoencoder, a deep learning structure, in the incomplete-data treatment phase, and classifies data representations rather than the imputed complete data in the fault identification phase. A benchmark process, the Tennessee Eastman process, is employed to illustrate the effectiveness and applicability of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (51775090).
Abstract: Due to its simplicity and flexibility, the power law process is widely used to model the failures of repairable systems. Although statistical inference on the parameters of the power law process is well developed, most studies rely on complete failure data. A few methods have been reported for processing incomplete data, but they are limited to specific cases, especially the case where missing data occur at the early stage of the failures; no framework for handling generic scenarios is available. To overcome this problem, the statistical inference of the power law process with incomplete data is established in this paper from the point of view of order statistics. The theoretical derivation is carried out, and case studies demonstrate and verify the proposed method. Order statistics offer an alternative for statistical inference of the power law process with incomplete data, as they reformulate current studies on left-censored and interval-censored failure data in a unified framework. The results show that the proposed method has greater flexibility and wider applicability.
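For orientation, the power law process has intensity \(\lambda(t)=\lambda\beta t^{\beta-1}\); with complete, time-truncated data the maximum-likelihood estimates have the closed forms \(\hat\beta = n / \sum_i \ln(T/t_i)\) and \(\hat\lambda = n / T^{\hat\beta}\). The snippet below implements only this complete-data baseline (the paper's order-statistics treatment of incomplete data goes beyond it); the simulated failure times are illustrative.

```python
import numpy as np

def plp_mle(failure_times, T):
    """Complete-data MLEs for a power law (Crow-AMSAA) process.

    Intensity: lam * beta * t**(beta - 1), observed on (0, T].
    """
    t = np.asarray(failure_times, dtype=float)
    n = t.size
    beta = n / np.sum(np.log(T / t))
    lam = n / T ** beta
    return lam, beta

# Illustrative data: simulate a PLP with lam=0.5, beta=1.5 by inverting the
# mean function m(t) = lam * t**beta at unit-rate Poisson arrival times.
rng = np.random.default_rng(1)
arrivals = np.cumsum(rng.exponential(size=200))
times = (arrivals / 0.5) ** (1 / 1.5)
print(plp_mle(times, T=times[-1] * 1.01))
```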
Funding: Funded by the National Natural Science Foundation of China under Grant Nos. 61972057 and U1836208; the Hunan Provincial Natural Science Foundation of China under Grant No. 2019JJ50655; the Scientific Research Foundation of the Hunan Provincial Education Department of China under Grant No. 18B160; the Open Fund of the Hunan Key Laboratory of Smart Roadway and Cooperative Vehicle Infrastructure Systems (Changsha University of Science and Technology) under Grant No. kfj180402; the "Double First-class" International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant No. 2018IC25; and the Researchers Supporting Project No. (RSP-2020/102), King Saud University, Riyadh, Saudi Arabia.
Abstract: Multiple kernel clustering is an unsupervised data analysis method that has been used in various scenarios where data is easy to collect but hard to label. However, multiple kernel clustering for incomplete data is a critical yet challenging task. Although existing absent multiple kernel clustering methods have achieved remarkable performance on this task, they may fail when the data has a high value-missing rate, and they may easily fall into a local optimum. To address these problems, this paper proposes an absent multiple kernel clustering (AMKC) method for incomplete data. The AMKC method first clusters the initialized incomplete data. Then, it constructs a new multiple-kernel-based data space, referred to as K-space, from multiple sources to learn kernel combination coefficients. Finally, it seamlessly integrates an incomplete-kernel-imputation objective, a multiple-kernel-learning objective, and a kernel-clustering objective in order to achieve absent multiple kernel clustering. The three stages in this process are carried out simultaneously until the convergence condition is met. Experiments on six datasets with various characteristics demonstrate that the kernel imputation and clustering performance of the proposed method is significantly better than that of state-of-the-art competitors. Meanwhile, the proposed method achieves fast convergence.
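To make the "combine several kernels, then cluster" idea concrete, the sketch below mean-imputes missing values, averages a few RBF kernels with fixed weights, and clusters the combined kernel with spectral clustering. This is a much simpler baseline than AMKC's joint imputation and kernel-weight learning, and every parameter in it (kernel widths, equal weights, missing rate) is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Inject missing values, then mean-impute (a crude stand-in for kernel imputation).
mask = rng.random(X.shape) < 0.2
X_inc = np.where(mask, np.nan, X)
X_imp = np.where(np.isnan(X_inc), np.nanmean(X_inc, axis=0), X_inc)

# Build several candidate kernels and combine them with (here, equal) weights.
gammas = [0.01, 0.1, 1.0]
weights = np.ones(len(gammas)) / len(gammas)
K = sum(w * rbf_kernel(X_imp, gamma=g) for w, g in zip(weights, gammas))

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(K)
print(np.bincount(labels))
```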
Funding: Supported in part by the Center-initiated Research Project and Research Initiation Project of Zhejiang Laboratory (113012-AL2201, 113012-PI2103); the National Natural Science Foundation of China (61300167, 61976120); the Natural Science Foundation of Jiangsu Province (BK20191445); the Natural Science Key Foundation of the Jiangsu Education Department (21KJA510004); and the Qing Lan Project of Jiangsu Province.
Abstract: Data with missing values, or incomplete information, brings challenges to the development of classification, as the incompleteness may significantly affect the performance of classifiers. In this paper, we handle missing values in both training and test sets with uncertainty and imprecision reasoning by proposing a new belief combination of classifiers (BCC) method based on evidence theory. The proposed BCC method aims to improve classification performance on incomplete data by characterizing the uncertainty and imprecision brought by incompleteness. In BCC, different attributes are regarded as independent sources, and the collection of each attribute is considered a subset. Multiple classifiers are then trained, one on each subset independently, allowing each observed attribute to provide a sub-classification result for the query pattern. Finally, these sub-classification results, with different weights (discounting factors), provide supplementary information to jointly determine the final classes of the query patterns. The weights consist of two aspects, global and local: the global weight, calculated by an optimization function, represents the reliability of each classifier, and the local weight, obtained by mining attribute distribution characteristics, quantifies the importance of the observed attributes to the pattern classification. Abundant comparative experiments involving seven methods on twelve datasets are executed, demonstrating that BCC outperforms all baseline methods in terms of accuracy, precision, recall, and F1 measure, with pertinent computational costs.
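The core mechanism — one classifier per attribute, with each observed attribute of a query contributing a weighted vote — can be sketched as below. For brevity the sketch fuses class probabilities by weighted averaging rather than by an evidential (Dempster-Shafer) combination, and the equal weights and toy data are assumptions, not BCC's optimized global/local weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X_test = X[::5].copy()
X_test[rng.random(X_test.shape) < 0.3] = np.nan      # incomplete query patterns
n_attr, n_cls = X.shape[1], len(np.unique(y))

# One sub-classifier per attribute, trained on that attribute alone.
subs = [DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[:, [j]], y)
        for j in range(n_attr)]
weights = np.ones(n_attr)        # crude stand-in for BCC's global/local weights

def classify(x):
    """Fuse sub-results of the observed attributes by weighted averaging."""
    proba, total = np.zeros(n_cls), 0.0
    for j, clf in enumerate(subs):
        if not np.isnan(x[j]):                       # skip missing attributes
            proba += weights[j] * clf.predict_proba([[x[j]]])[0]
            total += weights[j]
    return np.argmax(proba / total) if total else -1

preds = np.array([classify(x) for x in X_test])
print("accuracy with observed-attribute fusion:", np.mean(preds == y[::5]))
```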
Abstract: A generalized flexibility-based objective function for structural damage identification is constructed to solve the constrained nonlinear least squares optimization problem. To begin with, the generalized flexibility matrix (GFM) proposed for the damage identification problem is recalled and a modal expansion method is introduced. Next, the objective function for the iterative optimization process based on the GFM is formulated, and the Trust-Region algorithm is utilized to obtain the solution of the optimization problem for multiple damage cases. Then, to compute the objective function gradient, the sensitivity analysis with respect to the design variables is derived. In addition, owing to spatial incompleteness, the influence of stiffness reduction and incomplete modal measurement data is discussed by means of two numerical examples with several damage cases. Finally, based on the computational results, it is evident that the presented approach provides good validity and reliability for large and complicated engineering structures.
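For context, with mass-normalized mode shapes \(\phi_i\) and natural frequencies \(\omega_i\), the modal flexibility matrix and one common form of the generalized flexibility matrix (which weights the lower, well-measured modes even more heavily) are shown below, together with a least-squares objective of the kind the abstract describes. The weighting, the parameterization by elemental stiffness factors \(\alpha_e\), and the bounds are assumptions for illustration; the paper's exact formulation may differ.

```latex
F \;\approx\; \sum_{i=1}^{m} \frac{1}{\omega_i^{2}}\,\phi_i \phi_i^{\mathsf T}
  \;=\; \Phi \Lambda^{-1} \Phi^{\mathsf T},
\qquad \Lambda = \operatorname{diag}(\omega_1^{2},\dots,\omega_m^{2}),
\\[4pt]
F_g \;=\; F\,M\,F \;=\; \Phi \Lambda^{-2} \Phi^{\mathsf T}
\quad (\text{one common generalized flexibility definition}),
\\[4pt]
\min_{\alpha}\; \bigl\lVert F_g(\alpha) - \widehat{F}_g \bigr\rVert_F^{2},
\qquad 0 \le \alpha_e \le 1 .
```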
Funding: Supported by the National Natural Science Foundation of China (Nos. U19A2081 and 61802270) and the Fundamental Research Funds for the Central Universities (No. 2020SCUNG129).
Abstract: The prevalence of missing values in data streams collected in real environments makes them impossible to ignore in the privacy preservation of data streams. However, most privacy preservation methods are developed without considering missing values, and the few studies that allow them to participate in data anonymization introduce considerable extra information loss. To balance the utility and privacy preservation of incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a slide-window-based processing framework is introduced to anonymize data streams continuously, in which each tuple can be output with clustering or anonymized clusters. We consider the dimensions of attribute and tuple as the similarity measurement, which enables clustering between incomplete records and complete records and generates clusters with minimal information loss. To avoid missing-value pollution, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments conducted on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.
Abstract: Incomplete data accompanies our life processes and covers almost all fields of scientific study, arising from delivery failure, battery exhaustion, accidental loss, and so on. However, modeling, indexing, and querying incomplete data incur big challenges; for example, queries over incomplete data often return unsatisfying results owing to improper incompleteness-handling methods. In this paper, we systematically review the management of incomplete data, including modeling, indexing, querying, and handling methods. We also overview several application scenarios of incomplete data and summarize the existing systems related to incomplete data. It is our hope that this survey provides insights to the database community on how incomplete data is managed, and inspires database researchers to develop more advanced processing techniques and tools to cope with the issues resulting from incomplete data in the real world.
Funding: Supported by the National Natural Science Foundation of China (Nos. U1866602 and 71773025) and the National Key Research and Development Program of China (No. 2020YFB1006104).
Abstract: Density-based clustering is an important category of clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation technologies and other techniques for handling missing values are not suitable for density-based clustering and decrease the quality of clustering results. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performance of the proposed algorithms is evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62072220, 61802160, 61502215); the China Postdoctoral Science Foundation Funded Project (2020M672134); the Science Research Fund of the Liaoning Province Education Department (LJC201913); and the Doctor Research Start-up Fund of Liaoning Province (20180540106).
Abstract: Skyline queries are extensively incorporated in various real-life applications to filter uninteresting data objects. Sometimes a skyline query may return too many results because it cannot control the retrieval conditions, especially for high-dimensional datasets. As an extension of the skyline query, the k-dominant skyline query relaxes the dimensional requirement by controlling the value of the parameter k, thereby reducing the number of retrieved objects. In addition, with the continuous spread of big data applications, the acquired data may not have all the content that users want, for practical reasons such as delivery failure, battery exhaustion, and accidental loss, so the data may be incomplete, with missing values in some attributes. Existing k-dominant skyline query algorithms for incomplete data depend to some degree on user definitions, their results cannot be shared, and they are unsuitable for direct use on incomplete big data. Given these situations, this paper studies the k-dominant skyline query problem over incomplete datasets and combines it with a distributed structure, the MapReduce environment. First, we propose an index structure over incomplete data, named the incomplete data index based on the dominant hierarchical tree (ID-DHT). Applying a bucket strategy, the incomplete data is divided into different buckets according to the dimensions of the missing attributes. Second, we put forward a query algorithm for incomplete data in the MapReduce environment, named the MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The data in each bucket is allocated to subspaces according to the dominance condition by the Map function, and the Reduce function processes the data according to the key value and returns the k-dominant skyline query result. Effective experiments demonstrate the validity and usability of our index structure and algorithm.
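As a reference point for the query semantics, the helper below checks k-dominance between two tuples, comparing only attributes observed in both tuples (one common convention for incomplete data; the paper's exact definition may differ) and treating smaller values as better. The hotel example is made up.

```python
def k_dominates(p, q, k):
    """Check whether tuple p k-dominates tuple q (smaller is better).

    Missing values are None; only attributes observed in BOTH tuples are
    compared. p k-dominates q if, among those common attributes, there are
    at least k on which p is not worse than q, including at least one on
    which p is strictly better.
    """
    not_worse = strictly_better = 0
    for a, b in zip(p, q):
        if a is None or b is None:
            continue
        if a <= b:
            not_worse += 1
            if a < b:
                strictly_better += 1
    return not_worse >= k and strictly_better >= 1

hotel_a = (120, None, 4.5)   # price, distance (missing), rating deficit
hotel_b = (150, 2.0, 4.5)
print(k_dominates(hotel_a, hotel_b, k=2))   # True: cheaper, tied on rating
```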
Funding: Supported by the National Natural Science Foundation of China (61202473); the Fundamental Research Funds for Central Universities (JUSRP111A49); the "111 Project" (B12018); and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: For the fault detection and diagnosis problem in large-scale industrial systems, there are two important issues: missing data samples and the non-Gaussian property of the data. However, most existing data-driven methods cannot handle both. Thus, a new Bayesian network classifier based fault detection and diagnosis method is proposed. First, a non-imputation method is presented to handle incomplete samples: owing to the properties of the proposed Bayesian network classifier, the missing values can be marginalized out in an elegant manner. Furthermore, the Gaussian mixture model is used to approximate the non-Gaussian data with a linear combination of finite Gaussian mixtures, so that the Bayesian network can process non-Gaussian data effectively. The entire fault detection and diagnosis method can therefore deal with high-dimensional incomplete process samples in an efficient and robust way, and the diagnosis results are expressed as probabilities with reliability scores. The proposed approach is evaluated on a benchmark problem, the Tennessee Eastman process. The simulation results show the effectiveness and robustness of the proposed method in fault detection and diagnosis for large-scale systems with missing measurements.
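The "marginalize rather than impute" step exploits a convenient property of Gaussian mixtures: dropping the missing dimensions of each component's mean and covariance gives the exact marginal likelihood of the observed ones. A minimal sketch follows; the two-component mixture parameters are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# A toy 3-D, two-component Gaussian mixture (illustrative parameters).
weights = np.array([0.6, 0.4])
means = [np.array([0.0, 0.0, 0.0]), np.array([3.0, 3.0, 3.0])]
covs = [np.eye(3), 2.0 * np.eye(3)]

def gmm_marginal_pdf(x):
    """Likelihood of a sample with NaNs, marginalizing the missing dims.

    For a Gaussian, the marginal over a subset of dimensions is again
    Gaussian with the corresponding sub-vector/sub-matrix, so missing
    entries can simply be dropped instead of imputed.
    """
    obs = ~np.isnan(x)
    if not obs.any():
        return 1.0      # nothing observed: likelihood is vacuous (a choice)
    return sum(w * multivariate_normal.pdf(x[obs], m[obs], C[np.ix_(obs, obs)])
               for w, m, C in zip(weights, means, covs))

x = np.array([0.2, np.nan, 2.5])    # sample with one missing measurement
print(gmm_marginal_pdf(x))
```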
Abstract: Accurate mathematical models for complicated structures are very difficult to construct. The work presented here provides an identification method for estimating the mass, damping, and stiffness matrices of linear dynamical systems from incomplete experimental data. The mass, stiffness, and damping matrices are assumed to be real, symmetric, and positive definite. A partial set of experimental complex eigenvalues and corresponding eigenvectors is given. In the proposed method, the least squares algorithm is combined with an iteration technique to determine the identified system matrices and corresponding design parameters. Several illustrative examples are presented to demonstrate the reliability of the proposed method. It is emphasized that the mass, damping, and stiffness matrices can be identified simultaneously.
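The identification rests on the fact that each measured complex eigenpair \((\lambda_i,\varphi_i)\) of a viscously damped system \(M\ddot{x}+C\dot{x}+Kx=0\) must satisfy the quadratic eigenvalue relation below; one natural least-squares formulation over the incomplete eigen-data is then as shown (the symmetry/positive-definiteness constraints and the iteration scheme are those described in the abstract; the objective written here is an illustrative formulation, not necessarily the paper's exact one).

```latex
\bigl(\lambda_i^{2} M + \lambda_i C + K\bigr)\varphi_i = 0,
\qquad i = 1,\dots,p \quad (\text{measured modes only}),
\\[4pt]
\min_{M,\,C,\,K}\;\sum_{i=1}^{p}
   \bigl\lVert \bigl(\lambda_i^{2} M + \lambda_i C + K\bigr)\varphi_i \bigr\rVert_2^{2}
\quad \text{s.t. } M = M^{\mathsf T}\succ 0,\; C = C^{\mathsf T}\succ 0,\; K = K^{\mathsf T}\succ 0 .
```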
Funding: This study was supported by the Bualuang ASEAN Chair Professor Fund.
Abstract: This paper introduces a novel approach for detecting structural damage in full-scale structures using surrogate models generated from incomplete modal data and deep neural networks (DNNs). A significant challenge in this field is the limited availability of measurement data for full-scale structures, which is addressed here by generating data sets with a reduced finite element (FE) model constructed in SAP2000 and a MATLAB programming loop. The surrogate models are trained using response data obtained from the monitored structure through a limited number of measurement devices. The proposed approach trains a single surrogate model that can quickly predict the location and severity of damage for all potential scenarios. To achieve the most generalized surrogate model, the study explores different types of layers and hyperparameters of the training algorithm and employs state-of-the-art techniques to avoid overfitting and to accelerate training. The approach's effectiveness, efficiency, and applicability are demonstrated by two numerical examples, and its robustness is verified on data sets with sparse and noisy measurements. Overall, the proposed approach is a promising alternative to traditional approaches that rely on FE model updating and optimization algorithms, which can be computationally intensive, and it shows potential for broader applications in structural damage detection.
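A stripped-down version of such a surrogate is just a regression network from measured modal features to damage parameters. The sketch below uses scikit-learn's MLPRegressor with early stopping as a stand-in for the paper's DNN, and the synthetic "modal feature to (location, severity)" data is entirely illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative training set: 2000 damage scenarios, 12 modal features each,
# targets = (damaged-element index scaled to [0,1], stiffness-reduction ratio).
n, n_feat = 2000, 12
targets = rng.random((n, 2))
features = (np.tanh(targets @ rng.normal(size=(2, n_feat)))
            + 0.01 * rng.standard_normal((n, n_feat)))   # noisy "modal data"

X_tr, X_te, y_tr, y_te = train_test_split(features, targets, random_state=0)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), early_stopping=True,
                         max_iter=2000, random_state=0)
surrogate.fit(X_tr, y_tr)                  # one model covering all scenarios
print("R^2 on held-out scenarios:", surrogate.score(X_te, y_te))
```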
Funding: Project supported by the Key R&D "Pioneer" Tackling Plan Program of Zhejiang Province, China (No. 2023C01119); the "Ten Thousand Talents Plan" Science and Technology Innovation Leading Talent Program of Zhejiang Province, China (No. 2022R52044); the Major Standardization Pilot Projects for the Digital Economy (Digital Trade Sector) of Zhejiang Province, China (No. SJ-Bz/2023053); and the National Natural Science Foundation of China (No. 62132017).
Abstract: Data imputation is an essential pre-processing task for data governance, aimed at filling in incomplete data. However, conventional data imputation methods can only partly alleviate data incompleteness using isolated tabular data, and they fail to achieve the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. Then, we perform the initial imputation of incomplete data using correlated data entries from other tables. Additionally, we develop a visual analysis system to refine data imputation candidates. Our interactive system combines the multi-party data imputation approach with expert knowledge, allowing for a better understanding of the relational structure of the data. This significantly enhances the accuracy and efficiency of data imputation, thereby improving the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that this method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.
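One simple way to realize the "identify similar columns across tables, then impute from the correlated table" idea is to score column pairs by the overlap of their value sets. The pandas sketch below does exactly that with a Jaccard score — a deliberately crude stand-in for the paper's intelligent association algorithms — using made-up tables and column names.

```python
import pandas as pd

orders = pd.DataFrame({"cust_id": ["A1", "A2", "A3", "A4"],
                       "city":    ["Hangzhou", "Beijing", None, "Shanghai"]})
crm    = pd.DataFrame({"customer": ["A2", "A3", "A4", "A5"],
                       "location": ["Beijing", "Shenzhen", "Shanghai", "Chengdu"]})

def jaccard(col_a, col_b):
    """Value-set overlap between two columns (a crude association score)."""
    a, b = set(col_a.dropna()), set(col_b.dropna())
    return len(a & b) / len(a | b) if a | b else 0.0

# Score every column pair across the two tables and pick the best match.
scores = {(ca, cb): jaccard(orders[ca], crm[cb])
          for ca in orders.columns for cb in crm.columns}
key_a, key_b = max(scores, key=scores.get)
print("associated columns:", key_a, "<->", key_b, scores[(key_a, key_b)])

# Initial imputation: fill missing 'city' values from the correlated table.
lookup = crm.set_index(key_b if key_b != "location" else "customer")["location"]
orders["city"] = orders["city"].fillna(orders[key_a].map(lookup))
print(orders)
```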
Funding: This work was based on research supported by the National Research Foundation, South Africa (SRUG190308422768 Grant No. 120839 and IFR170227223754 Grant No. 109214) and the South African NRF SARChI Research Chair in Computational and Methodological Statistics (UID: 71199). The research of the corresponding author is supported by a grant from Ferdowsi University of Mashhad (N.2/54034).
Abstract: This paper presents an extension of the factor analysis model based on the normal mean-variance mixture of the Birnbaum-Saunders distribution in the presence of nonresponses and missing data. This model can be used as a powerful tool to model non-normal features observed in data, such as strongly skewed and heavy-tailed noise. Missing data may occur due to operator error or incomplete data capture and therefore cannot be ignored in factor analysis modeling. We implement an EM-type algorithm for maximum likelihood estimation and propose single imputation of possible missing values under a missing-at-random mechanism. The potential and applicability of our proposed method are illustrated through the analysis of both simulated and real datasets.
Abstract: This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for this analysis under different situations; in particular, Goetghebeur and Ryan considered the situation where both the failure time and the censoring time marginally follow proportional hazards models and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for the regression parameters that does not require the proportionality assumption. An EM algorithm is developed, and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well in practical situations. An illustrative example is provided.
Abstract: In this study, the authors propose upper tolerance limits for the gamma mixture distribution based on generalized fiducial inference, and an MCMC simulation is performed to sample from the generalized fiducial distributions. The simulation results and a real hydrological data example show that the proposed tolerance limits are more efficient.