The application of data envelopment analysis (DEA) as a multiple criteria decision making (MCDM) technique has been gaining increasing attention in recent research. In practical applications of DEA, uncertainty in the input and output data of a decision making unit (DMU) may make the nominal solution infeasible and render the efficiency scores meaningless from a practical point of view. This paper analyzes the impact of data uncertainty on the evaluation results of DEA and proposes several robust DEA models, based on adaptations of recently developed robust optimization approaches, that are immune to input and output data uncertainties. The robust DEA models are based on the input-oriented and output-oriented CCR models, respectively, when the uncertainties appear in output data and input data separately. Furthermore, the robust DEA models can deal with random symmetric uncertainty and unknown-but-bounded uncertainty, in both of which the distributions of the random data entries are permitted to be unknown. The robust DEA models are implemented in a numerical example, and the efficiency scores and rankings of these models are compared. The results indicate that the robust DEA approach could be a more reliable method for efficiency evaluation and ranking in MCDM problems.
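For orientation, the nominal (non-robust) input-oriented CCR model is commonly written in the multiplier form below. The notation is an assumption made here and is not taken from the paper: $x_{ij}$ and $y_{rj}$ denote input $i$ and output $r$ of DMU $j$, $v_i$ and $u_r$ are the corresponding weights, and DMU $o$ is the unit being evaluated.

```latex
\begin{aligned}
\max_{u,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1,\dots,n, \\
& u_r \ge 0,\quad v_i \ge 0 .
\end{aligned}
```

If the entries $y_{rj}$ or $x_{ij}$ are only known up to some perturbation, weights that are optimal for the nominal data can violate these constraints for other realizations of the data, which is the infeasibility issue the abstract describes.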
With the increasing popularity of wireless sensor networks and GPS (global positioning system), uncertain data, as a new type of data, poses a new challenge for traditional data processing methods. Data broadcast is an effective means of data dissemination in mobile networks. In this paper, the definition of the mean uncertainty ratio of data is presented and a broadcasting scheme is proposed for uncertain data dissemination. Simulation results show that the scheme can effectively reduce the uncertainty of the broadcast uncertain data at the cost of a minor increase in data access time, both with and without transmission errors. As a result, the lower uncertainty of the data benefits the quality of the query results based on the data.
In this paper we construct estimates, optimal in a certain sense, of the values of linear functionals on solutions to two-point boundary value problems (BVPs) for systems of linear first-order ordinary differential equations, from observations that are linear transformations of the same solutions perturbed by additive random noise. It is assumed that the right-hand sides of the equations and the boundary data, as well as the statistical characteristics of the random noise in the observations, are not known and belong to certain given sets in the corresponding function spaces. This leads to a minimax statement of the estimation problem, in which optimal estimates are defined as estimates, linear with respect to the observations, for which the maximum of the mean square estimation error taken over the above-mentioned sets attains its minimal value. Such estimates are called minimax mean square or guaranteed estimates. We establish that the minimax mean square estimates are expressed via solutions of certain systems of differential equations of a special type, and we determine the estimation errors.
A co-location pattern is a set of spatial features whose instances frequently appear in a spatial neighborhood. This paper efficiently mines the top-k probabilistic prevalent co-locations over spatially uncertain data sets and makes the following contributions: 1) the concept of the top-k probabilistic prevalent co-locations based on a possible world model is defined; 2) a framework for discovering the top-k probabilistic prevalent co-locations is set up; 3) a matrix method is proposed to improve the computation of the prevalence probability of a top-k candidate, and two pruning rules for the matrix blocks are given to accelerate the search for exact solutions; 4) a polynomial matrix is developed to further speed up the top-k candidate refinement process; 5) an approximate algorithm with a compensation factor is introduced so that relatively large quantities of data can be processed quickly. The efficiency of the proposed algorithms, as well as the accuracy of the approximate algorithm, is evaluated with an extensive set of experiments using both synthetic and real uncertain data sets.
Uncertain data are common due to the increasing use of sensors, radio frequency identification (RFID), GPS and similar devices for data collection. The causes of uncertainty include limitations of measurement, noise, inconsistent supply voltage, and delay or loss of data in transfer. In order to manage, query or mine such data, data uncertainty needs to be considered. Hence, this paper studies the problem of top-k distance-based outlier detection from uncertain data objects. In this work, an uncertain object is modelled by the probability density function of a Gaussian distribution. The naive approach to distance-based outlier detection uses a nested loop. This approach is very costly because the distance function between two uncertain objects is expensive to evaluate. Therefore, a populated-cells list (PC-list) approach to outlier detection is proposed. Using the PC-list, the proposed top-k outlier detection algorithm needs to consider only a fraction of the dataset objects and hence quickly identifies candidate objects for the top-k outliers. Two approximate top-k outlier detection algorithms are presented to further increase the efficiency of the top-k outlier detection algorithm. An extensive empirical study on synthetic and real datasets is also presented to demonstrate the accuracy, efficiency and scalability of the proposed algorithms.
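The naive nested-loop baseline mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the names `UncertainObject`, `expected_sq_distance` and `top_k_outliers` are hypothetical, covariances are assumed diagonal, and the expected squared Euclidean distance between independent Gaussians stands in for the paper's (unspecified here) distance function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainObject:
    mean: tuple  # mean vector of the Gaussian
    var: tuple   # per-dimension variances (diagonal covariance is an assumption)


def expected_sq_distance(a, b):
    """E||A - B||^2 for independent Gaussians with diagonal covariances."""
    return sum((ma - mb) ** 2 + va + vb
               for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var))


def top_k_outliers(objects, radius, k):
    """Nested-loop baseline: the k objects with the fewest neighbours."""
    r2 = radius ** 2
    scored = []
    for i, a in enumerate(objects):
        # Inner loop: every other object is compared with object i.
        neighbours = sum(1 for j, b in enumerate(objects)
                         if i != j and expected_sq_distance(a, b) <= r2)
        scored.append((neighbours, i))
    scored.sort()  # fewest neighbours first; ties broken by index
    return [i for _, i in scored[:k]]


objects = [
    UncertainObject((0.0, 0.0), (0.01, 0.01)),
    UncertainObject((0.1, 0.0), (0.01, 0.01)),
    UncertainObject((0.0, 0.1), (0.01, 0.01)),
    UncertainObject((5.0, 5.0), (0.01, 0.01)),
]
print(top_k_outliers(objects, radius=1.0, k=1))
```

The quadratic number of distance evaluations in `top_k_outliers` is the cost that a cell-based structure such as the PC-list is meant to avoid.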
Outlier detection on data streams is an important task in data mining. The challenges become even greater when uncertain data are considered. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. It is an estimated outlier probability method that can effectively reduce the amount of computation. The cost of the incremental PCUOD algorithm can satisfy the demands of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams that can handle parameter variable queries simultaneously. Our methods are verified using both real and synthetic data. The results show that they are able to reduce the required storage and running time.
Recently, with the growing popularity of the Internet of Things (IoT) and pervasive computing, a large amount of uncertain data, e.g., RFID data, sensor data and real-time video data, has been collected. As one of the most fundamental issues in uncertain data mining, uncertain frequent pattern mining has attracted much attention in the database and data mining communities. Although there have been some solutions for uncertain frequent pattern mining, most of them assume that the data are independent, which is not true in most real-world scenarios. Therefore, current methods that are based on the independence assumption may generate inaccurate results for correlated uncertain data. In this paper, we focus on the problem of mining frequent itemsets over correlated uncertain data, where correlation can exist between any pair of uncertain data objects (transactions). We propose a novel probabilistic model, called the Correlated Frequent Probability model (CFP model), to represent the probability distribution of support in a given correlated uncertain dataset. Based on the distribution of support derived from the CFP model, we observe that some probabilistic frequent itemsets are frequent only in several transactions with high positive correlation. In particular, itemsets that are globally probabilistic frequent are more significant in eliminating the influence of the noise and correlation present in the data. In order to reduce redundant frequent itemsets, we further propose a new type of pattern, called global probabilistic frequent itemsets, to identify itemsets that are always frequent in each group of transactions when the whole correlated uncertain database is divided into disjoint groups based on their correlation. To speed up the mining process, we also design a dynamic programming solution, as well as two pruning and bounding techniques. Extensive experiments on both real and synthetic datasets verify the effectiveness and efficiency of the proposed model and algorithms.
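To make the notion of a "probability distribution of support" concrete, the sketch below computes it under the independence assumption that the abstract criticizes. It is not the CFP model and not the paper's dynamic programming solution; the function names and the example probabilities are hypothetical, and `probs[t]` is assumed to be the probability that transaction `t` contains the itemset.

```python
def support_distribution(probs):
    """Return dist with dist[s] = P(support == s) for independent transactions."""
    dist = [1.0]  # with no transactions, the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1.0 - p)  # transaction does not contain the itemset
            nxt[s + 1] += mass * p      # transaction contains the itemset
        dist = nxt
    return dist


def frequent_probability(probs, min_support):
    """P(support >= min_support) under the independence assumption."""
    return sum(support_distribution(probs)[min_support:])


probs = [0.5, 0.5, 1.0]  # illustrative values only
print(support_distribution(probs))
print(frequent_probability(probs, 2))
```

When transactions are correlated, the product form used in the inner loop no longer holds, which is why a model that accounts for correlation can give a different support distribution from this baseline.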
In uncertain data management, lineages are often used to compute the probabilities of result tuples. However, most existing work focuses on tuple-level lineage, which results in imprecise data derivation. Moreover, correlations among attributes cannot be captured. In this paper, for base tuples with multiple uncertain attributes, we define attribute-level annotations to annotate each attribute. Using these annotations to generate the lineages of result tuples enables more precise derivation. At the same time, they can be used for dependency graph construction. Using the dependency graph, we can represent not only constraints on schemas but also correlations among attributes. Combining the dependency graph and attribute-level lineage, we can correctly compute the probabilities of result tuples and precisely derive data. Experiments comparing tuple-level and attribute-level lineage show that our method has advantages in derivation precision and storage cost.
There has been much research on, and many semantics proposed for, answering top-k queries on uncertain data in various applications. However, most of these semantics spend much of their time computing position probabilities. Our approach to supporting various top-k queries is based on sharing the position probability distribution (PPD). In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we propose an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach through both theoretical analysis and experiments.
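As an illustration of what a position probability is, the sketch below computes, for independent tuples with distinct scores, the probability that each tuple exists and occupies each rank. This is a textbook-style computation written under those assumptions; it is not the PPD-tree or the approximation method of the paper, and the name `position_probabilities` and the sample data are hypothetical.

```python
def position_probabilities(tuples):
    """tuples: list of (score, existence_probability), assumed independent.

    Returns (ordered, ppd), where ordered is the input sorted by descending
    score and ppd[i][j] = P(ordered[i] exists and is at rank j + 1).
    """
    ordered = sorted(tuples, key=lambda t: -t[0])
    above = [1.0]  # above[c] = P(exactly c higher-scored tuples exist)
    ppd = []
    for _, p in ordered:
        # The tuple sits at rank c + 1 when it exists and c better tuples exist.
        ppd.append([p * mass for mass in above])
        # Fold this tuple into the distribution seen by lower-scored tuples.
        nxt = [0.0] * (len(above) + 1)
        for c, mass in enumerate(above):
            nxt[c] += mass * (1.0 - p)
            nxt[c + 1] += mass * p
        above = nxt
    return ordered, ppd


ordered, ppd = position_probabilities([(90, 0.5), (80, 1.0), (70, 0.5)])
for (score, _), row in zip(ordered, ppd):
    print(score, row)
```

Different top-k semantics can be read off the same table, for example by comparing rank-1 probabilities or by summing each row over the first k ranks, which is one way to see why sharing the distribution across query types saves work.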
This article investigates the problem of robust H∞ controller design for sampled-data systems with time-varying norm-bounded parameter uncertainties in the state matrices. Attention is focused on the design of a causal sampled-data controller that guarantees the asymptotic stability of the closed-loop system and reduces the effect of the disturbance input on the controlled output to a prescribed H∞ performance bound for all admissible uncertainties. A sufficient condition for the solvability of the problem is established in terms of linear matrix inequalities (LMIs). It is shown that the desired H∞ controller can be constructed by solving certain LMIs. An illustrative example is given to demonstrate the effectiveness of the proposed method.
A new approach is proposed for the robust H2 problem of uncertain sampled-data systems. By introducing a free variable, a new and less conservative Lyapunov asymptotic stability criterion is established. Based on this criterion, sufficient conditions for two classes of robust H2 problems for uncertain sampled-data control systems are presented through a set of coupled linear matrix inequalities. Finally, the reduced conservatism and the potential of the developed results are illustrated via a numerical example.
This paper considers the problem of robust controller design with a covariance constraint for uncertain sampled-data feedback control systems. The goal is to design controllers such that the closed-loop system meets the prespecified covariance constraint. This problem can be reduced to a controller design problem for an equivalent uncertain discrete-time system. Sufficient conditions are given for the existence of the desired controllers. An analytical expression for the set of desired controllers is also presented. An illustrative example is given to show the applicability of the proposed design procedure.
This paper is concerned with the problem of robust sampled-data state estimation for uncertain continuous-time systems. A sampled-data estimation covariance is defined by taking intersample behaviour into account. The primary purpose of this paper is to design robust discrete-time Kalman filters such that the sampled-data estimation covariance does not exceed a prespecified value, so that the error variances satisfy the desired constraints. It is shown that the addressed problem can be converted into a similar problem for a fictitious discrete-time system. The existence conditions and an explicit expression for the desired filters are both derived. Finally, a simple example is presented to demonstrate the effectiveness of the proposed design procedure.
Funding (uncertain data broadcasting paper): Initial Research Foundation of Shanghai Second Polytechnic University (No. 001943); National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z309).
Funding (top-k distance-based outlier detection paper): Grant-in-Aid for Scientific Research (A) (#24240015A).
Funding (outlier detection on uncertain data streams paper): National Natural Science Foundation of China (Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063); National High Technology Research and Development 863 Program of China (Grant No. 2012AA011004); National Basic Research 973 Program of China (Grant No. 2011CB302200-G).
Funding (frequent itemset mining over correlated uncertain data paper): Hong Kong RGC Project (Grant No. N_HKUST637/13); National Basic Research 973 Program of China (Grant No. 2014CB340303); National Natural Science Foundation of China (Grant Nos. 61328202 and 61300031); Microsoft Research Asia Gift Grant; Google Faculty Award 2013; Microsoft Research Asia Fellowship 2012.
Funding (attribute-level lineage paper): Key Program of the National Natural Science Foundation of China (61232002); National Natural Science Foundation of China (61202033); Program for Innovative Research Team of Wuhan (2014070504020237); Ph.D. Seed Foundation of Wuhan University (2012211020207); Science and Technology Support Program of Hubei Province (2015BAA127).
Funding (PPD-based top-k query paper): National High Technology Research and Development Program of China (863 Program, 2012AA011004); National Natural Science Foundation of China (61232002, 61202033); Natural Science Foundation of Hubei Province (2011CDB448).
Funding (robust H∞ sampled-data control paper): National Natural Science Foundation of China (60574004, 60736024, 60674043); Key Project of Science and Technology Research of the Ministry of Education of China (708069).