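Abstract: Data-driven modeling methods have changed the traditional modeling paradigm for generators, so conventional electromechanical transient time-domain simulation methods can no longer be applied directly to power systems under the new paradigm. To address this, this paper proposes an electromechanical transient time-domain simulation algorithm driven jointly by data and physical models (data and physics driven time domain simulation, DPD-TDS). In the algorithm, generator state variables and nodal injection currents are computed by inference with data-driven models, while nodal voltages are computed from the network equations; the two are solved alternately to carry out the simulation. The algorithm introduces a preprocessing method for the network algebraic equations under the hybrid-driven paradigm to improve simulation convergence. It also uses a central processing unit-neural network processing unit (CPU-NPU) heterogeneous computing framework to accelerate simulation: the CPU solves the differential-algebraic equations of the physics-based models, while the NPU acts as a coprocessor that performs forward inference of the data-driven models. Finally, the algorithm is validated on the IEEE 39-bus and Polish 2383-bus systems with some or all generators replaced by data-driven models. Simulation results show that the proposed algorithm converges well, computes quickly, and produces accurate results.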
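The alternating structure described above (a data-driven model supplies generator states and injections, the network equations supply node voltages) can be sketched in a few lines. This is a minimal illustration under stated assumptions. `surrogate_generator` is a hypothetical linear stand-in for a trained neural network. The generator is represented as a Norton current source whose admittance is folded into the bus admittance matrix, and the two-bus data are made up. The sketch does not reproduce the paper's preprocessing method or the CPU-NPU split.

```python
import numpy as np

def surrogate_generator(E, v_term, dt, E_set=1.05, tau=0.5, kv=0.2):
    """Hypothetical stand-in for a trained data-driven generator model.

    Maps the present internal EMF E (the state) and the terminal voltage to
    the next state. In DPD-TDS this role is played by a neural network; here
    it is a simple linear rule so that the example stays checkable.
    """
    dE = (E_set - E) / tau + kv * (1.0 - abs(v_term))
    return E + dt * dE

def simulate(Y, y_g, gen_bus, E0, dt, steps):
    """Alternate the data-driven state update with the network solve."""
    E = E0
    V = None
    for _ in range(steps):
        # Data-driven side: the state gives the Norton injection current.
        I = np.zeros(Y.shape[0], dtype=complex)
        I[gen_bus] = y_g * E
        # Physics side: solve the network algebraic equations Y V = I.
        V = np.linalg.solve(Y, I)
        # Data-driven side again: advance the state using the new voltage.
        E = surrogate_generator(E, V[gen_bus], dt)
    return E, V

# Two-bus example: generator at bus 0, line to bus 1, resistive load at bus 1.
y_g = 1 / 0.3j      # generator Norton admittance (folded into Y)
y_l = 1 / 0.1j      # line admittance
y_load = 1.0        # load admittance
Y = np.array([[y_l + y_g, -y_l],
              [-y_l, y_l + y_load]])
E_final, V_final = simulate(Y, y_g, gen_bus=0, E0=1.0, dt=0.01, steps=3000)
```

Because the surrogate is linear, the loop settles to a fixed point that can be checked by hand. With a real network model, the inference call would replace `surrogate_generator` and could be batched on an accelerator.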
Abstract: A direction-of-arrival (DOA) estimation algorithm based on the direct data domain (D3) approach is presented. The method, called the temporal and spatial two-dimensional vector reconstruction (TSR) method, can accurately estimate the DOA from a single snapshot of modified data. The key idea is to apply the D3 approach in the temporal domain, which extracts the signal at a given frequency while nulling out signals at other frequencies. Spatial vector reconstruction is then applied to the extracted signal data to estimate the angle of the spatially coherent signal source. Compared with the conventional temporal and spatial processing approach, the TSR method has a lower computational load and offers better real-time performance, robustness, and DOA angular accuracy. The proposed algorithm can be applied directly to coherent-pulse phased array radar. Simulation results demonstrate the performance of the proposed technique.
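The two stages can be sketched roughly as follows, assuming a noise-free uniform linear array with half-wavelength spacing and one interfering tone. First, a forward D3 estimator runs along the time samples of each element to keep the component at normalized frequency `f0` and null the other one. Second, the angle is read from the phase progression of the extracted spatial vector. That second stage is a simple phase-difference estimator used as a stand-in; it is not the TSR spatial vector reconstruction itself. All function names are hypothetical.

```python
import numpy as np

def d3_extract(x, z):
    """Forward D3 estimate of the amplitude of the component z**t in the
    1-D sequence x, nulling other components (noise-free sketch)."""
    M = (len(x) - 1) // 2                      # M + 1 weights
    F = np.empty((M + 1, M + 1), dtype=complex)
    F[0] = z ** np.arange(M + 1)               # unit gain on the wanted component
    for k in range(1, M + 1):
        # x[t] - x[t+1]/z cancels the wanted component, leaving the rest
        F[k] = x[k - 1:k + M] - x[k:k + M + 1] / z
    rhs = np.zeros(M + 1, dtype=complex)
    rhs[0] = 1.0
    w, *_ = np.linalg.lstsq(F, rhs, rcond=None)
    return np.dot(w, x[:M + 1])

def tsr_like_doa(data, f0, spacing=0.5):
    """data[n, t]: element n, time sample t; spacing in wavelengths."""
    z = np.exp(2j * np.pi * f0)
    # 1) temporal D3 on every element: keep f0, null other frequencies
    y = np.array([d3_extract(row, z) for row in data])
    # 2) angle from the phase progression of the extracted spatial vector
    phase = np.angle(np.sum(y[1:] * np.conj(y[:-1])))
    return np.rad2deg(np.arcsin(phase / (2 * np.pi * spacing)))

N, T = 8, 9                  # elements, time samples
f0, f1 = 0.10, 0.27          # normalized frequencies: wanted, unwanted
n = np.arange(N)[:, None]
t = np.arange(T)[None, :]
data = (1.0 * np.exp(1j * np.pi * n * np.sin(np.deg2rad(20.0))) * np.exp(2j * np.pi * f0 * t)
        + 3.0 * np.exp(1j * np.pi * n * np.sin(np.deg2rad(-40.0))) * np.exp(2j * np.pi * f1 * t))
theta_hat = tsr_like_doa(data, f0)
```

With noise, `lstsq` returns a least-squares compromise rather than an exact null, and the angle estimate degrades accordingly.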
Abstract: In non-homogeneous environments, traditional space-time adaptive processing (STAP) cannot effectively suppress interference or detect targets, because the secondary data do not accurately reflect the statistical characteristics of the range cell under test. A novel methodology that applies the direct data domain approach to STAP for airborne radar in non-homogeneous environments is presented. The deterministic least squares adaptive signal processing technique operates on a "snapshot-by-snapshot" basis to determine the adaptive weights for nulling interference and estimating the signal of interest (SOI). Furthermore, this approach removes the need to estimate the covariance matrix from the data of neighboring range cells, and hence the need to invert it, so it can be implemented to operate in real time. Simulation results illustrate the effectiveness of interference suppression in non-homogeneous environments.
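The snapshot-by-snapshot idea can be illustrated with the forward D3 least-squares formulation for a uniform linear array. This is a minimal sketch that assumes a known SOI direction, half-wavelength spacing, and noise-free data. Differencing adjacent elements with the SOI phase factor `z` cancels the SOI, the remaining rows force the weights to null whatever is left, and one extra row fixes the gain toward the SOI. No covariance matrix is formed or inverted. The helper names are hypothetical, and a full STAP version would extend the same construction to the joint element-by-pulse snapshot.

```python
import numpy as np

def steering(n_elems, theta_deg, spacing=0.5):
    """ULA steering vector; element spacing in wavelengths."""
    n = np.arange(n_elems)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def d3_forward_weights(x, z, C=1.0):
    """Forward D3 weights from one snapshot x for the SOI whose phase
    progression between adjacent elements is z."""
    N = len(x)
    M = (N - 1) // 2                          # M + 1 weights
    F = np.empty((M + 1, M + 1), dtype=complex)
    F[0] = z ** np.arange(M + 1)              # gain C toward the SOI
    for k in range(1, M + 1):
        # x[n] - x[n+1]/z removes the SOI, leaving interference (and noise)
        F[k] = x[k - 1:k + M] - x[k:k + M + 1] / z
    rhs = np.zeros(M + 1, dtype=complex)
    rhs[0] = C
    w, *_ = np.linalg.lstsq(F, rhs, rcond=None)
    return w, M

def d3_estimate(x, z, C=1.0):
    """SOI complex amplitude estimated from a single snapshot."""
    w, M = d3_forward_weights(x, z, C)
    return np.dot(w, x[:M + 1]) / C

N = 9
z = np.exp(1j * np.pi * np.sin(np.deg2rad(10.0)))        # SOI at 10 degrees
alpha = 1.0 + 0.5j                                       # unknown SOI amplitude
x = alpha * steering(N, 10.0) + 5.0 * steering(N, -30.0) # SOI + strong jammer
alpha_hat = d3_estimate(x, z)
```

Each new snapshot simply gets its own `F` and its own weights, which is what removes the dependence on neighboring range cells.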
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004016).
Abstract: A robust phase-only Direct Data Domain Least Squares (D3LS) algorithm based on generalized Rayleigh quotient optimization using a hybrid Genetic Algorithm (GA) is presented in this letter. The optimization efficiency and computational speed are improved by a hybrid GA composed of the standard GA and the Nelder-Mead simplex algorithm. First, an objective function in the form of a generalized Rayleigh quotient is derived from the standard D3LS algorithm. This function is then taken as the fitness function, and the unknown phases of all adaptive weights are taken as the decision variables. Next, nonlinear optimization is performed with the hybrid GA to obtain the optimized phase-only adaptive weights. As a phase-only adaptive algorithm, the proposed algorithm is simpler to implement in hardware than conventional algorithms. Moreover, it processes only a single snapshot of data rather than forming a sample covariance matrix and inverting it. Simulation results show that the proposed algorithm achieves good signal recovery and interference nulling performance, superior to that of the phase-only D3LS algorithm based on the standard GA.
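The phase-only search can be sketched as a plain genetic algorithm whose fitness is a generalized Rayleigh quotient, w^H A w / w^H B w with w = exp(j*phi). Several assumptions apply. The matrices `A` (rank-1, built from a look-direction steering vector) and `B` (Hermitian positive definite) are illustrative stand-ins for the ones the D3LS derivation would produce. The GA operators (elitist selection, uniform crossover, Gaussian mutation) are generic choices. The Nelder-Mead refinement stage of the hybrid GA is omitted.

```python
import numpy as np

def rayleigh_fitness(phases, A, B):
    """Generalized Rayleigh quotient w^H A w / w^H B w for w = exp(j*phases)."""
    w = np.exp(1j * phases)
    num = np.real(np.conj(w) @ A @ w)
    den = np.real(np.conj(w) @ B @ w)
    return num / den

def phase_only_ga(A, B, pop_size=40, generations=200, mutation_std=0.3, seed=0):
    """Plain elitist GA over the weight phases (no Nelder-Mead stage)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pop = rng.uniform(-np.pi, np.pi, size=(pop_size, n))
    pop[0] = 0.0                                   # include the uniform-phase weight
    fit = np.array([rayleigh_fitness(p, A, B) for p in pop])
    for _ in range(generations):
        order = np.argsort(fit)[::-1]              # best first
        elite = pop[order[: pop_size // 2]]        # survivors are kept unchanged
        children = []
        while len(children) < pop_size - len(elite):
            i, j = rng.integers(0, len(elite), size=2)
            mask = rng.random(n) < 0.5             # uniform crossover
            child = np.where(mask, elite[i], elite[j])
            children.append(child + rng.normal(0.0, mutation_std, size=n))  # mutation
        pop = np.vstack([elite, np.array(children)])
        fit = np.array([rayleigh_fitness(p, A, B) for p in pop])
    best = int(np.argmax(fit))
    return pop[best], fit[best]

# Illustrative matrices (assumed, not derived from a real D3LS data set):
rng = np.random.default_rng(7)
n = 8
s = np.exp(1j * np.pi * np.arange(n) * np.sin(0.3))    # look-direction steering vector
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = np.outer(s, s.conj())                              # rank-1 "signal" matrix
B = G.conj().T @ G + np.eye(n)                         # Hermitian positive-definite "residual" matrix
phases, best_fit = phase_only_ga(A, B)
```

Because `A` is rank-1 here, the unconstrained optimum is s^H B^{-1} s, which bounds any phase-only solution from above. A Nelder-Mead polish started from the GA's best individual would be the natural place to add the hybrid step.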
Funding: Supported by the National Natural Science Foundation of China (60173058, 70372024).
Abstract: With the explosive growth of available data, there is an urgent need for continuous data mining that significantly reduces manual interaction. A novel model for data mining in evolving environments is proposed. First, some valid mining task schedules are generated; then autonomous local mining is executed periodically; finally, the previous results are merged and refined. The framework based on the model creates a communication mechanism that incorporates domain knowledge into the continuous process through an ontology service. Through the ontology, local and merge mining are transparent to both the end user and heterogeneous data sources. Experiments suggest that the framework is useful in guiding the continuous mining process.
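The schedule, local-mining, and merge steps can be sketched with frequent-itemset counting as the mining task. This is a hypothetical illustration, not the proposed framework. Each scheduled period mines its own batch locally, and the merge step sums the exact local counts and keeps the itemsets that meet a global support threshold. The ontology service is not modelled.

```python
from collections import Counter
from itertools import combinations

def local_mine(batch, max_size=2):
    """Local mining for one period: count itemsets of size 1..max_size."""
    counts = Counter()
    for txn in batch:
        items = sorted(set(txn))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts

def merge_and_refine(local_results, total_txns, min_support):
    """Merge per-period counts, then keep itemsets meeting global support."""
    merged = Counter()
    for counts in local_results:
        merged.update(counts)
    return {itemset: c / total_txns for itemset, c in merged.items()
            if c / total_txns >= min_support}

def continuous_mining(batches, min_support):
    """One local mining run per scheduled period, followed by a merge."""
    local_results = [local_mine(batch) for batch in batches]
    total = sum(len(batch) for batch in batches)
    return merge_and_refine(local_results, total, min_support)

periods = [[["a", "b"], ["a", "c"]],   # data arriving in period 1
           [["a", "b"], ["b"]]]        # data arriving in period 2
patterns = continuous_mining(periods, min_support=0.5)
```

Because each period keeps exact counts for every itemset up to `max_size`, merging them yields exact global supports. A system that pruned locally would need an extra refinement pass.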
Funding: Supported in part by the following funding agencies of China: the National Natural Science Foundation under Grants 61602050 and U1534201, and the National Key Research and Development Program of China under Grant 2016QY01W0200.
Abstract: A virtual data center is a new form of the cloud computing concept applied to data centers. The virtual data center embedding problem is one of its most important challenges and has attracted much attention from researchers. Energy is a key issue in data centers, since data center energy consumption has increased dozens of times over the last decade. In this paper, we address the cost-aware multi-domain virtual data center embedding problem. To solve it, this paper first establishes an energy consumption model that covers virtual machine nodes and virtual switch nodes, to quantify the energy consumed in the virtual data center embedding process. Based on this model, the paper presents a heuristic algorithm for cost-aware multi-domain virtual data center embedding. The algorithm consists of two steps: inter-domain embedding and intra-domain embedding. Inter-domain embedding divides virtual data center requests into several slices in order to select the appropriate single data center. Intra-domain embedding maps the virtual data center requests within each data center. We first propose an inter-domain virtual data center embedding algorithm based on label propagation to select the appropriate single data center, and then a cost-aware virtual data center embedding algorithm to perform the intra-domain embedding. Extensive simulation results show that the proposed algorithm effectively reduces energy consumption while maintaining the embedding success ratio.
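The inter-domain slicing step can be illustrated with textbook label propagation on the request graph, with virtual machines as nodes and traffic demands as edges. This is a sketch under assumptions. The tie-breaking rule is chosen for determinism rather than taken from the paper, edge weights and the energy model are ignored, and the mapping of each slice to a data center is left out. Function names are hypothetical.

```python
from collections import Counter, defaultdict

def label_propagation(adj, max_iters=20):
    """Asynchronous label propagation with deterministic tie-breaking."""
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            # Keep the current label if it is among the most frequent;
            # otherwise take the largest candidate label.
            new = labels[v] if labels[v] in candidates else max(candidates)
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

def slice_request(adj):
    """Group virtual nodes that share a label into one slice."""
    groups = defaultdict(list)
    for v, lab in sorted(label_propagation(adj).items()):
        groups[lab].append(v)
    return list(groups.values())

# VDC request: VMs 0-3 and VMs 4-7 talk heavily; a single link 3-4 joins them.
request = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
           4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
slices = slice_request(request)
```

Each resulting slice is a densely connected group, so placing it in a single data center keeps most traffic inside one domain.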
Funding: This work was supported in part by the European Union under grant NeCST.
Abstract: In this paper, a model-free approach is presented for designing an observer-based fault detection system for linear continuous-time systems from input and output data in the time domain. The core of the approach is to directly identify the parameters of the observer-based residual generator from a numerically reliable data equation, obtained by filtering and sampling the input and output signals.
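The idea of identifying a residual generator directly from input/output data can be sketched with the discrete-time, data-driven parity-space approach. Fault-free I/O windows are stacked into a data matrix, its left null space is computed, and those null-space vectors serve as the residual generator. This is a substitute illustration, not the paper's continuous-time filtering-and-sampling method. The system matrices, window length `s`, and sensor-bias fault are all assumptions, and the model is used only to generate data.

```python
import numpy as np

def simulate(A, B, C, D, u, x0):
    """SISO discrete-time system x(k+1) = A x + B u, y(k) = C x + D u."""
    x = np.array(x0, dtype=float)
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = C @ x + D * uk
        x = A @ x + B * uk
    return y

def stack_windows(sig, s):
    """Matrix whose column k is [sig[k], ..., sig[k+s]]."""
    cols = len(sig) - s
    return np.array([sig[i:i + cols] for i in range(s + 1)])

def data_matrix(u, y, s):
    return np.vstack([stack_windows(y, s), stack_windows(u, s)])

def identify_residual_generator(u, y, s, tol=1e-9):
    """Left null space of the fault-free I/O data: alpha @ Z = 0."""
    U, S, _ = np.linalg.svd(data_matrix(u, y, s))
    rank = int(np.sum(S > tol * S[0]))
    return U[:, rank:].T

A = np.array([[0.5, 0.1], [0.0, 0.8]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 1.0])
D = 0.0
s = 3
rng = np.random.default_rng(1)
u_id = rng.normal(size=300)
alpha = identify_residual_generator(u_id, simulate(A, B, C, D, u_id, [0.3, -0.2]), s)

u_new = rng.normal(size=100)
y_new = simulate(A, B, C, D, u_new, [1.0, 0.5])
r_ok = alpha @ data_matrix(u_new, y_new, s)            # fault-free: ~0
r_fault = alpha @ data_matrix(u_new, y_new + 1.0, s)   # constant sensor bias
```

On fault-free data, every stacked window is a linear combination of the current state and the input window, which the null-space vectors annihilate. A constant sensor bias breaks that relation and produces a nonzero residual.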
Abstract: Recent advances in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies have made it possible to gather and store enormous volumes of data. This creates unprecedented opportunities for large-scale knowledge discovery from databases. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data in tasks such as data analysis and decision making. Many researchers are working on designing efficient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers focus on the technical problems of developing data mining models and methods, while paying little attention to the basic issues of data mining. In this paper, we propose a new understanding of data mining, namely the domain-oriented data-driven data mining (3DM) model. Some data-driven data mining algorithms developed in our lab are also presented to show its validity.
Abstract: Data mining (also known as Knowledge Discovery in Databases, or KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge that meets users' needs. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining have been ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Are there rules we should obey in a data mining process? To discover patterns and knowledge that are truly interesting and actionable in the real world, Zhang et al. proposed a domain-driven human-machine-cooperated data mining process. Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a knowledge transformation process that transforms knowledge from data format into symbolic format. Thus, no new knowledge is generated in a data mining process. Knowledge is only transformed from data format, which humans cannot understand, into symbolic format, which humans can understand and easily use. This is similar to translating a book from Chinese into English. In translation, the knowledge in the book should remain unchanged; only its format changes. That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, there must be mistakes in the translation. Likewise, a data mining process transforms knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is one representation format of knowledge). Unfortunately, we cannot read, understand, or use it directly, since we cannot understand data.
With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved the performance of classical knowledge acquisition methods. In fact, we also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is analogous to views in a database: users with different views see different parts of the data in a database. Likewise, users with different tasks or objectives may wish to discover, and can discover, different (partial) knowledge from the same database. However, all of this partial knowledge must already exist in the database. A domain-oriented data-driven data mining method can therefore help us extract knowledge that actually exists in a database and is genuinely interesting and actionable in the real world.
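The rough-set view of turning data into symbolic knowledge can be illustrated with Pawlak's lower and upper approximations, plus certain rules read off the lower approximation. This is standard rough-set machinery applied to a made-up decision table. It is not the authors' knowledge acquisition method, and the names and data are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attrs):
    """Group row indices that have identical values on attrs."""
    classes = defaultdict(set)
    for idx, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(idx)
    return classes

def approximations(table, attrs, target):
    """Lower and upper rough-set approximations of the row set target."""
    lower, upper = set(), set()
    for members in indiscernibility_classes(table, attrs).values():
        if members <= target:
            lower |= members
        if members & target:
            upper |= members
    return lower, upper

def certain_rules(table, attrs, decision):
    """Rules from classes whose rows all share one decision value."""
    rules = []
    for key, members in indiscernibility_classes(table, attrs).items():
        values = {table[i][decision] for i in members}
        if len(values) == 1:
            rules.append((dict(zip(attrs, key)), values.pop()))
    return rules

table = [
    {"temp": "high", "headache": "yes", "flu": "yes"},
    {"temp": "high", "headache": "no", "flu": "yes"},
    {"temp": "normal", "headache": "yes", "flu": "no"},
    {"temp": "normal", "headache": "yes", "flu": "yes"},
    {"temp": "normal", "headache": "no", "flu": "no"},
]
attrs = ["temp", "headache"]
flu = {i for i, row in enumerate(table) if row["flu"] == "yes"}
lower, upper = approximations(table, attrs, flu)
rules = certain_rules(table, attrs, "flu")
```

Rows 2 and 3 share condition values but differ in decision, so they fall in the boundary region and yield no certain rule. Every rule that is produced restates a regularity already present in the table.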
Abstract: To improve the performance of classifiers in subjective domains, this paper defines a metric, computed by means of K-means clustering, to measure the quality of subjectively labelled training data (QoSTD). The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. Moreover, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. Binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set using support vector machine (SVM) and discriminant analysis (DA) models to verify the effectiveness of the proposed method. The experimental results show that the consistency between the likelihoods of instances and the corresponding observations increases notably for the classes, especially when the QoSTD of the training data set is relatively low. The experimental results also indicate how to eliminate mislabelled instances from the training data set in order to retrain the classifiers in subjective domains.
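One way to picture a clustering-based label-quality score is cluster purity. K-means is run on the data, the instances that agree with their cluster's majority label are counted, and that fraction is used as a weight on predicted class scores. This sketch rests on several assumptions. The paper's actual QoSTD definition is not given here, and purity, the farthest-point initialization, and the toy data are illustrative choices. Instances that disagree with their cluster majority are flagged as candidate mislabels.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain K-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d2, axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign

def qostd(labels, assign):
    """Assumed quality score: fraction of labels matching cluster majority."""
    labels = np.asarray(labels)
    agree = sum(np.bincount(labels[assign == j]).max() for j in np.unique(assign))
    return agree / len(labels)

def suspect_mislabels(labels, assign):
    """Indices whose label differs from their cluster's majority label."""
    labels = np.asarray(labels)
    suspects = []
    for j in np.unique(assign):
        idx = np.flatnonzero(assign == j)
        majority = np.bincount(labels[idx]).argmax()
        suspects.extend(int(i) for i in idx if labels[i] != majority)
    return sorted(suspects)

X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
labels = [0, 0, 0, 1, 1, 1, 1, 1]              # instance 3 looks mislabelled
assign = kmeans(X, k=2)
q = qostd(labels, assign)
weighted_scores = q * np.array([0.9, 0.1])     # one instance's class scores, reweighted
suspects = suspect_mislabels(labels, assign)
```

Removing the flagged instances and retraining corresponds loosely to the retraining step mentioned in the abstract.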
Abstract: As a disruptive technology, Cloud Computing provides a dynamic, elastic, and promising computing environment for tackling the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open-source frameworks in Cloud Computing for storing and processing big data in a scalable fashion. Spark, the latest parallel computing engine working together with Hadoop, exceeds MapReduce performance through its in-memory computing and high-level programming features. In this paper, we present the design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates that simplify programming. We conducted experiments on its productivity and performance with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The cloud platform generates a complete data processing application from the user's kernel program and simple configurations, allocates resources, and executes the application in parallel on top of Spark and Hadoop.
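The template idea can be sketched in plain Python: the user writes only a per-record kernel, and the platform partitions the data and runs the kernel in parallel. This stand-in uses a thread pool instead of Spark, and `TemplateConfig` and `run_map_template` are hypothetical names. On Spark, the same pattern would roughly be `sc.parallelize(records, numSlices).map(kernel).collect()`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

@dataclass
class TemplateConfig:
    """Hypothetical 'simple configuration' supplied by the user."""
    num_partitions: int = 4
    num_workers: int = 2

def run_map_template(kernel: Callable[[Any], Any], records: Sequence[Any],
                     config: TemplateConfig) -> List[Any]:
    """Partition the records, apply the user kernel to every record in
    parallel, and return the results in the original order."""
    if not records:
        return []
    size = -(-len(records) // max(1, config.num_partitions))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]

    def process(part):
        return [kernel(r) for r in part]

    with ThreadPoolExecutor(max_workers=config.num_workers) as pool:
        results = list(pool.map(process, partitions))  # map keeps partition order
    return [item for part in results for item in part]

# User side: only the kernel (peak absolute amplitude of a trace) is written.
traces = [[1, -1], [3, 4], [0, 0], [2, -2], [5, 0]]
peaks = run_map_template(lambda trace: max(abs(v) for v in trace), traces,
                         TemplateConfig(num_partitions=2, num_workers=2))
```

The design choice being illustrated is the separation of concerns: partitioning, scheduling, and result assembly live in the template, so swapping the thread pool for a cluster engine does not change the user's kernel.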
Abstract: Recently, deep convolutional neural networks (DCNNs) have achieved remarkable results in image classification tasks. Despite these successes, their training relies on a large amount of data prepared in advance, which is often difficult to obtain in real-world applications involving streaming data and concept drift. For this reason, incremental learning (continual learning) has attracted increasing attention from researchers. However, incremental learning faces the challenge of catastrophic forgetting: performance on previous tasks degrades drastically after a new task is learned. In this paper, we propose a new strategy to alleviate catastrophic forgetting when neural networks are trained in continual domains. Specifically, two components are applied: data translation based on transfer learning, and knowledge distillation. The former translates a portion of the new data to reconstruct part of the data distribution of the old domain. The latter uses the old model as a teacher to guide the new model. Experimental results on three datasets show that combining the two methods effectively alleviates catastrophic forgetting.
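The knowledge-distillation component can be sketched as a loss that adds a temperature-softened KL term between the old model (teacher) and the new model (student) to the usual cross-entropy on new-domain labels. This follows the common Hinton-style formulation and is an assumption, not necessarily the paper's exact loss. The weight `lam`, temperature `T`, and logits are illustrative, and the data-translation component is not shown. Because the domains are continual, teacher and student are assumed to share the same label set.

```python
import numpy as np

def log_softmax(z, T=1.0):
    """Numerically stable log-softmax of logits z at temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """T^2 * KL(teacher || student) on temperature-softened outputs."""
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(T ** 2 * kl.mean())

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    log_p = log_softmax(logits)
    return float(-log_p[np.arange(len(labels)), labels].mean())

def continual_loss(student_logits, labels, teacher_logits, lam=1.0, T=2.0):
    """Supervised loss on the new domain plus distillation from the old model."""
    return (cross_entropy(student_logits, labels)
            + lam * distillation_loss(student_logits, teacher_logits, T))

teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])   # old model's logits
student = np.array([[1.0, 1.0, 0.0], [0.0, 2.0, 0.0]])    # new model's logits
labels = np.array([0, 1])
loss = continual_loss(student, labels, teacher)
```

The distillation term is zero only when the student reproduces the teacher's softened outputs, which is what discourages the new model from drifting away from the old domain.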