Recent advances in deep learning have opened new possibilities for fluid flow simulation in petroleum reservoirs. However, the predominant approach in existing research is to train neural networks using high-fidelity numerical simulation data. This presents a significant challenge because authentic wellbore production data, the sole source of real training data, are sparse. In response to this challenge, this work introduces a novel architecture called the physics-informed neural network based on domain decomposition (PINN-DD), aiming to effectively utilize the sparse production data of wells for reservoir simulation of large-scale systems. To harness the capability of physics-informed neural networks (PINNs) in handling small-scale spatial-temporal domains while addressing the challenges of large-scale systems with sparse labeled data, the computational domain is divided into two distinct sub-domains: a well-containing sub-domain and a well-free sub-domain. The two sub-domains and their interface are rigorously constrained by the governing equations, data matching, and boundary conditions. The accuracy of the proposed method is evaluated on two problems, and its performance is benchmarked against state-of-the-art PINNs through numerical analysis. The results demonstrate the superiority of PINN-DD in handling large-scale reservoir simulation with limited data and show its potential to outperform conventional PINNs in such scenarios.
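A minimal sketch of the domain-decomposition idea is given below in PyTorch: one small network per sub-domain, coupled by an interface-agreement term, with sparse production data constraining only the well-containing sub-domain. The 1D diffusivity residual, network sizes, and all names are illustrative assumptions, not the paper's actual governing equations (which would also typically enforce flux continuity at the interface).

```python
import torch
import torch.nn as nn

# One small fully connected network per sub-domain (sizes are illustrative).
def make_net():
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(),
                         nn.Linear(32, 1))

net_well, net_free = make_net(), make_net()

def pde_residual(net, xt):
    # Simplified 1D diffusivity equation p_t - p_xx = 0 as a stand-in
    # for the reservoir flow equations used in the paper.
    xt = xt.clone().requires_grad_(True)
    p = net(xt)
    grads = torch.autograd.grad(p, xt, torch.ones_like(p), create_graph=True)[0]
    p_x, p_t = grads[:, :1], grads[:, 1:]
    p_xx = torch.autograd.grad(p_x, xt, torch.ones_like(p_x),
                               create_graph=True)[0][:, :1]
    return p_t - p_xx

def pinn_dd_loss(xt_well, xt_free, xt_iface, xt_data, p_data):
    # PDE residual enforced separately in each sub-domain.
    l_pde = pde_residual(net_well, xt_well).pow(2).mean() + \
            pde_residual(net_free, xt_free).pow(2).mean()
    # Interface constraint: the two sub-domain solutions must agree.
    l_iface = (net_well(xt_iface) - net_free(xt_iface)).pow(2).mean()
    # Sparse production data only constrain the well-containing sub-domain.
    l_data = (net_well(xt_data) - p_data).pow(2).mean()
    return l_pde + l_iface + l_data
```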
In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of the semantic interaction between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model's performance on domain-specific relational triple extraction tasks. We first propose an innovative attention interaction module, which significantly enhances the semantic interaction between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to re-evaluate challenging samples, thereby improving the model's adaptability to specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets that alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several respects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
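A minimal sketch of such a voting strategy follows; `slm_extract` and `llm_extract` are hypothetical stand-ins for the fine-tuned SLM (returning triples plus a confidence) and the prompted LLM, and the confidence threshold and number of LLM calls are assumed hyper-parameters.

```python
from collections import Counter

def vote_extract(sentence, slm_extract, llm_extract,
                 conf_threshold=0.8, n_llm=3):
    """Re-evaluate hard samples: trust the fine-tuned SLM when it is
    confident, otherwise poll the LLM several times and take a majority
    vote over the predicted (head, relation, tail) triples. All callables
    here are hypothetical stand-ins for the paper's models."""
    triples, confidence = slm_extract(sentence)
    if confidence >= conf_threshold:
        return triples                        # easy sample: keep SLM output
    candidates = [tuple(llm_extract(sentence)) for _ in range(n_llm)]
    candidates.append(tuple(triples))         # the SLM still gets one vote
    return list(Counter(candidates).most_common(1)[0][0])
```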
With the explosive growth of available data, there is an urgent need to develop continuous data mining, which markedly reduces manual interaction. A novel model for data mining in an evolving environment is proposed. First, valid mining task schedules are generated; then autonomous and local mining are executed periodically; finally, previous results are merged and refined. The framework based on this model creates a communication mechanism to incorporate domain knowledge into the continuous process through an ontology service. The local and merge mining are transparent to the end user, and heterogeneous data sources are unified by the ontology. Experiments suggest that the framework is useful in guiding the continuous mining process.
A direction-of-arrival (DOA) estimation algorithm based on the direct data domain (D3) approach is presented. This method, called the temporal and spatial two-dimensional vector reconstruction (TSR) method, can accurately estimate the DOA using a single snapshot of modified data. The key idea is to apply the D3 approach, which extracts the signal of a given frequency while nulling out signals at other frequencies in the temporal domain. Spatial vector reconstruction is then used to estimate the angle of the spatially coherent signal source from the extracted signal data. Compared with common temporal and spatial processing approaches, the TSR method has a lower computational load, better real-time performance, and higher robustness and angular accuracy of DOA estimation. The proposed algorithm can be directly applied to phased array radar with coherent pulses. Simulation results demonstrate the performance of the proposed technique.
In a non-homogeneous environment, traditional space-time adaptive processing does not effectively suppress interference and detect targets, because the secondary data do not accurately reflect the statistical characteristics of the range cell under test. A novel methodology utilizing the direct data domain approach to space-time adaptive processing (STAP) in airborne radar non-homogeneous environments is presented. The deterministic least-squares adaptive signal processing technique operates on a snapshot-by-snapshot basis to determine the adaptive weights for nulling interference and estimating the signal of interest (SOI). Furthermore, this approach eliminates the need to estimate the covariance matrix from the data of neighboring range cells, and hence to compute its inverse, and can be implemented to operate in real time. Simulation results illustrate the efficiency of interference suppression in a non-homogeneous environment.
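The single-snapshot idea can be illustrated in a spatial-only simplification: difference adjacent elements to cancel the SOI, then solve a constrained least-squares problem for weights that null what remains. The array model, sub-aperture length, and regularization below are assumptions, not the paper's full space-time formulation.

```python
import numpy as np

def d3_weights(x, phi, K):
    """Single-snapshot direct-data-domain beamformer (spatial-only sketch).
    x   : one snapshot from an N-element uniform linear array
    phi : inter-element phase shift of the signal of interest (SOI)
    K   : sub-aperture length (number of adaptive weights)
    Differencing adjacent elements cancels the SOI exactly, so what
    remains is interference plus noise; the weights minimise that
    residual while keeping unit gain towards the SOI."""
    z = x[:-1] - np.exp(-1j * phi) * x[1:]            # SOI-free sequence
    F = np.array([z[i:i + K] for i in range(len(z) - K + 1)])
    c = np.exp(1j * phi * np.arange(K))               # sub-aperture steering
    G = F.conj().T @ F + 1e-6 * np.eye(K)             # regularised normal matrix
    w = np.linalg.solve(G, c)                         # min ||Fw|| s.t. c^H w = 1
    return w / (c.conj() @ w)                         # normalise SOI gain to 1

# Usage sketch: soi_estimate = w.conj() @ x[:K] recovers the SOI amplitude.
```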
The majority of big data analytics applied to transportation datasets suffer from being too domain-specific; that is, they draw conclusions for a dataset based on analytics over that same dataset. As a result, models trained on one domain (e.g., taxi data) transfer poorly to a different domain (e.g., Uber data). To achieve accurate analyses on a new domain, substantial amounts of data must be available, which limits practical applications. To remedy this, we propose to use semi-supervised and active learning on big data to accomplish the domain adaptation task: selectively choosing a small number of datapoints from a new domain while achieving performance comparable to using all the datapoints. We choose the New York City (NYC) taxi and Uber transportation data as our dataset, simulating different domains with 90% as the source data domain for training and the remaining 10% as the target data domain for evaluation. We propose semi-supervised and active learning strategies and apply them to the source domain for selecting datapoints. Experimental results show that our adaptation achieves performance comparable to using all datapoints while using only a fraction of them, substantially reducing the amount of data required. Our approach has two major advantages: it can make accurate analytics and predictions when big datasets are not available, and even when they are, it chooses the most informative datapoints, making the process much more efficient without having to process huge amounts of data.
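One common active learning strategy consistent with this description is uncertainty (margin) sampling, sketched below with scikit-learn; the model choice, batch size, and the commented adaptation loop are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_select(model, X_pool, n_pick):
    """Uncertainty sampling: pick the pool points the current model is
    least sure about (smallest margin between its top-2 class probabilities)."""
    proba = np.sort(model.predict_proba(X_pool), axis=1)
    margin = proba[:, -1] - proba[:, -2]      # small margin = uncertain
    return np.argsort(margin)[:n_pick]

# Sketch of the adaptation loop (names are illustrative):
# model = LogisticRegression(max_iter=1000).fit(X_source, y_source)
# for _ in range(rounds):
#     idx = active_select(model, X_target_pool, n_pick=100)
#     X_new, y_new = X_target_pool[idx], label(idx)   # oracle or self-labelling
#     model.fit(np.vstack([X_source, X_new]), np.hstack([y_source, y_new]))
```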
A robust phase-only Direct Data Domain Least Squares (D3LS) algorithm based on generalized Rayleigh quotient optimization using a hybrid Genetic Algorithm (GA) is presented in this letter. The optimization efficiency and computational speed are improved via the hybrid GA, composed of a standard GA and the Nelder-Mead simplex algorithm. First, the objective function, in the form of a generalized Rayleigh quotient, is derived via the standard D3LS algorithm. It is then taken as the fitness function, with the unknown phases of all adaptive weights as decision variables. The nonlinear optimization is then performed via the hybrid GA to obtain the optimized phase-only adaptive weights. As a phase-only adaptive algorithm, the proposed algorithm is simpler than conventional algorithms when it comes to hardware implementation. Moreover, it processes only a single snapshot of data, as opposed to forming a sample covariance matrix and performing matrix inversion. Simulation results show that the proposed algorithm has good signal recovery and interference nulling performance, superior to that of the phase-only D3LS algorithm based on the standard GA.
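A hedged sketch of the two-stage optimization follows. SciPy's differential evolution stands in for the standard GA (the paper uses a GA proper) and is followed by the Nelder-Mead polish; the matrices A and B of the generalized Rayleigh quotient are assumed to come from the D3LS derivation.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def fitness(phases, A, B):
    """Generalized Rayleigh quotient f(w) = (w^H A w)/(w^H B w) with
    phase-only weights w = exp(j*phases). A and B are assumed Hermitian
    matrices from the D3LS derivation; negated so we can minimise."""
    w = np.exp(1j * phases)
    return -np.real(w.conj() @ A @ w) / np.real(w.conj() @ B @ w)

def phase_only_d3ls(A, B, n):
    bounds = [(-np.pi, np.pi)] * n
    # Global stage: differential evolution as a stand-in for the standard GA.
    res = differential_evolution(fitness, bounds, args=(A, B), seed=0)
    # Local stage: Nelder-Mead simplex polishes the global solution,
    # mirroring the hybrid scheme described in the abstract.
    res = minimize(fitness, res.x, args=(A, B), method='Nelder-Mead')
    return np.exp(1j * res.x)                 # phase-only adaptive weights
```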
With the widespread application and fast development of gas and oil pipeline networks in China, pipeline inspection technology has been used more extensively. The magnetic flux leakage (MFL) method has established itself as the most widely used in-line inspection technique for the evaluation of gas and oil pipelines. The MFL data obtained from seamless pipeline inspection is usually contaminated by seamless pipe noise (SPN). SPN can in some cases completely mask MFL signals from certain types of defects, and therefore considerably reduces the detectability of the defect signals. In this paper, a new de-noising algorithm called wavelet domain adaptive filtering is proposed for removing the SPN contained in MFL data. The new algorithm combines the wavelet transform with the adaptive filtering technique. Results from applying the proposed algorithm to MFL data from field tests show that it performs well and considerably improves the detectability of the defect signals.
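The wavelet-domain skeleton of such an algorithm might look as follows, using PyWavelets; a per-scale threshold adapted to the estimated noise level stands in for the paper's full adaptive filtering stage, so this is an illustrative simplification rather than the published algorithm.

```python
import numpy as np
import pywt

def wavelet_denoise(mfl, wavelet='db4', level=5):
    """Wavelet-domain de-noising skeleton for a 1-D MFL signal. Each
    detail band gets its own noise-adaptive threshold; the approximation
    coefficients (slow defect signal content) are kept untouched."""
    coeffs = pywt.wavedec(mfl, wavelet, level=level)
    out = [coeffs[0]]                                  # keep approximation
    for d in coeffs[1:]:
        sigma = np.median(np.abs(d)) / 0.6745          # robust noise estimate
        thr = sigma * np.sqrt(2 * np.log(len(d)))      # universal threshold
        out.append(pywt.threshold(d, thr, mode='soft'))
    return pywt.waverec(out, wavelet)
```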
The virtual data center is a new form of the cloud computing concept applied to data centers. As one of the most important challenges, the virtual data center embedding problem has attracted much attention from researchers. In data centers, energy is a critical issue, as data center energy consumption has increased dozens of times over in the last decade. In this paper, we are concerned with the cost-aware multi-domain virtual data center embedding problem. To solve it, this paper first formulates an energy consumption model, covering both virtual machine nodes and virtual switch nodes, to quantify the energy consumed in the virtual data center embedding process. Based on this model, the paper presents a heuristic algorithm for cost-aware multi-domain virtual data center embedding. The algorithm consists of two steps: inter-domain embedding and intra-domain embedding. Inter-domain embedding divides virtual data center requests into several slices and selects an appropriate single data center for each; intra-domain embedding places the virtual data center requests within each data center. We first propose an inter-domain embedding algorithm based on label propagation to select the appropriate single data center, and then a cost-aware embedding algorithm to perform the intra-domain embedding. Extensive simulation results show that our proposed algorithm can effectively reduce energy consumption while ensuring a high embedding success ratio.
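As a rough illustration of the inter-domain step only, the greedy sketch below assigns each slice to the cheapest feasible data center; the dictionary fields are invented for the example, and the paper's label-propagation logic is not reproduced here.

```python
def embed_inter_domain(slices, data_centers):
    """Greedy sketch of the inter-domain step: place each virtual data
    center slice in the single data center with the lowest estimated
    energy cost that still has enough free capacity. Field names are
    illustrative, not the paper's exact model."""
    placement = {}
    for s in sorted(slices, key=lambda s: -s['demand']):     # big slices first
        feasible = [dc for dc in data_centers if dc['free'] >= s['demand']]
        if not feasible:
            return None                                      # embedding fails
        best = min(feasible, key=lambda dc: dc['energy_per_unit'] * s['demand'])
        best['free'] -= s['demand']
        placement[s['id']] = best['id']
    return placement
```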
In this paper, a model-free approach is presented to design an observer-based fault detection system for linear continuous-time systems based on input and output data in the time domain. The core of the approach is to directly identify the parameters of the observer-based residual generator from a numerically reliable data equation obtained by filtering and sampling the input and output signals.
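The abstract does not spell out the identification step, so the sketch below shows a closely related data-driven construction, a parity-space residual generator obtained from a left null vector of stacked input/output data, as an illustration of the general idea rather than the paper's exact method.

```python
import numpy as np

def hankel_rows(x, rows):
    """Stack `rows` shifted copies of the signal x as matrix rows."""
    n = len(x) - rows + 1
    return np.array([x[i:i + n] for i in range(rows)])

def data_driven_residual(u, y, s=5):
    """Parity-space residual from input/output data only (no model).
    In the fault-free case the stacked I/O data matrix is rank-deficient,
    so a left null vector yields a residual that stays near zero until
    the system's behaviour changes, i.e. a fault appears."""
    Z = np.vstack([hankel_rows(np.asarray(y), s + 1),
                   hankel_rows(np.asarray(u), s + 1)])
    U, _, _ = np.linalg.svd(Z @ Z.T)      # eigenvectors, descending energy
    v = U[:, -1]                          # direction of smallest energy
    return v @ Z                          # residual sequence over time
```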
In order to improve the performance of classifiers in subjective domains, this paper defines a metric to measure the quality of subjectively labelled training data (QoSTD) by means of K-means clustering. The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. Moreover, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. Binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set, using the support vector machine (SVM) and discriminant analysis (DA) models, to verify the effectiveness of the proposed method. The experimental results show that the consistency between instance likelihoods and the corresponding observations increases notably for the classes, especially in cases where the training data set has relatively low QoSTD. The results also indicate how to eliminate mislabelled instances from the training data set in order to retrain the classifiers in subjective domains.
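One plausible reading of such a metric is sketched below: cluster the instances with K-means and score how consistently the subjective labels agree with their cluster's majority label. The exact QoSTD definition in the paper may differ; binary integer labels are assumed.

```python
import numpy as np
from sklearn.cluster import KMeans

def qostd(X, labels, k=2):
    """Cluster-consistency score for subjectively labelled data.
    X: (n_samples, n_features) array; labels: array of 0/1 integers.
    Returns 1.0 when every instance carries its cluster's majority label."""
    labels = np.asarray(labels)
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    agree = 0
    for c in range(k):
        in_c = labels[clusters == c]
        if in_c.size:
            agree += np.bincount(in_c).max()   # count of the majority label
    return agree / labels.size
```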
This work is dedicated to the formation of a data warehouse for processing a large volume of domain name registration data. Data cleaning is applied in order to increase the effectiveness of decision-making support: it is applied in warehouses to detect and delete errors and discrepancies in data in order to improve data quality. For this purpose, fuzzy record comparison algorithms for cleaning domain name registration data are reviewed in this work. An identification method for domain name registration data is also proposed for data warehouse formation. Decision-making algorithms for the identification of registration data are implemented in DrRacket and Python.
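A minimal fuzzy record comparison in the spirit described might look like this; the field names, weights, and threshold are illustrative assumptions, not the paper's actual algorithm.

```python
from difflib import SequenceMatcher

def fuzzy_match(rec_a, rec_b, weights=None, threshold=0.85):
    """Fuzzy comparison of two domain-registration records (dicts of
    strings). Records whose weighted string similarity exceeds the
    threshold are flagged as likely duplicates for the cleaning stage."""
    weights = weights or {'registrant': 0.5, 'email': 0.3, 'phone': 0.2}
    score = sum(w * SequenceMatcher(None, rec_a[f], rec_b[f]).ratio()
                for f, w in weights.items())
    return score >= threshold, score

# Usage: is_dup, s = fuzzy_match({'registrant': 'J. Smith', 'email': 'js@x.com',
#                                 'phone': '555-0100'},
#                                {'registrant': 'John Smith', 'email': 'js@x.com',
#                                 'phone': '555-0100'})
```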
Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge of interest to user needs, and it is a genuinely useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining are ignored: What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? In order to discover patterns and knowledge that are really interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process. Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a kind of knowledge-transforming process that transforms knowledge from a data format into a symbol format. Thus, no new knowledge can be generated in a data mining process; knowledge is merely transformed from a data format, which is not understandable to humans, into a symbol format, which is understandable and easy to use. It is similar to translating a book from Chinese into English: the knowledge itself should remain unchanged, and only its format changes. The knowledge in the English book should be the same as in the Chinese one; otherwise there must be mistakes in the translation. That is, in a data mining process we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is a representation format of knowledge); unfortunately, we cannot read, understand, or use it directly, since we cannot understand raw data. With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved the performance of classical knowledge acquisition methods. In fact, we find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented, data-driven data mining. It is analogous to views of a database: users with different views see different parts of the data, so users with different tasks or objectives can discover different (partial) knowledge from the same database. However, all this partial knowledge must already exist in the database. A domain-oriented, data-driven data mining method would therefore help us extract knowledge that really exists in a database and is really interesting and actionable in the real world.
The Pan-African/Brasiliano orogenic belts are part of the numerous Neoproterozoic orogenic belts that belong to a long-lived orogenic cycle whose distancing phase started in the Tonian, around 1.0 Ga. The Tonian magmatism that is fairly well documented in the Neoproterozoic belts of the Borborema Province (NE Brazil) has so far seemed absent from the Central African Orogenic Belt (CAOB), although the two belts are geologically correlated. Through Lu-Hf geochronological analysis on zircon from a tonalite, the present work, coupled with previous data, suggests the existence of Tonian-age magmatism in the Central Cameroon Domain of the CAOB, although the latter is strongly reworked. The Nguesseck tonalite outcrops in the northern part of the Mbé-Sassa-Mbersi region, on the northern edge of the Central Cameroon Domain of the CAOB and within the Tcholliré-Banyo shear zone (TBSZ). The Lu-Hf data obtained on the zircon grains of this tonalite reveal a juvenile Hf TDM age of ca. 1.0 Ga. This age, combined with previous geochemical data, suggests that the protoliths of this tonalite were extracted from the source during the distancing phase (rifting and oceanization) of the Pan-African/Brasiliano orogeny in the early Neoproterozoic.
Recent advances in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store incredible volumes of data, creating unprecedented opportunities for large-scale knowledge discovery from databases. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data for tasks such as data analysis and decision making. Many researchers are working on designing efficient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers pay great attention to technical problems in developing data mining models and methods, while paying little attention to the basic issues of data mining. In this paper, we propose a new understanding of data mining: the domain-oriented data-driven data mining (3DM) model. Some data-driven data mining algorithms developed in our lab are also presented to show its validity.
Cloud computing, as a disruptive technology, provides a dynamic, elastic, and promising computing climate for tackling the challenges of big data processing and analytics. Hadoop and MapReduce are the widely used open-source frameworks in cloud computing for storing and processing big data in a scalable fashion. Spark is the latest parallel computing engine working together with Hadoop, exceeding MapReduce performance via its in-memory computing and high-level programming features. In this paper, we present our design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates to simplify the programming effort. We have conducted experiments on its productivity and performance with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The cloud platform generates a complete data processing application based on the user's kernel program and simple configurations, allocates resources, and executes it in parallel on top of Spark and Hadoop.
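A template in this style might reduce to something like the PySpark sketch below, where the platform owns the Spark boilerplate and parallelism and the geophysicist supplies only a per-record kernel function; the paths, record format, and function names are assumptions, not the platform's actual API.

```python
from pyspark.sql import SparkSession

def run_template(input_path, output_path, kernel):
    """Minimal data-processing template: the platform supplies the Spark
    session, data distribution, and execution; the user supplies only the
    per-record `kernel`. One record per input line is assumed."""
    spark = SparkSession.builder.appName("seismic-template").getOrCreate()
    records = spark.sparkContext.textFile(input_path)
    records.map(kernel).saveAsTextFile(output_path)   # runs in parallel
    spark.stop()

# Usage sketch (paths and kernel are illustrative):
# run_template("hdfs:///traces", "hdfs:///out",
#              kernel=lambda line: line.upper())
```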
Recently, deep convolutional neural networks (DCNNs) have achieved remarkable results in image classification tasks. Despite convolutional networks' great successes, their training relies on a large amount of data prepared in advance, which is often infeasible in real-world applications such as streaming data and concept drift. For this reason, incremental learning (continual learning) has attracted increasing attention from scholars. However, incremental learning faces the challenge of catastrophic forgetting: performance on previous tasks drastically degrades after learning a new task. In this paper, we propose a new strategy to alleviate catastrophic forgetting when neural networks are trained on continual domains. Specifically, two components are applied: data translation based on transfer learning and knowledge distillation. The former translates a portion of the new data to reconstruct part of the old domain's data distribution; the latter uses the old model as a teacher to guide the new model. Experimental results on three datasets show that the combination of these two methods effectively alleviates catastrophic forgetting.
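The knowledge-distillation component can be illustrated with the standard distillation loss below (a temperature-softened KL term against the old model's outputs, blended with cross-entropy on the new labels); the temperature and blending weight are assumed hyper-parameters, and the data-translation component is not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T=2.0, alpha=0.5):
    """Knowledge distillation to preserve old-domain behaviour: the old
    model (teacher) softens its outputs at temperature T, the new model
    (student) is trained to match them, and the result is blended with
    ordinary cross-entropy on the new task's labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)   # rescale gradients
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```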