Funding: the Training Program of the Major Research Plan of the National Natural Science Foundation of China (91746118); the Shenzhen Municipal Science and Technology Innovation Committee Basic Research Project (JCYJ20170410172224515).
Abstract: The smart grid is an evolving critical infrastructure that combines renewable energy with advanced information and communication technologies to provide more economical and secure power supply services. To cope with the intermittency of ever-increasing renewable energy and ensure the security of the smart grid, state estimation, which serves as a basic tool for understanding the true states of a smart grid, should be performed at high frequency. More complete system state data are needed to support high-frequency state estimation. This paper therefore studies the data completeness problem for smart grid state estimation. The problem of improving data completeness by recovering high-frequency data from low-frequency data is formulated as a super resolution perception (SRP) problem. A novel machine-learning-based SRP approach is then proposed. The proposed method, the Super Resolution Perception Net for State Estimation (SRPNSE), consists of three steps: feature extraction, information completion, and data reconstruction. Case studies demonstrate the effectiveness and value of the proposed SRPNSE approach in recovering high-frequency data from low-frequency data for state estimation.
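To make the problem setting concrete, the sketch below recovers a high-frequency series from a low-frequency one by plain linear interpolation. This is only a naive baseline for the SRP formulation, not the SRPNSE network described in the abstract; the function names `downsample` and `upsample_linear` and the integer `factor` parameter are assumptions introduced for illustration.

```python
def downsample(high, factor):
    """Keep every `factor`-th sample, simulating a low-frequency measurement."""
    return high[::factor]


def upsample_linear(low, factor):
    """Rebuild a high-frequency series by linear interpolation between samples.

    Naive baseline only: a learned SRP model would replace this step.
    """
    if not low:
        raise ValueError("need at least one low-frequency sample")
    high = []
    for a, b in zip(low, low[1:]):
        # Insert `factor` evenly spaced points on the segment from a to b.
        for k in range(factor):
            high.append(a + (b - a) * k / factor)
    high.append(low[-1])  # the final measured sample closes the series
    return high


if __name__ == "__main__":
    truth = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]  # a curved signal
    low = downsample(truth, 4)
    estimate = upsample_linear(low, 4)
    worst = max(abs(t - e) for t, e in zip(truth, estimate))
    print(f"largest reconstruction error: {worst}")
```

On a curved signal the interpolated values deviate from the true ones between measurements, which is the gap a learned completion step is meant to close.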
Funding: This work was supported by the National Basic Research 973 Program of China under Grant No. 2011CB036202 and the National Natural Science Foundation of China under Grant No. 61532015.
Abstract: Low data quality is a serious problem in the era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis, and mining, and lead to huge losses. Incomplete data is common in low-quality data, and it is necessary to determine the data completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work views all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and to identify the attribute cells that are really missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. Two optimal algorithms are proposed to determine the data completeness of a dataset in different cases. We empirically show the effectiveness and scalability of our algorithms on both real-world and synthetic data.
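The idea of using a functional dependency to separate recoverable missing values from really missing ones can be sketched as follows. This is a minimal illustration for a single dependency applied in one pass, not the paper's algorithms or completeness model; `fill_by_fd`, `completeness`, and the sample `zip -> city` dependency are hypothetical names chosen for the example.

```python
def fill_by_fd(rows, lhs, rhs):
    """Fill missing `rhs` cells using the functional dependency lhs -> rhs.

    rows: list of dicts mapping attribute name to value (None = missing).
    lhs:  list of determinant attribute names; rhs: the dependent attribute.
    Returns new rows; cells that no other tuple can determine stay None.
    """
    known = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if None not in key and r[rhs] is not None:
            known.setdefault(key, r[rhs])  # first witness for this determinant
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input is left untouched
        key = tuple(r[a] for a in lhs)
        if r[rhs] is None and None not in key and key in known:
            r[rhs] = known[key]
        filled.append(r)
    return filled


def completeness(rows):
    """Fraction of cells that hold a value."""
    cells = [v for r in rows for v in r.values()]
    return sum(v is not None for v in cells) / len(cells)


if __name__ == "__main__":
    rows = [
        {"zip": "10001", "city": "New York"},
        {"zip": "10001", "city": None},   # recoverable via zip -> city
        {"zip": "94103", "city": None},   # really missing: no witness tuple
    ]
    print(completeness(rows), completeness(fill_by_fd(rows, ["zip"], "city")))
```

In this toy table the naive completeness counts both empty `city` cells as missing, while the dependency-aware figure counts only the cell that no other tuple can determine.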
Abstract: Based on the actual conditions of earthquake data in western China, East China, and South China, we studied the completeness of the data in these regions using methods suited to local conditions. In addition, we roughly estimated the monitoring capability of local networks in China since 1970 and of some outlying regions where data are lacking. Finally, we gave the regional distribution of the starting years from which the data for different magnitude intervals are largely complete in the Chinese mainland.
Abstract: Based on the temporal and spatial distribution features of earthquakes, we study the completeness of historical data in North China, where historical data are most plentiful and the recorded history is longest, using several methods of analysis and comparison. The results show that records of events with Ms≥4 are largely complete since 1484 in North China (except the Huanghai Sea region and remote districts such as the Nei Mongol Autonomous Region), while records of earthquakes with Ms≥6 are largely complete since 1291 in the middle and lower reaches of the Yellow River.
Abstract: Current methods for predicting missing values in datasets often rely on simplistic approaches, such as taking the median value of an attribute, which limits their applicability. Real-world observations can be diverse. Taking stock prices as an example, they range from prices post-IPO to values before a company’s collapse, and include instances where certain data points are missing due to stock suspension. In this paper, we propose a novel approach using Nonlinear Inductive Matrix Completion (NIMC) and Deep Inductive Matrix Completion (DIMC) to predict associations, and conduct experiments on financial data relating dates and stocks. Our method leverages various types of stock observations to capture latent factors explaining the observed date-stock associations. Notably, our approach is nonlinear, making it suitable for datasets with nonlinear structures, such as the Russell 3000. Unlike traditional methods that may suffer from information loss, NIMC and DIMC retain nearly complete information, especially with high-dimensional parameters. Our comparison covers Inductive Matrix Completion, a state-of-the-art linear method, alongside Nonlinear Inductive Matrix Completion and Deep Inductive Matrix Completion. Our findings show that the nonlinear matrix completion method is particularly effective for handling nonlinearly structured data, as exemplified by the Russell 3000. Additionally, we validate the information loss of the three methods across different dimensionalities.
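For reference, the simplistic median baseline that the abstract contrasts against can be written in a few lines. This sketch shows only that baseline, not NIMC or DIMC; the function name `impute_median` and the sample prices are assumptions made for illustration.

```python
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    m = median(observed)
    return [m if v is None else v for v in column]


if __name__ == "__main__":
    # Hypothetical daily prices for one stock; None marks a suspended day.
    prices = [10.0, None, 12.0, 50.0]
    print(impute_median(prices))
```

Every gap in the column receives the same value regardless of date or of what other stocks did that day, which is the kind of limitation a completion method that models date-stock associations is intended to address.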
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFC0406501); the Outstanding Young Talent Research Fund of Zhengzhou University (Grant No. 1521323002); the Program for Innovative Talents (in Science and Technology) at University of Henan Province (Grant No. 18HASTIT014); the State Key Laboratory of Hydraulic Engineering Simulation and Safety, Tianjin University (Grant No. HESS-1717); and the Foundation for University Youth Key Teacher of Henan Province (Grant No. 2017GGJS006).
Abstract: The complex nonlinear and non-stationary features exhibited in hydrologic sequences make hydrological analysis and forecasting difficult. Currently, some hydrologists employ the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method, a new time-frequency analysis method based on the empirical mode decomposition (EMD) algorithm, to decompose non-stationary raw data in order to obtain relatively stationary components for further study. However, the endpoint effect in CEEMDAN is often neglected, which can lead to decomposition errors that reduce the accuracy of the research results. In this study, we processed an original runoff sequence using the radial basis function neural network (RBFNN) technique to obtain an extension sequence before applying CEEMDAN decomposition. Then, we compared the decomposition results of the original sequence, the RBFNN extension sequence, and the standard sequence to investigate the influence of the endpoint effect and RBFNN extension on the CEEMDAN method. The results indicated that the RBFNN extension technique effectively reduced the error of the medium- and low-frequency components caused by the endpoint effect. At both ends of the components, the extension sequence more accurately reflected the true fluctuation characteristics and variation trends. These advances are of great significance to the subsequent study of hydrology. Therefore, the CEEMDAN method, combined with an appropriate extension of the original runoff series, can more precisely determine multi-time-scale characteristics and provide a credible basis for the analysis of hydrologic time series and hydrological forecasting.
Funding: Partially supported by the National Natural Science Foundation of China under Grants No. 61571044, No. 61620106002, No. 61473041, No. 11590772, and No. 61640012, and by the Inner Mongolia Natural Science Foundation under Grant No. 2017MS(LH)0602.
Abstract: The quality of a multichannel audio signal may be reduced by missing data, which must be recovered before use. The data sets of multichannel audio can be quite large and have more than two axes of variation, such as channel, frame, and feature. To recover missing audio data, we propose a low-rank tensor completion method that is a high-order generalization of matrix completion. First, a multichannel audio signal with missing data is modeled as a third-order tensor. Next, tensor completion is formulated as a convex optimization problem by defining the trace norm of the tensor, and an augmented Lagrange multiplier method is then used to solve the constrained optimization problem. Finally, the missing data is replaced by alternating iteration with a tensor computation. Experiments were conducted to evaluate the effectiveness of the method on data from a 5.1-channel audio signal. The results show that the proposed method outperforms state-of-the-art methods. Moreover, subjective listening tests with MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) indicate that better audio quality was obtained by tensor completion.
Funding: We gratefully acknowledge the support of the National Natural Science Foundation of China (NSFC) (Grant No. 51977133 and Grant No. U2066209).
Abstract: Randomness and fluctuations in wind power output may cause changes in important parameters (e.g., grid frequency and voltage), which in turn affect the stable operation of a power system. However, owing to external factors (such as weather), there are often various anomalies in wind power data, such as missing numerical values and unreasonable data. This significantly affects the accuracy of wind power generation predictions and operational decisions. Therefore, developing and applying reliable wind power interpolation methods is important for promoting the sustainable development of the wind power industry. In this study, the causes of abnormal data in wind power generation were first analyzed from a practical perspective. Second, an improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) method combined with a generative adversarial interpolation network (GAIN) was proposed to preprocess wind power generation data and interpolate missing wind power generation sub-components. Finally, a complete wind power generation time series was reconstructed. Compared to traditional methods, the proposed ICEEMDAN-GAIN combination interpolation model has higher interpolation accuracy and can effectively reduce the error caused by fluctuations in the wind power generation sequence.
Abstract: This paper presents a simple complete K-level tree (CKT) architecture for text database organization and rapid data filtering. A database is constructed as a CKT forest, and each CKT contains data of the same length. The maximum depth and the minimum depth of an individual CKT are equal, and both equal the length of the data. Insertion and deletion operations are defined; a storage method and a filtering algorithm are also designed to give a good trade-off between efficiency and complexity. Applications to computer-aided teaching of Chinese and to protein selection show that a reduction of about 30% in storage consumption and of over 60% in computation can readily be obtained.
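One way to picture a CKT forest is as a set of character tries, one per string length, in which every leaf sits at the same depth. The sketch below follows that reading. It is not the paper's storage method or filtering algorithm, and deletion is omitted; the class name `CKTForest`, the nested-dict representation, and the `?` wildcard used for filtering are assumptions introduced for illustration.

```python
class CKTForest:
    """A forest of fixed-depth tries, keyed by the length of the stored strings."""

    def __init__(self):
        self.trees = {}  # string length -> nested dict acting as a trie

    def insert(self, word):
        """Add `word` to the tree for its length; all leaves end at depth len(word)."""
        node = self.trees.setdefault(len(word), {})
        for ch in word:
            node = node.setdefault(ch, {})

    def filter(self, pattern, wildcard="?"):
        """Return stored strings matching `pattern`; `wildcard` matches any character."""
        root = self.trees.get(len(pattern))
        if root is None:
            return []  # only the tree for this exact length can hold matches
        results = []

        def walk(node, depth, prefix):
            if depth == len(pattern):
                results.append(prefix)
                return
            ch = pattern[depth]
            if ch == wildcard:
                children = list(node.items())
            elif ch in node:
                children = [(ch, node[ch])]
            else:
                children = []
            for c, child in children:
                walk(child, depth + 1, prefix + c)

        walk(root, 0, "")
        return sorted(results)


if __name__ == "__main__":
    forest = CKTForest()
    for w in ["cat", "cot", "dog", "bird"]:
        forest.insert(w)
    print(forest.filter("c?t"))
```

Because each tree holds strings of one length only, a query touches a single tree and prunes every branch that disagrees with a fixed character in the pattern.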