To protect the environment, the quality of discharged sewage must meet the state's discharge standards. pH (potential of hydrogen) is one of many water quality indicators; the pH of natural water is 6.0–8.5. Sewage treatment plants use data from the treatment process to monitor and predict whether the wastewater's pH will exceed the standard. This paper studies a deep learning model for predicting wastewater pH. First, the random forest method is used to select the data features, and a sliding window then converts the data set into a time series that serves as the input to the deep learning model. Second, after analyzing and comparing relevant references, the paper concludes that the CNN (convolutional neural network) is better at modeling nonlinear data and constructs a CNN model with convolution and pooling layers. After alternating convolutional and pooling layers, all features are integrated in a fully connected neural network. Third, because the number of input samples directly affects the model's prediction performance, the paper uses the sliding window method to study the optimal window size. Extensive experiments show that the best prediction model alternates six convolutional layers with three pooling layers, ends with two fully connected layers of 64 neurons each, and uses a sliding window size of 12. Finally, predictions are made with the optimal CNN model; the predicted pH of the sewage is between 7.2 and 8.6. The result is applied in the monitoring system of the "Intelligent operation and maintenance platform of the reclaimed water plant."
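The sliding-window conversion mentioned in the wastewater pH abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `make_windows`, the one-step-ahead target, and the sample readings are assumptions; only the window size of 12 comes from the abstract.

```python
def make_windows(series, window=12):
    """Split a 1-D series into (inputs, targets) pairs.

    Each input holds `window` consecutive readings and the target is the
    reading that immediately follows them (an assumed one-step-ahead setup).
    """
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    count = len(series) - window
    inputs = [list(series[i:i + window]) for i in range(count)]
    targets = [series[i + window] for i in range(count)]
    return inputs, targets


# Hypothetical pH readings, for illustration only.
readings = [7.2 + 0.05 * (i % 6) for i in range(20)]
inputs, targets = make_windows(readings, window=12)
print(len(inputs), len(inputs[0]), len(targets))  # 8 windows of 12 readings, 8 targets
```

Each row of `inputs` could then be fed to a model as one training sample, with the matching entry of `targets` as its label.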
This paper presents two one-pass algorithms for dynamically computing frequency counts over a sliding window on a data stream, reporting the elements whose frequency exceeds a user-specified threshold ε. The first algorithm divides the sliding window into sub-windows, each of which maintains a summary data structure, and periodically deletes expired sub-windows. It outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adapts a multi-level method to the data stream. Once the sketch of the most recent N elements has been constructed, it can answer frequency queries over the most recent n (n ≤ N) elements, and it outputs at most 1/ε + 2 elements. Analytical and experimental results show that the algorithms are accurate and effective.
Principal component analysis (PCA) has already been used for fault detection in air conditioning systems. A sliding window, composed of parameters that satisfy the thermal load balance, selects historical fault-free reference data similar to the current snapshot data as the template. The window size is usually set from empirical values, and the influence of different window sizes on fault detection in air conditioning systems has not been studied further. An air conditioning system is a dynamic process: the operating parameters change with the load, while the controller's response is delayed. In a variable air volume (VAV) air conditioning system controlled by the total air volume method, 30 data points are selected first to ensure sufficient response time, and multiples of 30 are then selected. This paper compares the fault detection performance of three sliding window sizes: 30, 60, and 90 data points. The results show that with a sliding window of 60 data points, the average fault-free detection ratio is 80.17% on fault-free testing days and the average fault detection ratio is 88.47% on faulty testing days.
Processing aggregate queries over data streams efficiently and effectively has become a hot research topic in both academia and industry. To address this problem, this paper proposes a novel Linked-tree algorithm based on the sliding window. By introducing the concept of an area, the Linked-tree algorithm reuses many primary results from the previous window and avoids many unnecessary repeated comparisons between two successive windows. As a result, the execution efficiency of the MAX query improves dramatically. In addition, because memory use depends on the number of areas and not on the size of the sliding window, memory is greatly economized. Extensive experimental results show that the Linked-tree algorithm significantly outperforms the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
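To illustrate how a sliding-window MAX query can reuse work from the previous window, here is a sketch of the well-known monotonic-deque technique. It is not the Linked-tree algorithm from the abstract, whose details are not given here; the function name `sliding_max` is an assumption.

```python
from collections import deque


def sliding_max(values, window):
    """Return the maximum of every full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    candidates = deque()  # indices whose values are in strictly decreasing order
    result = []
    for i, value in enumerate(values):
        # Older values that are not larger than the new one can never be the max again.
        while candidates and values[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            result.append(values[candidates[0]])
    return result


print(sliding_max([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [3, 3, 5, 5, 6, 7]
```

Each element is appended and removed at most once, so the amortized cost per arrival is constant, in contrast to rescanning the whole window on every slide.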
Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. Non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature, but none of them considers non-equijoins, although non-equijoin queries occur frequently. It is therefore worth discussing how to process non-equijoin queries effectively and efficiently. This paper proposes an indexed join algorithm that supports non-equijoin queries. Experimental results show that the indexed non-equijoin techniques are more efficient than those without an index.
Differential privacy has recently become a widely recognized, strict privacy protection model for data release. Differential privacy histogram publishing can directly show the statistical distribution of data while ensuring user privacy for data query, sharing, and analysis. Dynamic data release is a research topic with wide current industry demand. However, the amount of data varies considerably over different periods, and unreasonable data processing risks leaking users' information and making the data unusable. We therefore designed a differential privacy histogram publishing method based on the dynamic sliding window of LSTM (DPHP-DL), which improves data availability while guaranteeing data privacy. DPHP-DL integrates DSW-LSTM and DPHK+. DSW-LSTM updates the size of the sliding windows based on data value prediction via long short-term memory (LSTM) networks, which evenly divides the data stream into several windows. DPHK+ heuristically publishes non-isometric histograms based on k-means++ clustering that automatically obtains the optimal K, thereby achieving differential privacy histogram publishing of dynamic data. Extensive experiments on real-world dynamic datasets demonstrate the superior performance of DPHP-DL.
Human motion prediction is a critical issue in human-robot collaboration (HRC) tasks. To reduce the local error caused by the limited capture range and sampling frequency of the depth sensor, a hybrid human motion prediction algorithm, optimized sliding window polynomial fitting and recursive least squares (OSWPF-RLS), is proposed. The OSWPF-RLS algorithm takes the human body joint data obtained in the HRC task as input and uses recursive least squares (RLS) to predict the human movement trajectories within the time window. Optimized sliding window polynomial fitting (OSWPF) is then used to calculate the multi-step prediction values, and the increment of the multi-step prediction values is appropriately constrained. Experimental results show that, compared with existing benchmark algorithms, the OSWPF-RLS algorithm improves the multi-step prediction accuracy of human motion and responds better to different human movements.
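The idea of fitting a polynomial to the most recent window and extrapolating several steps ahead, with a bound on the increment between steps, can be sketched as below. This is a plain least-squares illustration under stated assumptions, not the OSWPF-RLS algorithm: the function name `predict_next`, the window length, the polynomial degree, and the `max_step` clipping rule are all hypothetical, and the RLS stage is omitted.

```python
import numpy as np


def predict_next(samples, window=5, degree=2, steps=3, max_step=None):
    """Fit a polynomial to the last `window` samples and extrapolate `steps` values.

    If `max_step` is given, the change between consecutive predictions
    (starting from the last observed sample) is clipped to +/- max_step.
    """
    recent = np.asarray(samples[-window:], dtype=float)
    if recent.size <= degree:
        raise ValueError("need more samples than the polynomial degree")
    t = np.arange(recent.size)
    coeffs = np.polyfit(t, recent, degree)  # least-squares fit over the window
    future_t = np.arange(recent.size, recent.size + steps)
    preds = np.polyval(coeffs, future_t)
    if max_step is not None:
        previous = recent[-1]
        for k in range(steps):
            delta = np.clip(preds[k] - previous, -max_step, max_step)
            preds[k] = previous + delta
            previous = preds[k]
    return preds


# Hypothetical 1-D joint coordinate that follows a quadratic path.
trajectory = [float(t * t) for t in range(10)]
print(predict_next(trajectory, window=5, degree=2, steps=3))
```

Clipping the increment keeps a poorly conditioned fit from producing a physically implausible jump, at the cost of lagging behind genuinely fast movements.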
We present an integrated stand-alone software package, KaKs_Calculator 2.0, as an updated version. It incorporates 17 methods for calculating nonsynonymous and synonymous substitution rates. Among them, we added our modified versions of several widely used methods as the gamma series, including γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN, and γ-MYN, which have been demonstrated to perform better under certain conditions than their original forms and were not implemented in the previous version. The package can readily be used to identify positively selected sites with a sliding window across the sequences of interest, in the 5' to 3' direction of protein-coding sequences, and it has improved overall performance on sequence analysis for evolution studies. A toolbox, including C++ and Java source code, executable files for both Windows and Linux platforms, and user instructions, can be downloaded for academic purposes at https://sourceforge.net/projects/kakscalculator2/.
Outlier detection is a very useful technique in many applications where data is generally uncertain and can be described using probability. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers in a single window and to detect the most recent m elements dynamically and incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
Detecting duplicates in data streams is an important problem with a wide range of applications. Precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and the elements in data streams are always time sensitive. This makes it particularly important to approximately detect duplicates among the newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), an extension of the Counting Bloom Filter that effectively removes stale elements as new elements continuously arrive over sliding windows. Based on the DBF, we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors but, unlike many previous results, no false negative errors. We analyze the time complexity and detection accuracy and give a tight upper bound on the false positive rate. For a given space of G bits and a sliding window size W, our algorithm has an amortized time complexity of O(√G/W). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior to previous results in both execution time and detection accuracy.
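A counter-based filter that forgets stale elements can be sketched as follows. This is a naive illustration of the decaying idea only, assuming that every counter is decremented on each arrival and that an inserted element resets its counters to the window size W; the class and method names are hypothetical, and the full scan per arrival does not reproduce the amortized cost stated in the abstract.

```python
import hashlib


class DecayingBloomFilter:
    """Approximate duplicate detection over the W most recent arrivals."""

    def __init__(self, num_counters, num_hashes, window):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.window = window

    def _positions(self, item):
        # Derive num_hashes counter positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def seen_recently(self, item):
        """Record `item`; return True if it is probably a duplicate in the window."""
        # Decay: every live counter loses one unit per arrival.
        self.counters = [c - 1 if c > 0 else 0 for c in self.counters]
        positions = list(self._positions(item))
        duplicate = all(self.counters[p] > 0 for p in positions)
        # Refresh: the new element stays visible for the next W - 1 arrivals.
        for p in positions:
            self.counters[p] = self.window
        return duplicate


dbf = DecayingBloomFilter(num_counters=1024, num_hashes=3, window=4)
print(dbf.seen_recently("a"))  # False: the filter is empty
print(dbf.seen_recently("a"))  # True: same element arrived one step earlier
```

Because a refreshed counter cannot reach zero within the window, an element that truly reappears in the window is always reported; collisions between different elements can only cause false positives.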
By improving the redundant data filtering of sliding-window-based filters for unreliable radio frequency identification (RFID) data, a data filter that integrates a self-adaptive sliding window with Euclidean distance is proposed. Because unreliable RFID data contain a large amount of redundant data, the input data to be filtered are split, and the redundant data become the main filtering target of the Euclidean-distance-based filter. A comparison with previous research shows that the proposed method improves the accuracy of filtering unreliable RFID data and greatly reduces the redundant reading rate.
The continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly use exact algorithms for this type of query, whose key idea is to maintain a subset of the objects in the window and retrieve answers from it. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query, the (ε, δ)-approximate continuous top-k query, which returns approximate answers to the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over a sliding window. We first maintain a self-adaptive pruning value, which filters out newly arrived objects whose probability of being a query result is less than 1 − δ. Objects that are not filtered out are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
We extract physical and chemical features related to the occurrence of single nucleotide polymorphisms (SNPs) from three groups of sliding windows around the SNP site, and then use radial basis function (RBF) networks to make predictions and assess their accuracy. The results for the forward sliding windows suggest that the accuracies and Matthews correlation coefficient (MCC) values rise as the length of the sliding windows increases. The accuracies range from 73.27% to 80.69%, and the MCC values range from 0.465 to 0.614. The backward sliding windows and the sliding windows with a fixed length of three are designed to find the crucial sites related to SNPs. The results imply that the probability of SNP occurrence relies heavily on the above physical and chemical features of sites about 20 bases from the SNP site. Compared with the support vector machine (SVM), our RBF network approach achieves more satisfactory results.
Because the performance of traditional endpoint detection algorithms degrades as the environmental noise level increases, a recursive algorithm for calculating higher-order cumulants over a sliding window is proposed and applied to speech endpoint detection. Endpoint detection is further carried out with the energy feature. Experimental results show that both the computational efficiency and the noise robustness of the proposed algorithm improve remarkably compared with the traditional algorithm. The average probability of correct point detection (Pc-point) of the proposed voice activity detection (VAD) is 6.07% higher than that of the G.729b VAD in different noisy environments at different signal-to-noise ratios (SNRs).
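One way to compute a higher-order cumulant recursively over a sliding window is to keep running power sums and update them with the entering and leaving samples. The sketch below does this for the fourth-order cumulant, using the standard identity c4 = μ4 − 3μ2² for central moments; the recursion used in the paper is not given in the abstract, so the class name `SlidingCumulant4` and the power-sum formulation are assumptions.

```python
from collections import deque


class SlidingCumulant4:
    """Fourth-order cumulant of the most recent `window` samples."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()
        self.power_sums = [0.0, 0.0, 0.0, 0.0]  # sums of x, x^2, x^3, x^4

    def update(self, x):
        """Add one sample and return the cumulant of the current window."""
        self.samples.append(x)
        for k in range(4):
            self.power_sums[k] += x ** (k + 1)
        if len(self.samples) > self.window:
            # Remove the contribution of the sample that left the window.
            old = self.samples.popleft()
            for k in range(4):
                self.power_sums[k] -= old ** (k + 1)
        n = len(self.samples)
        r1, r2, r3, r4 = (s / n for s in self.power_sums)  # raw moments
        variance = r2 - r1 ** 2
        fourth_central = r4 - 4 * r1 * r3 + 6 * r1 ** 2 * r2 - 3 * r1 ** 4
        return fourth_central - 3 * variance ** 2


tracker = SlidingCumulant4(window=4)
for sample in [1.0, 2.0, 4.0, 7.0, 11.0, 3.0]:
    latest = tracker.update(sample)
print(latest)
```

Each update costs a constant number of operations regardless of the window length. Over very long streams, the subtractions accumulate floating-point error, so a practical version might recompute the sums from the stored samples periodically.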
In this paper, a novel Bayesian-Gaussian neural network (BGNN) is proposed and applied to on-line modeling of a hydraulic turbine system (HTS). The new BGNN takes account of the complex nonlinear characteristics of the HTS. It involves two redefined procedures: off-line training of the threshold matrix parameters, optimized by swarm optimization algorithms, and on-line predictive application of the BGNN driven by the sliding window data method. The characteristic models of an HTS are identified with the new BGNN method, and simulation results show the effectiveness of the BGNN in addressing HTS modeling problems.
Precipitation is the most discontinuous atmospheric parameter because of its temporal and spatial variability. Precipitation observations at automatic weather stations (AWSs) show different patterns over different time periods. This paper aims to reconstruct missing data by finding the time periods in which precipitation patterns are similar, using a method called the intermittent sliding window period (ISWP) technique, a novel approach to reconstructing the majority of non-continuous missing real-time precipitation data. The ISWP technique is applied to a 1-yr precipitation dataset (January 2015 to January 2016) with a temporal resolution of 1 h, collected at 11 AWSs run by the Indian Meteorological Department in the capital region of Delhi. In the acquired dataset, 13.66% of the precipitation data are missing, of which 90.6% are reconstructed successfully. Some traditional estimation algorithms are then applied to the reconstructed dataset to estimate the remaining missing values on an hourly basis. The results show that interpolation of the dataset reconstructed with the ISWP technique is of high quality compared with interpolation of the raw dataset. With the ISWP technique, the root-mean-square errors (RMSEs) in the estimation of missing rainfall data based on the arithmetic mean, multiple linear regression, linear regression, and moving average methods are reduced by 4.2%, 55.47%, 19.44%, and 9.64%, respectively. However, combining the ISWP technique with the inverse distance weighted method increases the RMSE by 0.07%, because the reconstructed data add a more diverse relation to the neighboring AWSs.
Data archiving is one of the most critical issues for modern astronomical observations. With the development of a new generation of radio telescopes, the transfer and archiving of massive remote data have become urgent problems. Here we present a practical and robust file-level flow-control approach, called the Unlimited Sliding-Window (USW), modeled on the classic flow-control method of the TCP protocol. Based on the USW and the Next Generation Archive System (NGAS) developed for the Murchison Widefield Array telescope, we further implemented an enhanced archive system (ENGAS) using ZeroMQ middleware. The ENGAS substantially improves transfer performance and ensures the integrity of transferred files. In the tests, the ENGAS is approximately three to twelve times faster than the NGAS and can fully utilize the bandwidth of network links. Thus, for archiving radio observation data, the ENGAS reduces communication time, improves bandwidth utilization, and solves the remote synchronous archiving of data from observatories such as the Mingantu spectral radioheliograph. It also provides a better reference for the future construction of the Square Kilometer Array (SKA) Science Regional Center.
Rain and snow seriously degrade outdoor video quality. In this work, a primary-secondary background model for removing rain and snow is built. First, we analyze video noise and use a sliding window sequence principal component analysis denoising algorithm to reduce white noise in the video. Next, we apply the Gaussian mixture model (GMM) to model the video and perform a primary segmentation of all foreground objects. We then calculate the von Mises distribution of the velocity vectors and the ratio of the overlapped region with reference to the result of the primary segmentation, and extract the objects of interest. Finally, rain and snow streaks are inpainted using the background to improve the quality of the video data. Experiments show that the proposed method can effectively suppress noise and extract targets of interest.
To improve the efficiency of fingerprint core location, a method that applies a sliding window to the complex-filter-based core location algorithm is proposed. A fixed-size window slides over the fingerprint image to extract local regions, and each selected local region is used as the calculation object for detecting the core. Experimental results show that the method can not only detect the fingerprint core effectively but also improve the efficiency of the detection algorithm compared with the global fingerprint core location algorithm.
The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, and translating them into clear functional and non-functional requirements takes significant time and human resources. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements and reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an approach called the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that the FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and a Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure of around 64%, highlighting FNRC's ability to balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes and increase the probability of successful project outcomes.
Funding (wastewater pH prediction paper): This research was funded by the National Key R&D Program of China (No. 2018YFB2100603), the Key R&D Program of Hubei Province (No. 2022BAA048), the National Natural Science Foundation of China (No. 41890822), and the Open Fund of the National Engineering Research Centre for Geographic Information System, China University of Geosciences, Wuhan 430074, China (No. 2022KFJJ07). The numerical calculations in this paper were done on the supercomputing system in the Supercomputing Centre of Wuhan University.
Funding (sliding-window frequency counts paper): Supported by the National Natural Science Foundation of China (60403027).
Funding (PCA fault detection paper): Fundamental Research Funds for the Central Universities of the Ministry of Education of China.
Funding (Linked-tree aggregate query paper): Supported by the National Natural Science Foundation of China (60573089) and the National 985 Project Foundation (985-2-DB-Y01).
Funding (indexed non-equijoin paper): Supported by the National Natural Science Foundation of China (60473073).
Funding: Supported by the National Nature Science Foundation of China (Grant Nos. 91546108 and 71490725), the Anhui Provincial Science and Technology Major Projects (201903a05020020), the Anhui Provincial Natural Science Foundation (1908085QG298), the Fundamental Research Funds for the Central Universities (JZ2019HGTA0053, JZ2019HGBZ0128), and the Open Research Fund Program of Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, China.
Abstract: Differential privacy has recently become a widely recognized strict privacy protection model for data release. Differential privacy histogram publishing can directly show the statistical data distribution while ensuring user privacy for data query, sharing, and analysis. Dynamic data release is a research topic with broad current industry demand. However, the amount of data varies considerably over different periods, and unreasonable data processing risks leaking users' information and making the data unusable. Therefore, we designed a differential privacy histogram publishing method based on the dynamic sliding window of LSTM (DPHP-DL), which can improve data availability while guaranteeing data privacy. DPHP-DL integrates DSW-LSTM and DPHK+. DSW-LSTM updates the size of sliding windows based on data value prediction via long short-term memory (LSTM) networks, which evenly divides the data stream into several windows. DPHK+ heuristically publishes non-isometric histograms based on k-means++ clustering that automatically obtains the optimal K, so as to achieve differential privacy histogram publishing of dynamic data. Extensive experiments on real-world dynamic datasets demonstrate the superior performance of DPHP-DL.
Funding: Supported by the National Natural Science Foundation of China (61701270) and the Young Doctor Cooperation Foundation of Qilu University of Technology (Shandong Academy of Sciences) (2017BSHZ008).
Abstract: Human motion prediction is a critical issue in human-robot collaboration (HRC) tasks. To reduce the local error caused by the limited capture range and sampling frequency of the depth sensor, a hybrid human motion prediction algorithm, optimized sliding window polynomial fitting and recursive least squares (OSWPF-RLS), is proposed. The OSWPF-RLS algorithm takes the human body joint data obtained during the HRC task as input and uses recursive least squares (RLS) to predict the human movement trajectories within the time window. Then, optimized sliding window polynomial fitting (OSWPF) is used to calculate the multi-step prediction value, and the increment of the multi-step prediction value is appropriately constrained. Experimental results show that, compared with existing benchmark algorithms, the OSWPF-RLS algorithm improves the multi-step prediction accuracy of human motion and enhances the ability to respond to different human movements.
Funding: Funded by the National Basic Research Program of China (973 Program) to JY (Grant No. 2006CB910404)
Abstract: We present an integrated stand-alone software package named KaKs_Calculator 2.0 as an updated version. It incorporates 17 methods for the calculation of nonsynonymous and synonymous substitution rates; among them, we added our modified versions of several widely used methods as the gamma series, including γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN and γ-MYN, which have been demonstrated to perform better under certain conditions than their original forms and were not implemented in the previous version. The package is readily used for the identification of positively selected sites based on a sliding window across the sequences of interest in the 5' to 3' direction of protein-coding sequences, and has improved the overall performance of sequence analysis for evolution studies. A toolbox, including C++ and Java source code and executable files for both Windows and Linux platforms together with user instructions, is downloadable for academic purposes from https://sourceforge.net/projects/kakscalculator2/.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60973020, 60828004, and 60933001, the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0290, and the Fundamental Research Funds for the Central Universities under Grant No. N090504004
Abstract: Outlier detection is a very useful technique in many applications, where data are generally uncertain and can be described using probability. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window, and to dynamically and incrementally detect outliers among the most recent m elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
Funding: Supported by the "Hundred Talents Program" of CAS and the National Natural Science Foundation of China under Grant No. 60772034.
Abstract: Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements in data streams are always time sensitive. These facts make it particularly significant to approximately detect duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, that effectively removes stale elements as new elements continuously arrive over sliding windows. On the basis of the DBF, we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound on the false positive rate. For a given space of G bits and sliding window size W, our algorithm has an amortized time complexity of O(√G/W). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to the previous results.
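The abstract gives no construction details for the DBF, so the following is only a simplified sketch of the decaying idea, under assumptions: counters are set to the window size when an element arrives and every non-zero counter is decremented on each arrival, so stale elements fade out by themselves. The class name, the SHA-256 based hash positions, and the naive full-array decay are all illustrative choices; in particular, this naive decay costs time linear in the number of counters per arrival and does not reproduce the amortized bound quoted above.

```python
import hashlib


class DecayingBloomFilter:
    """Simplified decaying counter filter for duplicates in the last W arrivals."""

    def __init__(self, num_counters, num_hashes, window_size):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.window_size = window_size

    def _positions(self, item):
        # Hypothetical hash family: SHA-256 of the item salted with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def observe(self, item):
        """Return True if `item` looks like a duplicate, then record it."""
        positions = list(self._positions(item))
        # An element seen within the last W arrivals still has all counters > 0.
        seen = all(self.counters[p] > 0 for p in positions)
        # Naive decay: every live counter ages by one arrival.
        self.counters = [c - 1 if c > 0 else 0 for c in self.counters]
        # Refresh the counters of the new element to the full window size.
        for p in positions:
            self.counters[p] = self.window_size
        return seen


if __name__ == "__main__":
    dbf = DecayingBloomFilter(num_counters=1 << 16, num_hashes=3, window_size=3)
    for item in ["a", "b", "a", "c", "d", "e", "b"]:
        print(item, dbf.observe(item))
```

In this sketch an element that reappears within the next W arrivals is always reported, so there are no false negatives inside the window, while hash collisions with other live elements can still cause false positives.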
Funding: Supported by the foundation of the Science and Technology Commission of Shanghai Municipality (Grant No. 13521103902)
Abstract: By improving the redundant data filtering of a sliding-window-based filter for unreliable radio frequency identification (RFID) data, a data filter that integrates a self-adaptive sliding window with the Euclidean distance is proposed. Because unreliable RFID data contain a large amount of redundant data, the input data to be filtered are first split, and the redundant data become the main target of the filter based on the Euclidean distance. A comparison between the results of the proposed method and those of previous research shows that it improves the accuracy of unreliable RFID data filtering and greatly reduces the redundant reading rate.
Funding: This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.
Abstract: The continuous top-k query over a sliding window is a fundamental problem in databases; it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms to handle this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query named the (ε, δ)-approximate continuous top-k query, which returns approximate answers for the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over sliding windows. We first maintain a self-adaptive pruning value, which filters out newly arrived objects that have a probability of less than 1 - δ of being a query result. Objects that are not filtered are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, the PABF framework also includes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, only logarithmic complexity is required for maintaining each candidate.
Funding: Supported by the Discipline-Crossing Research Foundation of Huazhong Agricultural University (2008XKJC006) and the Fundamental Research Funds for the Central Universities of China
Abstract: We extract some physical and chemical features related to the occurrence of single nucleotide polymorphisms (SNPs) from three groups of sliding windows around the SNP site, and then make predictions and assess their accuracy using radial basis function (RBF) networks. The results for the forward sliding windows suggest that the accuracies and Matthews correlation coefficient (MCC) values increase with the length of the sliding windows. The accuracies range from 73.27% to 80.69%, and the MCC values range from 0.465 to 0.614. The backward sliding windows and the sliding windows with a fixed length of three are designed to find the crucial sites related to SNPs. The results imply that the probability of SNP occurrence relies heavily on the above physical and chemical features of sites at a distance of around 20 bases from the SNP site. Compared with the support vector machine (SVM), our RBF network approach achieves more satisfactory results.
Funding: Supported by the National Natural Science Foundation of China (61271352)
Abstract: Because the performance of traditional endpoint detection algorithms degrades as the environmental noise level increases, a recursive algorithm for calculating higher-order cumulants over a sliding window is proposed and then applied to speech endpoint detection. Furthermore, endpoint detection is carried out with the energy feature. Experimental results show that both the computational efficiency and the robustness against noise of the proposed algorithm are improved remarkably compared with traditional algorithms. The average probability of correct point detection (Pc-point) of the proposed voice activity detection (VAD) is 6.07% higher than that of G.729b VAD in different noisy environments at different signal-to-noise ratios (SNRs).
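The abstract does not give the recursion it uses, so the sketch below is a generic illustration of recursive sliding-window statistics rather than the proposed algorithm. It assumes a zero-mean signal and the zero-lag fourth-order cumulant, c4 = E[x^4] - 3(E[x^2])^2, and keeps running sums that are updated by adding the newest sample and subtracting the one that leaves the window. The class name `SlidingFourthCumulant` is hypothetical.

```python
from collections import deque


class SlidingFourthCumulant:
    """Zero-lag fourth-order cumulant of a zero-mean signal over a sliding window."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()
        self.sum2 = 0.0  # running sum of x^2 over the window
        self.sum4 = 0.0  # running sum of x^4 over the window

    def update(self, x):
        """Add one sample and return the cumulant estimate for the current window."""
        self.window.append(x)
        self.sum2 += x ** 2
        self.sum4 += x ** 4
        if len(self.window) > self.window_size:
            # Recursive step: remove the contribution of the expired sample
            # instead of recomputing the sums over the whole window.
            old = self.window.popleft()
            self.sum2 -= old ** 2
            self.sum4 -= old ** 4
        n = len(self.window)
        m2 = self.sum2 / n
        m4 = self.sum4 / n
        return m4 - 3.0 * m2 * m2


def direct_fourth_cumulant(samples):
    """Reference computation from scratch, used to check the recursive version."""
    n = len(samples)
    m2 = sum(s ** 2 for s in samples) / n
    m4 = sum(s ** 4 for s in samples) / n
    return m4 - 3.0 * m2 * m2


if __name__ == "__main__":
    tracker = SlidingFourthCumulant(window_size=4)
    for sample in [1, -1, 2, -2, 0, 3, -3, 1]:
        print(tracker.update(sample))
```

Each update costs constant time regardless of the window length. With floating-point input over long streams, the running sums can drift, so periodically recomputing them from the stored window is a reasonable safeguard.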
Funding: Project (Nos. 60704024 and 60772107) supported by the National Natural Science Foundation of China
Abstract: In this paper, a novel Bayesian-Gaussian neural network (BGNN) is proposed and applied to on-line modeling of a hydraulic turbine system (HTS). The new BGNN takes account of the complex nonlinear characteristics of the HTS. Two redefined training procedures of the BGNN include the off-line training of the threshold matrix parameters, optimized by swarm optimization algorithms, and the on-line BGNN predictive application driven by the sliding window data method. The characteristic models of an HTS are identified using the new BGNN method, and simulation results are presented which show the effectiveness of the BGNN in addressing modeling problems of the HTS.
Abstract: Precipitation is the most discontinuous atmospheric parameter because of its temporal and spatial variability. Precipitation observations at automatic weather stations (AWSs) show different patterns over different time periods. This paper aims to reconstruct missing data by finding the time periods when precipitation patterns are similar, with a method called the intermittent sliding window period (ISWP) technique, a novel approach to reconstructing the majority of non-continuous missing real-time precipitation data. The ISWP technique is applied to a 1-yr precipitation dataset (January 2015 to January 2016), with a temporal resolution of 1 h, collected at 11 AWSs run by the Indian Meteorological Department in the capital region of Delhi. The acquired dataset has missing precipitation data amounting to 13.66%, of which 90.6% are reconstructed successfully. Furthermore, some traditional estimation algorithms are applied to the reconstructed dataset to estimate the remaining missing values on an hourly basis. The results show that interpolation of the dataset reconstructed with the ISWP technique exhibits high quality compared with interpolation of the raw dataset. By adopting the ISWP technique, the root-mean-square errors (RMSEs) in the estimation of missing rainfall data, based on the arithmetic mean, multiple linear regression, linear regression, and moving average methods, are reduced by 4.2%, 55.47%, 19.44%, and 9.64%, respectively. However, adopting the ISWP technique with the inverse distance weighted method increases the RMSE by 0.07%, because the reconstructed data add a more diverse relation to the neighboring AWSs.
Funding: Supported by the National Key Research and Development Program of China (2020SKA0110300), the Joint Research Fund in Astronomy (U1831204 and U1931141) under cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), the NSFC (No. 11903009), the Funds for International Cooperation and Exchange of the NSFC (11961141001), the Yunnan Key Research and Development Program (2018IA054), the Key Science and Technology Program of Henan Province (Nos. 202102210152, 212102210611 and 202102210125), and the Research and Cultivation Fund Project of Anyang Normal University (AYNUKPY-2019-24 and AYNUKPY-2020-25). Also supported by the Astronomical Big Data Joint Research Center, co-founded by the National Astronomical Observatories, Chinese Academy of Sciences and Alibaba Cloud.
Abstract: Data archiving is one of the most critical issues for modern astronomical observations. With the development of a new generation of radio telescopes, the transfer and archiving of massive remote data have become urgent problems to be solved. Herein, we present a practical and robust file-level flow-control approach, called the Unlimited Sliding-Window (USW), modeled on the classic flow-control method in the TCP protocol. Based on the USW and the Next Generation Archive System (NGAS) developed for the Murchison Widefield Array telescope, we further implemented an enhanced archive system (ENGAS) using ZeroMQ middleware. The ENGAS substantially improves the transfer performance and ensures the integrity of transferred files. In the tests, the ENGAS is approximately three to twelve times faster than the NGAS and can fully utilize the bandwidth of network links. Thus, for archiving radio observation data, the ENGAS reduces the communication time, improves the bandwidth utilization, and solves the remote synchronous archiving of data from observatories such as the Mingantu spectral radioheliograph. It also provides a better reference for the future construction of the Square Kilometer Array (SKA) Science Regional Center.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60702032), the Natural Science Foundation of Heilongjiang Province (No. F201021), and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology (No. HIT.NSRIF.2008.63).
Abstract: Rain and snow seriously degrade outdoor video quality. In this work, a primary-secondary background model for the removal of rain and snow is built. First, we analyze video noise and use a sliding window sequence principal component analysis denoising algorithm to reduce white noise in the video. Next, we apply the Gaussian mixture model (GMM) to model the video and perform a primary segmentation of all foreground objects. After that, with reference to the result of the primary segmentation, we calculate the von Mises distribution of the velocity vectors and the ratio of the overlapped region, and extract the objects of interest. Finally, rain and snow streaks are inpainted using the background to improve the quality of the video data. Experiments show that the proposed method can effectively suppress noise and extract targets of interest.
Funding: Supported in part by the National Natural Science Foundation of China (61301091), the Natural Science Basic Research Plan in Shaanxi Province of China (2015JQ6262), the Open Foundation of State Key Laboratory of Information Security (2015-MS-14), and the New Star Team of Xi'an University of Posts & Telecommunications
Abstract: To improve the efficiency of the fingerprint core location algorithm, a fingerprint core location method using a sliding window, built on the core location algorithm with the complex filter, is proposed. A local region of the fingerprint image is extracted by a fixed-size window sliding over the image, and the local region selected by the window is used as the calculation object to detect the core. The experimental results show that the method can not only detect the fingerprint core effectively, but also improve the efficiency of the detection algorithm compared with the global fingerprint core location detection algorithm.
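The abstract does not specify the complex filter or how the window is scored, so the following is only a sketch of the window-sliding step under assumptions. The function names are hypothetical, and `total_intensity` is a placeholder response that merely stands in for a real core-detection response computed on each local region.

```python
def window_positions(height, width, size, stride):
    """Yield (top, left) corners of every fully contained size x size window."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield top, left


def best_local_region(image, size, stride, response):
    """Return (score, top, left) of the window with the highest response.

    `image` is a list of equal-length rows; `response` maps a local region
    (a list of rows) to a number. Returns None if no window fits.
    """
    height, width = len(image), len(image[0])
    best = None
    for top, left in window_positions(height, width, size, stride):
        # Only the local region under the window is passed to the detector.
        region = [row[left:left + size] for row in image[top:top + size]]
        score = response(region)
        if best is None or score > best[0]:
            best = (score, top, left)
    return best


def total_intensity(region):
    """Placeholder response, not a fingerprint core detector."""
    return sum(sum(row) for row in region)


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 9, 9],
        [0, 0, 9, 9],
    ]
    print(best_local_region(image, size=2, stride=1, response=total_intensity))
```

A larger stride visits fewer windows, which trades localization precision for speed.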
Funding: This work is supported by the EIAS (Emerging Intelligent Autonomous Systems) Data Science Lab, Prince Sultan University, Kingdom of Saudi Arabia, by paying the APC.
Abstract: The software development process mostly depends on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate them into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and an impressive Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC's ability to effectively balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.