To protect the environment, discharged sewage must meet the state's discharge standards. There are many water quality indicators, and the pH (potential of hydrogen) value is one of them; natural water has a pH of 6.0–8.5. The sewage treatment plant uses data collected during the treatment process to monitor and predict whether the wastewater's pH will exceed the standard. This paper studies a deep learning model for predicting wastewater pH. First, the random forest method is used to select the data features, and then, based on a sliding window, the data set is converted into the time series that serves as input to the deep learning training model. Second, by analyzing and comparing related references, this paper argues that the CNN (Convolutional Neural Network) model is better suited to nonlinear data modeling, and constructs a CNN model consisting of convolution and pooling layers; after alternating combinations of convolutional and pooling layers, all features are integrated by a fully connected neural network. Third, because the number of input samples of the CNN model directly affects its prediction performance, this paper adopts the sliding window method to study the optimal window size. Extensive experimental results show that the optimal prediction model is obtained by alternating six convolutional layers and three pooling layers, with a final fully connected part of two layers and 64 neurons per layer, and a sliding window size of 12. Finally, data prediction is carried out with the optimal CNN deep learning model; the predicted pH of the sewage lies between 7.2 and 8.6. The result is applied in the monitoring system platform of the "Intelligent operation and maintenance platform of the reclaimed water plant."
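The sliding-window step described above is the standard way to turn a multivariate measurement series into supervised training samples for such a model. The sketch below is a minimal, hedged illustration of that conversion (window size 12, as reported); the function and variable names are our own, not the paper's.

```python
import numpy as np

def make_windows(series: np.ndarray, targets: np.ndarray, window: int = 12):
    """Convert a (T, n_features) series into (samples, window, n_features)
    inputs, labelled with the pH value that follows each window."""
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window])
        y.append(targets[start + window])  # predict the next pH reading
    return np.stack(X), np.array(y)

# Toy usage: 100 time steps, 5 selected features, synthetic pH labels.
features = np.random.rand(100, 5)
ph = 7.0 + np.random.rand(100)
X, y = make_windows(features, ph, window=12)
print(X.shape, y.shape)  # (88, 12, 5) (88,)
```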
Principal component analysis (PCA) has already been employed for fault detection in air conditioning systems. A sliding window, composed of parameters that satisfy the thermal load balance, can select the historical fault-free reference data most similar to the current snapshot data as the template. The size of the sliding window is usually set from empirical values, and the influence of different window sizes on fault detection of an air conditioning system has not been studied further. An air conditioning system is a dynamic response process: the operating parameters change with the load, while the response of the controller is delayed. In a variable air volume (VAV) air conditioning system controlled by the total air volume method, 30 data points are selected first to ensure sufficient response time, and then multiples of that value are considered. Three sliding window sizes of 30, 60 and 90 data points are compared for fault detection in this paper. The results show that with a window of 60 data points, the average fault-free detection ratio is 80.17% on fault-free testing days, and the average fault detection ratio is 88.47% on faulty testing days.
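A common way to realize PCA-based fault detection over such a reference window is to fit PCA on the fault-free window and flag snapshots whose residual (SPE/Q) statistic is large. The sketch below illustrates that idea under our own assumptions; it is not the paper's exact detection statistic or thresholding rule.

```python
import numpy as np

def pca_fault_scores(reference: np.ndarray, snapshot: np.ndarray, n_comp: int = 3):
    """Fit PCA on a fault-free reference window and return the squared
    prediction error (SPE/Q statistic) of each snapshot row."""
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0) + 1e-12
    R = (reference - mu) / sigma
    # Principal directions from the SVD of the scaled reference window.
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    P = Vt[:n_comp].T                      # loading matrix
    S = (snapshot - mu) / sigma
    residual = S - S @ P @ P.T             # part not explained by the PCA model
    return np.sum(residual ** 2, axis=1)   # SPE per snapshot sample

# Toy usage: a 60-point reference window and 5 new snapshots.
ref = np.random.randn(60, 8)
snap = np.random.randn(5, 8)
print(pca_fault_scores(ref, snap))
```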
Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. So far, non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in many papers, but none of them takes non-equijoins into consideration. In many cases, non-equijoin queries occur frequently, so it is worth discussing how to process them effectively and efficiently. In this paper, we propose an indexed join algorithm that supports non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
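One simple way to index a sliding window for a non-equijoin predicate such as R.a < S.b is to keep the window's join-attribute values in sorted order, so an inequality probe becomes a binary search instead of a full window scan. The sketch below is a minimal illustration of that idea with hypothetical names; the paper's actual index structure may differ.

```python
import bisect
from collections import deque

class SortedWindowIndex:
    """Sliding window over one stream, indexed by join attribute so that
    inequality probes (e.g., value < probe) avoid scanning the whole window."""
    def __init__(self, size: int):
        self.size = size
        self.window = deque()   # tuples in arrival order, for expiration
        self.values = []        # join-attribute values kept sorted

    def insert(self, tup, value):
        if len(self.window) == self.size:          # expire the oldest tuple
            _, old = self.window.popleft()
            self.values.pop(bisect.bisect_left(self.values, old))
        self.window.append((tup, value))
        bisect.insort(self.values, value)

    def count_less_than(self, probe):
        """How many window tuples satisfy value < probe (non-equijoin match)."""
        return bisect.bisect_left(self.values, probe)

idx = SortedWindowIndex(size=4)
for i, v in enumerate([5, 1, 9, 3, 7]):
    idx.insert(f"r{i}", v)
print(idx.count_less_than(8))  # matches among the last 4 tuples -> 3
```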
Outlier detection is a very useful technique in many applications where data is inherently uncertain and can be described using probabilities. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible-world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e,d)|) to O(k·|R(e,d)|), where R(e,d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers in a single window and to dynamically detect the most recent m elements incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
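For intuition, the sketch below implements the classical deterministic distance-based outlier test over a sliding window: an element is an outlier if it has fewer than k neighbors within distance d among the other elements of the current window. It is a simplification of the probabilistic (possible-worlds) definition studied in the paper, with our own names and parameters.

```python
from collections import deque

def window_outliers(stream, window=5, k=2, d=1.0):
    """Yield (index, value, is_outlier): an element is an outlier if fewer
    than k of the other elements in the current window lie within distance d.
    (The first few elements are flagged only because the window is still filling.)"""
    win = deque(maxlen=window)
    for i, x in enumerate(stream):
        win.append(x)
        others = list(win)[:-1]                     # everything but x itself
        neighbors = sum(1 for y in others if abs(x - y) <= d)
        yield i, x, neighbors < k

data = [1.0, 1.2, 0.9, 8.0, 1.1, 1.3, 7.9]
for i, x, out in window_outliers(data, window=5, k=2, d=1.0):
    print(i, x, "outlier" if out else "inlier")
```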
Detecting duplicates in data streams is an important problem with a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements of a data stream are often time sensitive. This makes it particularly significant to approximately detect duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, which effectively removes stale elements as new elements continuously arrive over sliding windows. On the DBF basis we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound on the false positive rate. For a given space of G bits and sliding window size W, our algorithm has an amortized time complexity of O(√G/W). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to previous results.
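A counting-Bloom-filter variant with decaying counters can be sketched as follows: inserting an element sets its k counter cells to the window size W, and the structure ages as new elements arrive so that cells reach zero once an element falls out of the window. This is our own simplified rendering of the idea, not the paper's exact DBF update schedule (which amortizes the decay instead of touching every cell per arrival).

```python
import hashlib

class DecayingBloomFilter:
    """Simplified sliding-window duplicate detector: k counter cells per
    element are set to W on insertion and decay by one per arrival, so an
    element stops matching once W newer elements have arrived."""
    def __init__(self, n_cells=1024, k=4, window=100):
        self.cells = [0] * n_cells
        self.k = k
        self.window = window

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        return [int.from_bytes(digest[4*i:4*i+4], "big") % len(self.cells)
                for i in range(self.k)]

    def seen_then_insert(self, item):
        duplicate = all(self.cells[p] > 0 for p in self._positions(item))
        # Naive decay: age every non-zero cell by one per arrival.
        # (The published DBF amortizes this step; see the abstract above.)
        self.cells = [c - 1 if c > 0 else 0 for c in self.cells]
        for p in self._positions(item):
            self.cells[p] = self.window
        return duplicate

dbf = DecayingBloomFilter(window=3)
print([dbf.seen_then_insert(x) for x in ["a", "b", "a", "c", "d", "e", "a"]])
# The last "a" is no longer reported: it has slid out of the window.
```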
The software development process depends heavily on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, and translating them into clear functional and non-functional requirements demands significant time and human resources. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements and reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that the FNRC methodology performs robustly across different datasets, achieving a balanced precision of 75% on the PROMISE dataset and a recall of 85% on the CCHIT dataset; both datasets consistently maintain an F-measure of around 64%, highlighting FNRC's ability to balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
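The "accuracy sliding window" idea can be illustrated as follows: score each tree of the forest on a validation set, sort the trees by that score, and slide a fixed-size window over the sorted list, keeping the window whose majority vote performs best. This is a hedged reconstruction from the abstract, with our own names and adaptation details; the published ASW procedure may differ.

```python
import numpy as np

def best_subensemble(tree_preds: np.ndarray, y_val: np.ndarray, window: int):
    """tree_preds: (n_trees, n_samples) array of 0/1 predictions from the
    individual trees. Sort trees by individual accuracy, slide a window of
    `window` trees over the sorted list, and return the window (tree indices)
    whose majority vote scores best on the validation labels."""
    acc = (tree_preds == y_val).mean(axis=1)
    order = np.argsort(acc)[::-1]                  # best individual trees first
    best_idx, best_score = None, -1.0
    for start in range(len(order) - window + 1):
        members = order[start:start + window]
        vote = (tree_preds[members].mean(axis=0) >= 0.5).astype(int)
        score = (vote == y_val).mean()
        if score > best_score:
            best_idx, best_score = members, score
    return best_idx, best_score

# Toy usage: 10 trees that are individually ~70% accurate on 50 samples.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 50)
preds = np.where(rng.random((10, 50)) < 0.7, y, 1 - y)
idx, score = best_subensemble(preds, y, window=3)
print(idx, round(score, 2))
```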
This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, reporting the elements whose frequency exceeds a user-specified threshold ε. The first algorithm constructs sub-windows within the sliding window and periodically deletes expired sub-windows, with each sub-window maintaining a summary data structure; it outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adopts a multi-level method to deal with the data stream: once the sketch of the most recent N elements has been constructed, it can answer frequency queries over the most recent n (n ≤ N) elements, outputting at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
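The sub-window idea behind the first algorithm can be sketched simply: partition the most recent N elements into fixed-size sub-windows, keep a count summary per sub-window, drop the oldest sub-window when it expires, and answer a frequency query by merging the surviving summaries. The minimal version below stores exact per-sub-window counts rather than the paper's ε-bounded summaries.

```python
from collections import Counter, deque

class SubWindowCounts:
    """Frequency counts over the most recent N = n_sub * sub_size elements."""
    def __init__(self, sub_size=4, n_sub=3):
        self.sub_size, self.n_sub = sub_size, n_sub
        self.subs = deque([Counter()])

    def add(self, item):
        if sum(self.subs[-1].values()) == self.sub_size:
            self.subs.append(Counter())          # start a new sub-window
            if len(self.subs) > self.n_sub:
                self.subs.popleft()              # expire the oldest sub-window
        self.subs[-1][item] += 1

    def frequent(self, eps):
        total = sum(sum(c.values()) for c in self.subs)
        merged = sum(self.subs, Counter())
        return {x: n for x, n in merged.items() if n > eps * total}

sw = SubWindowCounts()
for x in "aabacbabaaba":
    sw.add(x)
print(sw.frequent(eps=0.3))   # items above 30% of the current window
```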
How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both the academic and industrial communities. Aiming at this issue, a novel Linked-tree algorithm based on the sliding window is proposed in this paper. Thanks to the proposed concept of areas, the Linked-tree algorithm reuses many intermediate results from the previous window and thus avoids a large number of unnecessary repeated comparisons between two successive windows; as a result, the execution efficiency of MAX queries is improved dramatically. In addition, since the memory footprint depends on the number of areas rather than on the size of the sliding window, memory is saved considerably. Extensive experimental results show that the Linked-tree algorithm achieves significant performance gains over the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
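The abstract does not detail the Linked-tree structure, but the problem it targets, MAX over a sliding window without rescanning the whole window on every arrival, is commonly solved with a monotonically decreasing deque. The sketch below shows that baseline for comparison; it is not the paper's Linked-tree algorithm.

```python
from collections import deque

def sliding_window_max(stream, window):
    """Yield the MAX of the last `window` elements once the window is full.
    The deque keeps (index, value) pairs in decreasing value order, so the
    current maximum is always at the front; amortized O(1) per element."""
    dq = deque()
    for i, x in enumerate(stream):
        while dq and dq[-1][1] <= x:      # smaller tail values can never win
            dq.pop()
        dq.append((i, x))
        if dq[0][0] <= i - window:        # front has slid out of the window
            dq.popleft()
        if i >= window - 1:
            yield dq[0][1]

print(list(sliding_window_max([3, 1, 4, 1, 5, 9, 2, 6], window=3)))
# [4, 4, 5, 9, 9, 9]
```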
Data archiving is one of the most critical issues for modern astronomical observations. With the development of a new generation of radio telescopes, the transfer and archiving of massive remote data have become urgent problems to be solved. Herein, we present a practical and robust file-level flow-control approach, called the Unlimited Sliding-Window (USW), inspired by the classic flow-control method in the TCP protocol. Based on the USW and the Next Generation Archive System (NGAS) developed for the Murchison Widefield Array telescope, we further implemented an enhanced archive system (ENGAS) using ZeroMQ middleware. The ENGAS substantially improves transfer performance and ensures the integrity of transferred files. In our tests, the ENGAS is approximately three to twelve times faster than the NGAS and can fully utilize the bandwidth of the network links. Thus, for archiving radio observation data, the ENGAS reduces communication time, improves bandwidth utilization, and enables remote synchronous archiving of data from observatories such as the Mingantu spectral radioheliograph. It also provides a useful reference for the future construction of the Square Kilometer Array (SKA) Science Regional Center.
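The essence of sliding-window flow control at the file level is that the sender keeps at most W unacknowledged files in flight and slides the window forward as acknowledgements arrive. The sketch below simulates that behaviour with plain Python; it illustrates the general mechanism only and is not the ENGAS/ZeroMQ implementation.

```python
from collections import deque

def send_files(files, window=4):
    """Simulate file-level sliding-window flow control: at most `window`
    files are un-acknowledged at any time; in this toy model every send is
    acknowledged as soon as the window is full, so the window simply walks
    across the file list."""
    in_flight = deque()
    log = []
    for name in files:
        while len(in_flight) >= window:          # window full: wait for an ack
            log.append(f"ack  {in_flight.popleft()}")
        in_flight.append(name)
        log.append(f"send {name}")
    while in_flight:                             # drain the remaining acks
        log.append(f"ack  {in_flight.popleft()}")
    return log

for line in send_files([f"obs_{i:03d}.fits" for i in range(6)], window=2):
    print(line)
```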
A lane-level intersection map is a cornerstone of high-definition (HD) traffic network maps for autonomous driving and for high-precision intelligent transportation system applications such as traffic management and control, and traffic accident evaluation and prevention. Mapping an HD intersection with conventional methods is time-consuming, labor-intensive, and expensive. In this paper, we used a low-channel roadside light detection and ranging (LiDAR) sensor to automatically and dynamically generate a lane-level intersection map, including the signal phases, geometry, layout, and lane directions. First, a mathematical model was proposed to describe the topology and detail of a lane-level intersection. Second, continuous and discontinuous traffic object trajectories were extracted to identify the signal phases and times. Third, the layout, geometry, and lane directions were identified using a convex hull detection algorithm on the trajectories. Fourth, a sliding window algorithm was presented to detect the lane markings and extract the lanes, and the virtual lanes connecting the inbound and outbound approaches of the intersection were generated from the vehicle trajectories within the intersection while considering the traffic rules. In the field experiment, the mean absolute estimation error is 2 s for signal phase and time identification, and the lane marking identification precision and recall are 96% and 94.12%, respectively. Compared with satellite-based, MMS-based, and crowdsourcing-based lane mapping methods, the average lane location deviation of the proposed method with low-channel roadside LiDAR is 0.2 m and the update period is less than one hour.
To enhance the performance and extend the functions of PD radar, high, medium, and low PRFs are commonly combined in one system, and ambiguity in range and velocity appears at some PRFs. Based on clustering, a sliding window correlator algorithm for resolving the radar target ambiguity in range and velocity is described; the sliding window algorithm is a searching algorithm. The probability of ambiguity resolution for targets and the computational efficiency are discussed, and the relations between the probability of ambiguity resolution, the PRF, the range of interest, and the width of the sliding window are analyzed. Simulation results are also given.
A defining characteristic of continuous queries over online data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. For different update patterns of continuous queries, suitable data structures bring great query processing efficiency. In this paper, we propose a data structure suitable for the weak non-monotonic update pattern, in which the lifetime of each tuple is known at generation time but the lifetimes are not necessarily equal. The new data structure combines the ladder queue with the features of the weak non-monotonic update pattern. The experimental results show that the new data structure performs much better than the traditional calendar queue in many cases.
Vegetative insecticidal proteins (VIPs), produced during the vegetative stage of growth of Bacillus thuringiensis, are a group of insecticidal proteins and represent the second generation of insecticidal transgenes that will complement the novel δ-endotoxins in the future. Fewer structural and functional relationships are known for Vip proteins than for δ-endotoxins. In this study, both maximum-likelihood methods and maximum-parsimony-based sliding window analysis were used to evaluate the molecular evolution of Vip proteins. As a result, strong evidence was found that Vip proteins are subject to high rates of positive selection, and 16 sites were identified as being under positive selection using the Bayes Empirical Bayes method. Interestingly, all these positively selected sites are located between site 705 and site 809 in the C-terminus of the Vip proteins. Most of these sites are exposed and clustered in the loop regions when mapped onto the computationally predicted secondary structure and a part of the tertiary structure. It is postulated that the high divergence in the C-terminus of Vip proteins may result not from a lack of functional constraints, but from rapid mutation, driven by positive selection, to adapt to the targeted insects. The potential positive selection pressure may reflect the "arms race" between Vip proteins and their targeted insects, or an attempt to enlarge the host range. The sites identified as being under positive selection may be related to the insect host range, which may shed light on the investigation of the structural and functional relationships of Vip proteins.
A noise-reduction method with sliding windows in the frequency-space (f-x) domain, called the local f-x Cadzow noise-reduction method, is presented in this paper. The method is based on the assumption that the signal in each window is linearly predictable in the spatial direction while the random noise is not. For each Toeplitz matrix constructed from a constant-frequency slice, a singular value decomposition (SVD) is applied to separate signal from noise. To avoid edge artifacts caused by zero overlap between windows and to remove more noise, an appropriate overlap is adopted. Besides flat and dipping events, this method can enhance curved and conflicting events; however, it is not suitable for seismic data that contains large spikes or null traces. The method is also compared with SVD, f-x deconvolution, and the Cadzow method without windows. The comparison results show that the local Cadzow method performs well in removing random noise while preserving signal, and a real-data example demonstrates that it is a promising noise-reduction technique for seismic data acquired in areas with complex formations.
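The core Cadzow step for one constant-frequency slice can be sketched as follows: embed the complex spatial samples of that frequency into a Hankel (or Toeplitz) matrix, truncate its SVD to the assumed number of linear events, and average along the anti-diagonals to map the low-rank matrix back to a filtered slice. The sketch below uses a Hankel embedding and our own parameter names; the windowing, overlap, and iteration details of the published method are omitted.

```python
import numpy as np
from scipy.linalg import hankel

def cadzow_slice(slice_fx: np.ndarray, rank: int) -> np.ndarray:
    """Rank-reduce one constant-frequency slice (complex values over traces)."""
    n = len(slice_fx)
    m = n // 2 + 1
    H = hankel(slice_fx[:m], slice_fx[m - 1:])        # m x (n - m + 1)
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    s[rank:] = 0.0                                    # keep `rank` singular values
    H_low = (U * s) @ Vh
    # Average the anti-diagonals to recover a filtered length-n slice.
    out = np.zeros(n, dtype=complex)
    counts = np.zeros(n)
    rows, cols = H_low.shape
    for i in range(rows):
        for j in range(cols):
            out[i + j] += H_low[i, j]
            counts[i + j] += 1
    return out / counts

# Toy usage: one linear event (rank 1) plus random noise across 30 traces.
x = np.arange(30)
clean = np.exp(2j * np.pi * 0.05 * x)
noisy = clean + 0.3 * (np.random.randn(30) + 1j * np.random.randn(30))
filtered = cadzow_slice(noisy, rank=1)
print(np.linalg.norm(noisy - clean), ">", np.linalg.norm(filtered - clean))
```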
Sports matches are very popular all over the world. Predicting a sports match helps a team grasp its state in time and adjust its strategy during the competition, but such prediction is a challenging task. Therefore, a method is proposed to predict the result of the next match using the teams' historical match data. We combine the Long Short-Term Memory (LSTM) model with the attention mechanism and put forward an ASLSTM model for predicting match results. Furthermore, to ensure the timeliness of the prediction, we add a time sliding window so that the prediction stays up to date. Taking football matches as an example, we carried out a case study and demonstrated the feasibility of this method.
The technique of Knowledge Discovery in Databases (KDD) is introduced to learn valuable knowledge hidden in network alarm databases. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
An efficient observability analysis method is proposed to enable online detection of performance degradation of an optimization-based sliding-window visual-inertial state estimation framework. The proposed methodology leverages numerical techniques in nonlinear observability analysis to enable online evaluation of the system observability and indication of the state estimation performance. Specifically, an empirical observability Gramian based approach is introduced to efficiently measure the observability condition of the windowed nonlinear system, and a scalar index is proposed to quantify the average system observability. The proposed approach is specialized to a challenging optimization-based sliding-window monocular visual-inertial state estimation formulation and evaluated through simulation and experiments to assess its efficacy. The analysis results show that the proposed approach can correctly indicate degradation of the state estimation accuracy with real-time performance.
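For reference, an empirical observability Gramian can be computed purely numerically: perturb each initial state by ±ε, simulate the output trajectories, and accumulate the inner products of the output differences. The sketch below does this for a toy two-state system; the perturbation size, the system, and the scalar index used here (the smallest eigenvalue) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def simulate(x0, steps=50, dt=0.05):
    """Toy nonlinear system: pendulum-like dynamics, measuring only position."""
    x = np.array(x0, dtype=float)
    ys = []
    for _ in range(steps):
        x = x + dt * np.array([x[1], -np.sin(x[0])])
        ys.append(np.array([x[0]]))           # output y = position only
    return np.array(ys)                       # (steps, n_outputs)

def empirical_gramian(x0, eps=1e-3, dt=0.05):
    """W[i, j] = (1 / 4 eps^2) * sum_k dy_i(k)^T dy_j(k) * dt, where dy_i is the
    output difference from perturbing state i by +eps and -eps."""
    n = len(x0)
    W = np.zeros((n, n))
    dY = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        dY.append(simulate(np.array(x0) + e, dt=dt) - simulate(np.array(x0) - e, dt=dt))
    for i in range(n):
        for j in range(n):
            W[i, j] = np.sum(dY[i] * dY[j]) * dt / (4 * eps ** 2)
    return W

W = empirical_gramian([0.3, 0.0])
print(W)
print("observability index:", np.linalg.eigvalsh(W).min())
```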
On-site programming big data refers to the massive data generated in the process of software development, characterized by real-time arrival, complexity, and high processing difficulty; data cleaning is therefore essential for on-site programming big data. Duplicate data detection is an important step in data cleaning that saves storage resources and enhances data consistency. To address the insufficiency of the traditional Sorted Neighborhood Method (SNM) and the difficulty of detecting duplicates in high-dimensional data, an optimized algorithm based on random forests with a dynamic, adaptive window size is proposed. The efficiency of the algorithm is improved by refining the key-selection method, reducing the dimensionality of the data set, and using an adaptive, variable-size sliding window. Experimental results show that the improved SNM algorithm exhibits better performance and achieves higher accuracy.
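The Sorted Neighborhood Method itself is simple to sketch: sort the records by a blocking key, then compare each record only with its neighbors inside a sliding window; an adaptive variant grows or shrinks the window depending on how many matches it is finding. The code below is a minimal version with a naive string-similarity function and our own adaptation rule, shown as an illustration rather than the paper's exact algorithm.

```python
from difflib import SequenceMatcher

def snm_duplicates(records, key, min_win=2, max_win=6, threshold=0.85):
    """Sorted Neighborhood Method with an adaptive sliding window.
    `records` is a list of strings; `key` extracts the sort/blocking key."""
    ordered = sorted(records, key=key)
    window, pairs = min_win, []
    for i, rec in enumerate(ordered):
        hits = 0
        for j in range(max(0, i - window), i):
            if SequenceMatcher(None, rec, ordered[j]).ratio() >= threshold:
                pairs.append((ordered[j], rec))
                hits += 1
        # Naive adaptation: widen the window while matches keep appearing.
        window = min(max_win, window + 1) if hits else max(min_win, window - 1)
    return pairs

names = ["john smith", "jon smith", "jane doe", "janet doe", "alan turing"]
print(snm_duplicates(names, key=lambda s: s.split()[-1] + s))
```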
Differential privacy has recently become a widely recognized, strict privacy protection model for data release. Differential privacy histogram publishing can directly show the statistical data distribution for data query, sharing, and analysis while ensuring user privacy. Dynamic data release is a topic with a wide range of current industry needs; however, the amount of data varies considerably over different periods, and unreasonable data processing results in the risk of users' information leakage or unavailability of the data. Therefore, we designed a differential privacy histogram publishing method based on a dynamic LSTM sliding window (DPHP-DL), which can improve data availability while guaranteeing data privacy. DPHP-DL integrates DSW-LSTM and DPHK+. DSW-LSTM updates the size of the sliding windows based on data value prediction via long short-term memory (LSTM) networks, which evenly divides the data stream into several windows. DPHK+ heuristically publishes non-isometric histograms based on k-means++ clustering that automatically obtains the optimal K, so as to achieve differential privacy histogram publishing of dynamic data. Extensive experiments on real-world dynamic datasets demonstrate the superior performance of DPHP-DL.
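The publishing step can be illustrated with the standard Laplace mechanism: group the histogram bins into clusters, replace each bin count by its cluster mean, and add Laplace noise calibrated to the count sensitivity and the privacy budget ε. The sketch below uses a simple 1-D k-means on the counts with a fixed K; automatically choosing K (k-means++ and the paper's heuristic) and the LSTM-driven window sizing are omitted.

```python
import numpy as np

def dp_histogram(counts, k=3, epsilon=1.0, seed=0):
    """Cluster bin counts, average within clusters, then add Laplace noise.
    Sensitivity of a count histogram is 1 (one user changes one bin by one)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Simple 1-D k-means on the counts (a stand-in for k-means++ with optimal K).
    centers = np.quantile(counts, np.linspace(0.1, 0.9, k))
    for _ in range(20):
        labels = np.argmin(np.abs(counts[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = counts[labels == c].mean()
    grouped = centers[labels]                       # each bin -> its cluster mean
    noisy = grouped + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    return np.clip(noisy, 0, None)

raw = [120, 118, 40, 42, 39, 300, 305, 41, 119]
print(np.round(dp_histogram(raw, k=3, epsilon=0.5), 1))
```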