48 articles found
Prediction of the Wastewater’s pH Based on Deep Learning Incorporating Sliding Windows
1
Authors: Aiping Xu, Xuan Zou, Chao Wang. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, No. 10, pp. 1043-1059, 17 pages.
To protect the environment, the quality of discharged sewage must meet the state’s discharge standards. There are many water quality indicators, and the pH (potential of hydrogen) value is one of them. The pH value of natural water is 6.0–8.5. Sewage treatment plants use data from the treatment process to monitor and predict whether the wastewater’s pH value will exceed the standard. This paper studies a deep learning prediction model for wastewater pH. First, the research uses the random forest method to select data features and then, based on a sliding window, converts the data set into a time series that serves as the input of the deep learning training model. Second, by analyzing and comparing relevant references, the paper concludes that the CNN (convolutional neural network) model is better at nonlinear data modeling and constructs a CNN model that includes convolution and pooling layers. After alternating combinations of convolutional and pooling layers, all features are integrated into a fully connected neural network. Third, the number of input samples of the CNN model directly affects its prediction performance; therefore, the paper uses the sliding window method to study the optimal size. Experimental results show that the optimal prediction model is obtained when alternating six convolutional layers and three pooling layers. The final fully connected part contains two layers with 64 neurons per layer. The sliding window size is set to 12. Finally, the research carries out data prediction based on the optimal CNN deep learning model. The predicted pH of the sewage is between 7.2 and 8.6. The result is applied in the monitoring system platform of the “Intelligent operation and maintenance platform of the reclaimed water plant.”
Keywords: deep learning; wastewater’s pH; convolutional neural network (CNN); prediction; sliding window
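The sliding-window conversion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single pH series and next-step prediction; the function name `make_windows` and the sample readings are hypothetical and not taken from the paper, which also uses random-forest feature selection that is omitted here.

```python
def make_windows(series, window=12):
    """Turn a 1-D series into (inputs, target) pairs.

    Each sample holds `window` consecutive readings; the target is the
    reading that immediately follows them.
    """
    samples = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        samples.append((inputs, target))
    return samples


# Made-up pH readings, for illustration only.
ph_readings = [7.2 + 0.1 * (i % 5) for i in range(20)]
pairs = make_windows(ph_readings, window=12)
```

With 20 readings and a window of 12, this yields 8 training pairs; a series shorter than the window yields none.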
Influence of Three Sizes of Sliding Windows on Principle Component Analysis Fault Detection of Air Conditioning Systems (Cited by: 1)
2
Authors: YANG Xuebin, MA Yanyun, HE Ruru, WANG Ji, LUO Wenjun. Source: Journal of Donghua University (English Edition) (CAS), 2022, No. 1, pp. 72-78, 7 pages.
Principal component analysis (PCA) has already been employed for fault detection in air conditioning systems. The sliding window, which is composed of parameters satisfying the thermal load balance, can select target historical fault-free reference data, similar to the current snapshot data, as the template. The size of the sliding window is usually set from empirical values, while the influence of different window sizes on fault detection of an air conditioning system has not been studied further. The air conditioning system is a dynamic response process: the operating parameters change with the load, while the response of the controller is delayed. In a variable air volume (VAV) air conditioning system controlled by the total air volume method, 30 data points are selected first to ensure sufficient response time, and then multiples of 30 are selected. Three sliding window sizes of 30, 60, and 90 data points are applied to compare the fault detection effect. The results show that with a sliding window of 60 data points, the average fault-free detection ratio is 80.17% on fault-free testing days, and the average fault detection ratio is 88.47% on faulty testing days.
Keywords: sliding window; principal component analysis (PCA); fault detection; sensitivity analysis; air conditioning system
An Indexed Non-Equijoin Algorithm Based on Sliding Windows over Data Streams
3
Authors: YU Ya-xin, YANG Xing-hua, YU Ge, WU Shan-shan. Source: Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 294-298, 5 pages.
Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. So far, non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature. However, none of them takes non-equijoins into consideration, even though non-equijoin queries occur frequently. Hence, it is worth discussing how to process non-equijoin queries effectively and efficiently. In this paper, we propose an indexed join algorithm for supporting non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
Keywords: non-equijoin; data stream; sliding window; red-black indexing tree
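As a rough illustration of the idea in the entry above, the sketch below evaluates the non-equijoin predicate r < s over two count-based sliding windows. It is not the paper's algorithm: a sorted Python list maintained with `bisect` stands in for the red-black indexing tree, and the names `WindowIndex` and `join_less_than` are hypothetical.

```python
import bisect
from collections import deque


class WindowIndex:
    """Count-based sliding window with an ordered index over its values."""

    def __init__(self, size):
        self.size = size
        self.arrivals = deque()   # values in arrival order, for expiry
        self.sorted_vals = []     # same values kept sorted, for range probes

    def insert(self, value):
        if len(self.arrivals) == self.size:
            old = self.arrivals.popleft()
            del self.sorted_vals[bisect.bisect_left(self.sorted_vals, old)]
        self.arrivals.append(value)
        bisect.insort(self.sorted_vals, value)

    def greater_than(self, value):
        return self.sorted_vals[bisect.bisect_right(self.sorted_vals, value):]

    def less_than(self, value):
        return self.sorted_vals[:bisect.bisect_left(self.sorted_vals, value)]


def join_less_than(events, window_size):
    """events: iterable of ('R' or 'S', value). Returns pairs (r, s) with r < s."""
    r_win, s_win = WindowIndex(window_size), WindowIndex(window_size)
    out = []
    for stream, value in events:
        if stream == "R":
            # Probe the opposite window first, then store the new tuple.
            out.extend((value, s) for s in s_win.greater_than(value))
            r_win.insert(value)
        else:
            out.extend((r, value) for r in r_win.less_than(value))
            s_win.insert(value)
    return out


events = [("R", 5), ("S", 7), ("S", 3), ("R", 1), ("R", 6)]
result = join_less_than(events, window_size=2)
```

The ordered index lets each probe locate its matching range by binary search instead of scanning the whole opposite window; deletion from a Python list is still linear, which a balanced tree would avoid.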
Outlier Detection over Sliding Windows for Probabilistic Data Streams (Cited by: 4)
4
Authors: 王斌, 杨晓春, 王国仁, 于戈. Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2010, No. 3, pp. 389-400, 12 pages.
Outlier detection is a very useful technique in many applications, where data is generally uncertain and can be described using probability. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e,d)|) to O(|k·R(e,d)|), where R(e,d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to dynamically and incrementally detect outliers among the most recent m elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
Keywords: outlier detection; uncertain data; probabilistic data stream; sliding window
Improved Approximate Detection of Duplicates for Data Streams Over Sliding Windows (Cited by: 3)
5
Authors: 沈鸿, 张育. Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, No. 6, pp. 973-987, 15 pages.
Detecting duplicates in data streams is an important problem with a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and the elements in data streams are always time sensitive. This makes it particularly significant to approximately detect duplicates among the newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, that effectively removes stale elements as new elements continuously arrive over sliding windows. On the basis of the DBF, we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound on the false positive rate. For a given space of G bits and sliding window size W, our algorithm has an amortized time complexity of O(√G/W). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to the previous results.
Keywords: data stream; duplicate detection; Bloom filter; approximate query; sliding window
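The decaying-counter idea described in the entry above can be sketched as follows. This is a deliberately naive version, assuming that every arrival ages all counters by one and then sets the new element's counters to the window size W; it costs time linear in the number of counters per arrival and does not reproduce the amortized bound stated in the abstract. The class name, the SHA-256 based hashing, and the parameters are assumptions made for illustration.

```python
import hashlib


class DecayingBloomFilter:
    """Naive sketch: counters hold the remaining lifetime of an element."""

    def __init__(self, num_counters, num_hashes, window):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.window = window

    def _positions(self, item):
        # Derive num_hashes counter positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def seen_then_add(self, item):
        """Report whether item looks like a duplicate, then record it."""
        # Age every live counter by one arrival.
        self.counters = [c - 1 if c > 0 else 0 for c in self.counters]
        positions = list(self._positions(item))
        duplicate = all(self.counters[p] > 0 for p in positions)
        # Refresh the element's counters to the full window lifetime.
        for p in positions:
            self.counters[p] = self.window
        return duplicate


dbf = DecayingBloomFilter(num_counters=1024, num_hashes=3, window=3)
first = dbf.seen_then_add("a")
second = dbf.seen_then_add("b")
third = dbf.seen_then_add("a")
```

Because an element's counters stay positive for the next W - 1 arrivals, a repeat inside the window is always reported, so errors are limited to false positives caused by hash collisions.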
Classification and Comprehension of Software Requirements Using Ensemble Learning
6
Authors: Jalil Abbas, Arshad Ahmad, Syed Muqsit Shaheed, Rubia Fatima, Sajid Shah, Mohammad Elaffendi, Gauhar Ali. Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 2839-2855, 17 pages.
The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate these into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier’s accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and an impressive Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC’s ability to effectively balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
Keywords: ensemble learning; machine learning; non-functional requirements; requirement engineering; accuracy sliding window
Dynamically Computing Approximate Frequency Counts in Sliding Window over Data Stream (Cited by: 1)
7
Authors: NIE Guo-liang, LU Zheng-ding. Source: Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 283-288, 6 pages.
This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, that is, computing frequency counts exceeding a user-specified threshold ε. The first algorithm constructs sub-windows and periodically deletes expired sub-windows in the sliding window, and each sub-window maintains a summary data structure. The first algorithm outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adapts a multiple-levels method to deal with the data stream. Once the sketch of the most recent N elements has been constructed, the second algorithm can provide answers to frequency queries over the most recent n (n ≤ N) elements. The second algorithm outputs at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
Keywords: data stream; sliding window; approximation algorithms; frequency counts
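The sub-window bookkeeping of the first algorithm in the entry above can be sketched roughly as below. As a simplifying assumption, each sub-window keeps an exact `Counter` instead of the paper's summary data structure, so the sketch shows only how sub-windows are created and expired, not the space bound. The class name `SubWindowCounter` and its parameters are hypothetical.

```python
from collections import Counter, deque


class SubWindowCounter:
    """Sliding window split into fixed-size sub-windows, each with its own counts."""

    def __init__(self, window_size, sub_size):
        self.max_subs = window_size // sub_size
        self.sub_size = sub_size
        self.subs = deque([Counter()])
        self.filled = 0  # elements in the newest sub-window

    def add(self, item):
        if self.filled == self.sub_size:
            # Newest sub-window is full: open a new one, expire the oldest.
            self.subs.append(Counter())
            self.filled = 0
            if len(self.subs) > self.max_subs:
                self.subs.popleft()
        self.subs[-1][item] += 1
        self.filled += 1

    def frequent(self, eps):
        """Items whose count exceeds eps times the number of elements covered."""
        total = Counter()
        for sub in self.subs:
            total.update(sub)
        covered = sum(total.values())
        return {item for item, count in total.items() if count > eps * covered}


counter = SubWindowCounter(window_size=4, sub_size=2)
for item in ["a", "a", "b", "c", "c", "c"]:
    counter.add(item)
heavy = counter.frequent(0.5)
```

Expiry happens a whole sub-window at a time, so the structure covers between N - sub_size + 1 and N of the most recent elements rather than exactly N.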
Linked-Tree: An Aggregate Query Algorithm Based on Sliding Window over Data Stream
8
Authors: YU Yaxin, WANG Guoren, SU Dong, ZHU Xinhua. Source: Wuhan University Journal of Natural Sciences (CAS), 2006, No. 5, pp. 1114-1119, 6 pages.
How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both the academic and industrial communities. To address this issue, a novel Linked-tree algorithm based on the sliding window is proposed in this paper. Owing to the proposed concept of an area, the Linked-tree algorithm reuses many primary results from the last window and thus avoids many unnecessary repeated comparison operations between two successive windows. As a result, the execution efficiency of the MAX query is improved dramatically. In addition, since the memory size depends on the number of areas but not on the size of the sliding window, memory is saved greatly. Extensive experimental results show that the Linked-tree algorithm achieves significant performance gains over the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
Keywords: data streams; sliding window; aggregate query; area; HOP
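The entry above concerns reusing comparison results between successive windows for a MAX query. The abstract does not give enough detail to reproduce the Linked-tree structure, so the sketch below swaps in a different, well-known technique with the same goal: a monotonic deque that keeps only those elements that can still become the window maximum. The function name `sliding_max` is hypothetical.

```python
from collections import deque


def sliding_max(values, window):
    """Maximum of every length-`window` slice, reusing work across slides."""
    candidates = deque()  # indices whose values are strictly decreasing
    result = []
    for i, v in enumerate(values):
        # Older elements that are not larger than v can never be the max again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            result.append(values[candidates[0]])
    return result


maxima = sliding_max([1, 3, -1, -3, 5, 3, 6, 7], window=3)
```

Each element enters and leaves the deque at most once, so the total cost is linear in the stream length regardless of the window size.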
Enhanced remote astronomical archive system based on the file-level Unlimited Sliding-Window technique
9
Authors: Cong-Ming Shi, Hui Deng, Feng Wang, Ying Mei, Shao-Guang Guo, Chen Yang, Chen Wu, Shou-Lin Wei, Andreas Wicenec. Source: Research in Astronomy and Astrophysics (SCIE, CAS, CSCD), 2021, No. 10, pp. 119-126, 8 pages.
Data archiving is one of the most critical issues for modern astronomical observations. With the development of a new generation of radio telescopes, the transfer and archiving of massive remote data have become urgent problems to be solved. Herein, we present a practical and robust file-level flow-control approach, called the Unlimited Sliding-Window (USW), by referring to the classic flow-control method in the TCP protocol. Based on the USW and the Next Generation Archive System (NGAS) developed for the Murchison Widefield Array telescope, we further implemented an enhanced archive system (ENGAS) using ZeroMQ middleware. The ENGAS substantially improves the transfer performance and ensures the integrity of transferred files. In the tests, the ENGAS is approximately three to twelve times faster than the NGAS and can fully utilize the bandwidth of network links. Thus, for archiving radio observation data, the ENGAS reduces the communication time, improves the bandwidth utilization, and solves the remote synchronous archiving of data from observatories such as the Mingantu spectral radioheliograph. It also provides a better reference for the future construction of the Square Kilometer Array (SKA) Science Regional Center.
Keywords: remote data archive; NGAS; sliding window
Automatic Lane-Level Intersection Map Generation using Low-Channel Roadside LiDAR
10
Authors: Hui Liu, Ciyun Lin, Bowen Gong, Dayong Wu. Source: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 5, pp. 1209-1222, 14 pages.
A lane-level intersection map is a cornerstone of high-definition (HD) traffic network maps for autonomous driving and for high-precision intelligent transportation system applications such as traffic management and control, and traffic accident evaluation and prevention. Mapping an HD intersection is time-consuming, labor-intensive, and expensive with conventional methods. In this paper, we used a low-channel roadside light detection and ranging (LiDAR) sensor to automatically and dynamically generate a lane-level intersection, including the signal phases, geometry, layout, and lane directions. First, a mathematical model was proposed to describe the topology and detail of a lane-level intersection. Second, continuous and discontinuous traffic object trajectories were extracted to identify the signal phases and times. Third, the layout, geometry, and lane direction were identified using the convex hull detection algorithm for trajectories. Fourth, a sliding window algorithm was presented to detect the lane markings and extract the lanes, and the virtual lanes connecting the inbound and outbound sides of the intersection were generated using the vehicle trajectories within the intersection and considering the traffic rules. In the field experiment, the mean absolute estimation error is 2 s for signal phase and time identification. The lane marking identification Precision and Recall are 96% and 94.12%, respectively. Compared with the satellite-based, MMS-based, and crowdsourcing-based lane mapping methods, the average lane location deviation is 0.2 m and the update period is less than one hour with the proposed method using low-channel roadside LiDAR.
Keywords: high-definition map; lane-level intersection map; roadside LiDAR; sliding window; traffic object trajectory
ALGORITHM FOR AMBIGUITY RESOLUTION IN RANGE AND VELOCITY (Cited by: 1)
11
Authors: 张弓, 朱兆达, 吕波. Source: Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2001, No. 2, pp. 219-223, 5 pages.
To enhance the performance and extend the functions of PD radar, high, medium, and low PRFs are commonly applied in the system, and ambiguity appears in range and velocity at some PRFs. Based on clustering, a sliding window correlator algorithm for resolving radar target ambiguity in range and velocity is described. The sliding window algorithm is a search algorithm. The probability of ambiguity resolution for targets and the computational efficiency are discussed. The relations between the probability of ambiguity resolution of this algorithm and the PRF, the range of interest, and the width of the sliding window are analyzed. Simulation results are also given.
Keywords: PD radar; ambiguity resolution; sliding window; clustering
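The search described in the entry above can be illustrated for the range dimension only with a simplified sketch. It assumes that each PRF reports the true range folded into its own unambiguous interval, unfolds every measurement into candidate ranges, and accepts a candidate when all PRFs have a candidate within a tolerance window of it. The function name, the tolerance parameter, and the example numbers are hypothetical; the paper's clustering step and the velocity dimension are not modeled.

```python
def resolve_range(apparent, unambiguous, max_range, tol):
    """Return candidate true ranges on which all PRFs agree within `tol`.

    apparent[i]    -- folded range measured at PRF i
    unambiguous[i] -- unambiguous range of PRF i (same units)
    """
    def unfold(measured, interval):
        # All ranges up to max_range that would fold back onto `measured`.
        candidates = []
        value = measured
        while value <= max_range:
            candidates.append(value)
            value += interval
        return candidates

    candidate_sets = [unfold(a, ru) for a, ru in zip(apparent, unambiguous)]
    matches = []
    for r in candidate_sets[0]:
        # Slide a window of half-width tol over the other PRFs' candidates.
        if all(any(abs(r - c) <= tol for c in others)
               for others in candidate_sets[1:]):
            matches.append(r)
    return matches


# A target at range 40 seen with unambiguous ranges 7, 11 and 13
# folds to 40 % 7 = 5, 40 % 11 = 7 and 40 % 13 = 1.
resolved = resolve_range([5, 7, 1], [7, 11, 13], max_range=100, tol=0.5)
```

A wider tolerance window accepts noisier measurements but raises the chance of false coincidences, which matches the trade-off between window width and resolution probability that the abstract says is analyzed.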
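Effect of the Discrete Intensity Levels Value on Intensity-Modulated Radiotherapy Plans for Cervical Cancer (Cited by: 1)
12
Authors: 吴翠娥. Source: 《中国实用医药》, 2020, No. 20, pp. 83-85, 3 pages.
Objective: To investigate, in intensity-modulated radiotherapy planning for cervical cancer with the Xio treatment planning system, the effect of the segment optimization parameter Discrete intensity levels of the dynamic delivery mode (sliding window) on the segment weight optimization (SWO) process. Methods: For 10 cervical cancer patients, the Discrete intensity levels parameter was varied during sliding window segment optimization, taking the four values 10, 9, 8, and 7. Under the same target dose requirement [95% of the planning target volume (PTV) receiving 50 Gy], the number of segments, the monitor units, and the organs at risk were compared across the four level values. Results: The differences in organ-at-risk dose among the four level values were not statistically significant (P > 0.05). The number of segments at a level value of 7 was (59.2±0.9), which differed significantly from the (66.4±7.9), (61.2±2.5), and (58.1±1.2) segments at level values of 10, 9, and 8 (P < 0.05); the pairwise differences in the number of segments among level values 10, 9, and 8 were not statistically significant (P > 0.05). The differences in monitor units among the four level values were not statistically significant (P > 0.05). Conclusion: A Discrete intensity levels value of 7 meets the clinical dosimetric requirements while effectively reducing treatment time, and can serve as the default optimization parameter for sliding window intensity-modulated radiotherapy plans for cervical cancer.
Keywords: cervical cancer; intensity-modulated radiotherapy; segment optimization; sliding window; Discrete intensity levels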
Storage optimization for query processing over data streams
13
Authors: 唐向红. Source: Journal of Chongqing University (CAS), 2010, No. 2, pp. 79-92, 14 pages.
A defining characteristic of continuous queries over on-line data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. For the different update patterns of continuous queries, suitable data structures bring great query processing efficiency. In this paper, we propose a data structure suitable for the weak non-monotonic update pattern, in which the lifetime of each tuple is known at generation time but the lifetimes are not necessarily of the same length. The new data structure combines the ladder queue with the features of the weak non-monotonic update pattern. The experimental results show that the new data structure performs much better than the traditional calendar queue in many cases.
Keywords: calendar queue; ladder queue; query processing; sliding windows
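The weak non-monotonic update pattern in the entry above means that every tuple arrives with a known expiration time, but tuples do not expire in arrival order. The sketch below shows the operations such a buffer must support, using a binary heap from Python's `heapq` module as a stand-in; it is neither the ladder queue proposed in the paper nor the calendar queue it is compared against. The class name `ExpiringBuffer` is hypothetical.

```python
import heapq


class ExpiringBuffer:
    """Stores tuples with individual lifetimes and releases them in expiry order."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so payloads are never compared

    def insert(self, payload, expires_at):
        heapq.heappush(self._heap, (expires_at, self._seq, payload))
        self._seq += 1

    def expire(self, now):
        """Remove and return every tuple whose lifetime has ended by `now`."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired

    def __len__(self):
        return len(self._heap)


buffer = ExpiringBuffer()
buffer.insert("a", expires_at=5)
buffer.insert("b", expires_at=3)
buffer.insert("c", expires_at=9)
early = buffer.expire(now=4)
late = buffer.expire(now=9)
```

A heap costs logarithmic time per insertion and removal; calendar and ladder queues are designed to bring the average cost of these operations down to constant time.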
Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
14
Authors: 吴金雨, 赵方庆, 白洁, 邓刚, 秦松, 包其郁. Source: Journal of Genetics and Genomics (SCIE, CAS, CSCD, PKU Core), 2007, No. 7, pp. 649-660, 12 pages.
Vegetative insecticidal proteins (VIPs), produced by Bacillus thuringiensis during the vegetative stage of its growth, are a group of insecticidal proteins and represent the second generation of insecticidal transgenes that will complement the novel δ-endotoxins in the future. Fewer structural and functional relationships are known for Vip proteins than for δ-endotoxins. In this study, both maximum-likelihood methods and maximum-parsimony-based sliding window analysis were used to evaluate the molecular evolution of Vip proteins. As a result, strong evidence was found that Vip proteins are subject to high rates of positive selection, and 16 sites were identified as being under positive selection using the Bayes Empirical Bayes method. Interestingly, all these positively selected sites are located from site 705 to site 809 in the C-terminus of the Vip proteins. Most of these sites are exposed and clustered in the loop regions when mapped onto the computationally predicted secondary structure and a part of the tertiary structure. It has been postulated that the high divergence in the C-terminus of Vip proteins may result not from a lack of functional constraints, but rather from rapid mutation to adapt to their targeted insects, driven by positive selection. The potential positive selection pressures may reflect adaptation in the "arms race" between Vip proteins and the targeted insects, or an enlargement of their target host range. Sites identified as being under positive selection may be related to the insect host range, which may shed light on the investigation of the Vip proteins' structural and functional relationships.
Keywords: Bacillus thuringiensis; positive selection; sliding window; maximum likelihood; Vip proteins
A local f-x Cadzow method for noise reduction of seismic data obtained in complex formations (Cited by: 7)
15
Authors: Yuan Sanyi, Wang Shangxu. Source: Petroleum Science (SCIE, CAS, CSCD), 2011, No. 3, pp. 269-277, 9 pages.
A noise-reduction method with sliding windows in the frequency-space (f-x) domain, called the local f-x Cadzow noise-reduction method, is presented in this paper. This method is based on the assumption that the signal in each window is linearly predictable in the spatial direction while the random noise is not. For each Toeplitz matrix constructed from a constant-frequency slice, a singular value decomposition (SVD) is applied to separate signal from noise. To avoid edge artifacts caused by zero percent overlap between windows and to remove more noise, an appropriate overlap is adopted. Besides flat and dipping events, this method can enhance curved and conflicting events. However, it is not suitable for seismic data that contain big spikes or null traces. The method is also compared with SVD, f-x deconvolution, and the Cadzow method without windows. The comparison results show that the local Cadzow method performs well in removing random noise and preserving signal. In addition, a real data example shows that it is a promising noise-reduction technique for seismic data obtained in areas of complex formations.
Keywords: Cadzow; sliding window; noise reduction; fidelity; complex formations; Toeplitz matrix; singular value decomposition
Sports match prediction model for training and exercise using attention-based LSTM network (Cited by: 1)
16
Authors: Qiyun Zhang, Xuyun Zhang, Hongsheng Hu, Caizhong Li, Yinping Lin, Rui Ma. Source: Digital Communications and Networks (SCIE, CSCD), 2022, No. 4, pp. 508-515, 8 pages.
Sports matches are very popular all over the world. Predicting a sports match helps to grasp a team's state in time and to adjust the strategy during the match. Predicting a sports match is challenging. Therefore, a method is proposed to predict the result of the next match by using teams' historical match data. We combined the Long Short-Term Memory (LSTM) model with the attention mechanism and put forward an ASLSTM model for predicting match results. Furthermore, we add a time sliding window to improve the timeliness of the prediction. Taking football matches as an example, we carried out a case study and showed the feasibility of this method.
Keywords: sports; prediction; long short-term memory; attention; sliding window
Knowledge Discovery from Communication Network Alarm Databases (Cited by: 1)
17
Authors: Wang Xin-miao, Huang Tian-xi, Yan Pu-liu, Chong Yan-wen. Source: Wuhan University Journal of Natural Sciences (EI, CAS), 2000, No. 2, pp. 194-198, 5 pages.
The technique of Knowledge Discovery in Databases (KDD) is introduced to learn valuable knowledge hidden in network alarm databases. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
Keywords: KDD; alarm databases; sliding window algorithm; episode rules
Online Detection of State Estimator Performance Degradation via Efficient Numerical Observability Analysis (Cited by: 1)
18
Authors: Zheng Rong, Shun'an Zhong, Nathan Michael. Source: Journal of Beijing Institute of Technology (EI, CAS), 2017, No. 2, pp. 259-266, 8 pages.
An efficient observability analysis method is proposed to enable online detection of performance degradation of an optimization-based sliding window visual-inertial state estimation framework. The proposed methodology leverages numerical techniques in nonlinear observability analysis to enable online evaluation of the system observability and indication of the state estimation performance. Specifically, an empirical observability Gramian based approach is introduced to efficiently measure the observability condition of the windowed nonlinear system, and a scalar index is proposed to quantify the average system observability. The proposed approach is specialized to a challenging optimization-based sliding window monocular visual-inertial state estimation formulation and evaluated through simulation and experiments to assess the efficacy of the methodology. The analysis result shows that the proposed approach can correctly indicate degradation of the state estimation accuracy with real-time performance.
Keywords: observability analysis; monocular visual-inertial state estimation; sliding window; non-linear optimization
Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment (Cited by: 1)
19
Authors: Qianqian Li, Meng Li, Lei Guo, Zhen Zhang. Source: Journal of Information Hiding and Privacy Protection, 2020, No. 4, pp. 199-205, 7 pages.
On-site programming big data refers to the massive data generated in the process of software development, characterized by real-time generation, complexity, and high processing difficulty. Therefore, data cleaning is essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency. Because of the shortcomings of the traditional Sorted Neighborhood Method (SNM) and the difficulty of detecting duplicates in high-dimensional data, an optimized algorithm based on random forests with a dynamic and adaptive window size is proposed. The efficiency of the algorithm is raised by improving the key-selection method, reducing the dimension of the data set, and using an adaptive variable-size sliding window. Experimental results show that the improved SNM algorithm exhibits better performance and achieves higher accuracy.
Keywords: on-site programming big data; duplicate record detection; random forests; adaptive sliding window
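For context on the entry above, the traditional Sorted Neighborhood Method that the paper sets out to improve can be sketched as follows: sort the records by a key, then compare each record only with the neighbors inside a fixed-size sliding window. The sketch uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; the function name, the threshold, and the sample records are assumptions, and the paper's random-forest key selection and adaptive window size are not included.

```python
from difflib import SequenceMatcher


def sorted_neighborhood(records, key, window, threshold):
    """Return pairs of likely duplicates found by the basic SNM.

    records   -- list of strings
    key       -- function producing the sort key for a record
    window    -- number of consecutive sorted records compared together
    threshold -- minimum similarity ratio to report a pair
    """
    ordered = sorted(records, key=key)
    pairs = []
    for i, record in enumerate(ordered):
        # Compare only with the next window - 1 records in sorted order.
        for other in ordered[i + 1:i + window]:
            if SequenceMatcher(None, record, other).ratio() >= threshold:
                pairs.append((record, other))
    return pairs


people = ["john smith", "mary jones", "jon smith", "alice brown"]
duplicates = sorted_neighborhood(people, key=lambda r: r, window=2, threshold=0.8)
```

The window size controls the trade-off: a small window misses duplicates whose keys sort far apart, while a large window approaches all-pairs comparison, which is the motivation for adapting the window size.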
Differential privacy histogram publishing method based on dynamic sliding window (Cited by: 1)
20
Authors: Qian CHEN, Zhiwei NI, Xuhui ZHU, Pingfan XIA. Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 4, pp. 209-220, 12 pages.
Differential privacy has recently become a widely recognized strict privacy protection model for data release. Differential privacy histogram publishing can directly show the statistical data distribution while ensuring user privacy for data query, sharing, and analysis. Dynamic data release is a research topic that meets a wide range of current industry needs. However, the amount of data varies considerably over different periods. Unreasonable data processing will result in the risk of leaking users’ information and in unavailability of the data. Therefore, we designed a differential privacy histogram publishing method based on the dynamic sliding window of LSTM (DPHP-DL), which can improve data availability while guaranteeing data privacy. DPHP-DL integrates DSW-LSTM and DPHK+. DSW-LSTM updates the size of the sliding windows based on data value prediction via long short-term memory (LSTM) networks, which evenly divides the data stream into several windows. DPHK+ heuristically publishes non-isometric histograms based on k-means++ clustering that automatically obtains the optimal K, so as to achieve differential privacy histogram publishing of dynamic data. Extensive experiments on real-world dynamic datasets demonstrate the superior performance of DPHP-DL.
Keywords: differential privacy; dynamic data; histogram publishing; sliding window