70 articles found
An Efficient Modelling of Oversampling with Optimal Deep Learning Enabled Anomaly Detection in Streaming Data
1
Authors: R. Rajakumar, S. Sathiya Devi. China Communications (SCIE, CSCD), 2024, No. 5, pp. 249-260 (12 pages)
Recently, anomaly detection (AD) in streaming data has gained significant attention among research communities due to its applicability in finance, business, healthcare, education, etc. Recent developments in deep learning (DL) models have proven helpful in the detection and classification of anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model. The aim of the OS-ODLSDC model is to recognize and classify anomalies in streaming data. The proposed OS-ODLSDC model first performs a preprocessing step. Since streaming data is unbalanced, the support vector machine (SVM)-Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for oversampling. In addition, the OS-ODLSDC model employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To verify the performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed using three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
Keywords: anomaly detection deep learning hyperparameter optimization oversampling SMOTE streaming data
A Novel Outlier Detection with Feature Selection Enabled Streaming Data Classification
2
Authors: R. Rajakumar, S. Sathiya Devi. Intelligent Automation & Soft Computing (SCIE), 2023, No. 2, pp. 2101-2116 (16 pages)
Due to advancements in information technologies, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) processes have become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data is pre-processed in two ways, namely categorical encoding and null value removal. In addition, the Local Correlation Integral (LOCI) is used for the detection and removal of outliers. Besides, a red deer algorithm (RDA) based FS approach is employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. To assess the classification outcomes of the ODFST-SDC technique, a series of simulations was performed using three benchmark datasets. The experimental results showed promising outcomes for the ODFST-SDC technique over recent approaches.
Keywords: streaming data classification outlier removal feature selection machine learning metaheuristics
An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data (cited by: 1)
3
Authors: Romany F. Mansour, Shaha Al-Otaibi, Amal Al-Rasheed, Hanan Aljuaid, Irina V. Pustokhina, Denis A. Pustokhin. Computers, Materials & Continua (SCIE, EI), 2021, No. 9, pp. 2843-2858 (16 pages)
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications raises two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has several operational stages: pre-processing, concept drift detection (CDD), and classification. The model overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). To determine the oversampling rates and neighboring point values of SMOTE, the Glowworm Swarm Optimization (GSO) algorithm is employed. Besides, the Statistical Test of Equal Proportions (STEPD), a CDD technique, is also utilized. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and compute the optimum parameters for the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the presented model was evaluated using high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation confirmed the effectiveness of the MOMBD-CDD model. The proposed model attained accuracies of 97.45% and 94.23% on the KDDCup99 and ECUE Spam datasets, respectively.
Keywords: streaming data concept drift classification model deep learning class imbalance data
Research and Simulation on the Small-scale Streaming Data Transmission Communication System based on ARM and FPGA
4
Authors: Yuzhu Ren. International Journal of Technology Management, 2016, No. 10, pp. 72-74 (3 pages)
In this paper, we conduct theoretical research on a small-scale streaming data transmission communication system based on ARM and FPGA. Compared with network-layer IP multicast, its realization does not require changing the underlying structure of the network. Embedded technology is developing rapidly and GPRS technology is maturing day by day; embedded systems are small, highly specialized, and simple, and GPRS networks offer wide coverage. On this basis, this paper proposes a new ARM and FPGA based small-scale streaming data transmission communication system. The implementation of the system proves its effectiveness.
Keywords: ARM and FPGA small-scale streaming data communication system
Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach
5
Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2557-2573 (17 pages)
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Using the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever further data arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that newly arrived data points interact solely with clusters in very close proximity. This significantly enhances efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits an almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis.
Keywords: data stream clustering TEDA KD-tree scapegoat tree
How many probe vehicles are enough for identifying traffic congestion?--a study from a streaming data perspective (cited by: 2)
6
Authors: Handong WANG, Yang YUE, Qingquan LI. Frontiers of Earth Science (SCIE, CAS, CSCD), 2013, No. 1, pp. 34-42 (9 pages)
Many studies have used vehicle trajectories to analyze traffic conditions, for instance, to identify traffic congestion. However, there is no systematic study of the number of probe vehicles and the sampling interval needed to identify traffic congestion accurately. Moreover, most related studies ignore the streaming nature of trajectory data. This paper first presents a novel method of identifying traffic congestion that considers the streaming nature of vehicle trajectories. Instead of processing the whole data stream, a series of snapshots is extracted. Congested road segments can be identified by analyzing the evolution of clusters across a series of adjacent snapshots. We then calculate a series of parameters and their corresponding congestion identification accuracy. The results have implications for probe vehicle deployment and traffic analysis; for example, when 5% of probe vehicles are available, 85% identification accuracy can be reached if the sampling time interval is 10 s.
Keywords: vehicle streaming data traffic trajectory data floating car data congestion
Super point detection based on sampling and data streaming algorithms
7
Authors: 程光, 强士卿. Journal of Southeast University (English Edition) (EI, CAS), 2009, No. 2, pp. 224-227 (4 pages)
In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both an IP table and a flow Bloom filter (BF) to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows are recorded in the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm, and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
Keywords: super point flow sampling data streaming
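The IP-table-plus-Bloom-filter logic described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the SDSD algorithm itself: the names `BloomFilter` and `count_flows`, the filter size, the SHA-256 based hashing, and the fixed `sample_prob` are all assumptions made for the example.

```python
import hashlib
import random


class BloomFilter:
    """Fixed-size Bloom filter over strings (false positives possible, no false negatives)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def count_flows(packets, sample_prob, rng=None):
    """Return {source IP: approximate number of distinct flows} for sampled sources."""
    rng = rng or random.Random(0)
    ip_table = {}            # sources that have already been recorded
    flow_bf = BloomFilter()  # flows already counted for recorded sources
    for src, dst in packets:
        flow = f"{src}->{dst}"
        if src in ip_table:
            # Recorded source: count every flow not yet seen in the Bloom filter.
            if flow not in flow_bf:
                flow_bf.add(flow)
                ip_table[src] += 1
        elif rng.random() < sample_prob:
            # Unrecorded source: admit it only if this flow is sampled.
            ip_table[src] = 1
            flow_bf.add(flow)
    return ip_table
```

Because each new flow of an unrecorded source gets an independent chance of being sampled, sources with many flows are far more likely to enter the table, which matches the intuition stated in the abstract.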
Subspace Clustering in High-Dimensional Data Streams:A Systematic Literature Review
8
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4649-4668 (20 pages)
Clustering high dimensional data is challenging as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering subspace clustering projected clustering data stream stream clustering high dimensionality evolving data stream concept drift
Sentiment Drift Detection and Analysis in Real Time Twitter Data Streams
9
Authors: E. Susi, A. P. Shanthi. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 6, pp. 3231-3246 (16 pages)
Handling sentiment drifts in real-time Twitter data streams is a challenging task when performing sentiment classification, because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drifts has led to the need for an adaptive approach to detect and handle this drift in real time. This work proposes an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting them to data streams using the Kafka tool. The experiments conducted on real-time and simulated tweets on sports, health care, and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87%, and 90%, respectively. Though results have been provided only for a few topics, as a proof of concept, this framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.
Keywords: sentiment drift sentiment classification big data BERT real time data streams Twitter
Clustering algorithm for multiple data streams based on spectral component similarity (cited by: 1)
10
Authors: 邹凌君, 陈崚, 屠莉. Journal of Southeast University (English Edition) (EI, CAS), 2008, No. 3, pp. 264-266 (3 pages)
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams which show similar behavior with some unknown time delays. The algorithm uses the autoregressive (AR) modeling technique to measure correlations between data streams. It exploits estimated frequency spectra to extract the essential features of streams. Each stream is represented as the sum of spectral components and the correlation is measured component-wise. Each spectral component is described by four parameters, namely, amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated. The algorithm uses such information as similarity measures in clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method has a higher speed and clustering quality than other similar methods.
Keywords: data streams clustering AR model spectral component
Data partitioning based on sampling for power load streams
11
Authors: 王永利, 徐宏炳, 董逸生, 钱江波, 刘学军. Journal of Southeast University (English Edition) (EI, CAS), 2005, No. 3, pp. 293-298 (6 pages)
A novel data stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for the power industry. The first step of this method is to sample the data in parallel, which is implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of data values adaptively. The second step of this method is to partition the fluxes of data streams evenly, which is implemented with two alternative equal-depth histogram generating algorithms that fit different cases: one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector. The experimental results on actual data show that the method is efficient, practical and suitable for processing time-varying data streams.
Keywords: data streams continuous queries parallel processing sampling data partitioning
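The two steps named in this abstract, sampling and then equal-depth partitioning, can be illustrated with a short sketch. It uses plain reservoir sampling (the classic Algorithm R) rather than the paper's extended variant with a skip factor, and a one-shot quantile computation rather than either of the paper's histogram maintenance algorithms; the function names are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random(0)
    reservoir = []
    for i, value in enumerate(stream):
        if i < k:
            reservoir.append(value)  # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = value  # replace a slot with probability k / (i + 1)
    return reservoir


def equal_depth_boundaries(sample, num_partitions):
    """Return num_partitions - 1 cut points so each range holds about the same number of samples."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]
```

A partition vector estimated from the sample can then route incoming tuples to processing nodes by value range, so that each node receives a similar share of the load.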
Min-wise hash function-based sampling over distributed data streams
12
Authors: 崇志宏, 倪巍伟, 徐立臻, 吕建华, 谢英豪. Journal of Southeast University (English Edition) (EI, CAS), 2009, No. 4, pp. 456-459 (4 pages)
In order to avoid redundant and inconsistent information in distributed data streams, a sampling method based on min-wise hash functions is designed and the practical semantics of the union of distributed data streams is defined. First, for each family of min-wise hash functions, the data with the minimum hash value are selected as local samples, and the biased effect caused by frequent updates in a single data stream is filtered out. Second, for the same hash function, the sample with the minimum hash value is selected as the global sample, and the local samples are combined at the center node to filter out the biased effect of duplicated updates. Finally, based on the obtained uniform samples, several aggregations on the defined semantics of the union of data streams are precisely estimated. The results of comparison tests on synthetic and real-life data streams demonstrate the effectiveness of this method.
Keywords: data streams aggregation min-wise hashing
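The local-then-global selection described in this abstract can be sketched as below. This is only an illustration of the general min-wise idea: salted SHA-256 stands in for a min-wise independent hash family, and the names `hash_value`, `local_samples`, and `global_samples` are assumptions rather than the paper's terminology.

```python
import hashlib


def hash_value(seed, item):
    """Hash an item under the hash function identified by seed (assumed scheme)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def local_samples(stream, num_hashes):
    """At one node: for each hash function, keep the item with the smallest hash value."""
    best = [None] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            h = hash_value(seed, item)
            if best[seed] is None or h < best[seed][0]:
                best[seed] = (h, item)
    return best


def global_samples(per_node_samples):
    """At the center node: for each hash function, keep the minimum over all nodes."""
    merged = []
    for candidates in zip(*per_node_samples):
        present = [c for c in candidates if c is not None]
        merged.append(min(present, key=lambda c: c[0])[1] if present else None)
    return merged
```

Since an item's hash value does not depend on how often or where it appears, repeated updates within one stream and duplicates across nodes cannot change which item wins, so each global sample is drawn from the set of distinct items in the union.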
Fast wireless sensor for anomaly detection based on data stream in an edge-computing-enabled smart greenhouse (cited by: 3)
13
Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan. Digital Communications and Networks (SCIE, CSCD), 2022, No. 4, pp. 498-507 (10 pages)
Edge-computing-enabled smart greenhouses are a representative application of Internet of Things (IoT) technology, which can monitor environmental information in real time and use that information to contribute to intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms originally designed for static data do not properly consider the inherent characteristics of the data stream produced by wireless sensors, such as infiniteness, correlations, and concept drift, which may pose a considerable challenge to anomaly detection based on data streams and lead to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that it is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times will run out of memory space. Second, there exist correlations among different data streams, and traditional algorithms hardly consider these correlations. Third, the underlying data generation process or distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effectiveness. Considering these issues, a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and the time window technique is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
Keywords: anomaly detection data stream DLSHiForest smart greenhouse edge computing
SCMR: a semantic-based coherence micro-cluster recognition algorithm for hybrid web data stream (cited by: 2)
14
Authors: 王珉, Wang Yongbin, Li Ying. High Technology Letters (EI, CAS), 2016, No. 2, pp. 224-232 (9 pages)
Data aggregation from various web sources is very significant for the web data analysis domain. In addition, the recognition of coherence micro-clusters is one of the most interesting issues in the field of data aggregation. Until now, many algorithms have been proposed to work on this issue. However, the deficiency of these solutions is that they cannot recognize the micro-cluster data stream accurately. A semantic-based coherent micro-cluster recognition algorithm for hybrid web data streams is proposed. Firstly, an objective function is proposed to recognize the coherence micro-cluster, and then the semantic-based coherence micro-cluster recognition algorithm for hybrid web data streams is presented. Fi-
Keywords: hybrid web data stream coherence micro-clustering entity unified object coherence semantic computing
Dynamically Computing Approximate Frequency Counts in Sliding Window over Data Stream (cited by: 1)
15
Authors: NIE Guo-liang, LU Zheng-ding. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 283-288 (6 pages)
This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, i.e., computing frequency counts exceeding a user-specified threshold ε. The first algorithm constructs sub-windows and periodically deletes expired sub-windows in the sliding window, and each sub-window maintains a summary data structure. The first algorithm outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adopts a multi-level method to deal with the data stream. Once the sketch of the most recent N elements has been constructed, the second algorithm can provide answers to frequency queries over the most recent n (n ≤ N) elements. The second algorithm outputs at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
Keywords: data stream sliding window approximation algorithms frequency counts
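For readers unfamiliar with bounded-size frequency summaries, the sketch below shows the classic Misra-Gries summary. It is a different and simpler technique than the two sliding-window algorithms of this paper: it summarizes the whole stream seen so far, not the most recent N elements. The function name and parameter are chosen for illustration.

```python
def misra_gries(stream, num_counters):
    """One-pass summary using at most num_counters counters.

    Any item occurring more than n / (num_counters + 1) times in a stream of
    length n is guaranteed to remain in the result; its stored count is a
    lower bound on its true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With `num_counters` set to roughly 1/ε, the summary holds about 1/ε candidates, which is the same order as the output bounds quoted in the abstract; handling expiry of old elements is the part the paper's sub-window and multi-level structures add.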
THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams (cited by: 1)
16
Authors: Jagannath E. Nalavade, T. Senthil Murugan. Journal of Central South University (SCIE, EI, CAS, CSCD), 2017, No. 8, pp. 1789-1800 (12 pages)
Rapid developments in telecommunication, sensor data, financial applications, data stream analysis, and so on have increased the rate of data arrival, for which data mining is considered a vital process. The data analysis process consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Because classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundary between classes. Hence, we design a novel fuzzy classifier known as THRFuzzy to classify new incoming data streams. Rough set theory together with the tangential holoentropy function helps in designing the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified using three datasets, namely the skin segmentation, localization, and breast cancer datasets, with accuracy and time as evaluation metrics, and is compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier achieves better classification results than the existing classifiers, with maximum accuracy and minimal time.
Keywords: data stream classification fuzzy rough set tangential holoentropy concept change
Linked-Tree: An Aggregate Query Algorithm Based on Sliding Window over Data Stream
17
Authors: YU Yaxin, WANG Guoren, SU Dong, ZHU Xinhua. Wuhan University Journal of Natural Sciences (CAS), 2006, No. 5, pp. 1114-1119 (6 pages)
How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both the academic and industrial communities. To address this issue, a novel Linked-tree algorithm based on sliding windows is proposed in this paper. Thanks to the proposed concept of an area, the Linked-tree algorithm reuses many primary results from the last window and thus avoids many unnecessary repeated comparison operations between two successive windows. As a result, the execution efficiency of MAX queries is improved dramatically. In addition, since memory usage depends on the number of areas but not on the size of the sliding window, memory is saved greatly. Extensive experimental results show that the Linked-tree algorithm significantly outperforms the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
Keywords: data streams sliding window aggregate query area HOP
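The idea of reusing work between successive windows for a MAX query can be illustrated with the well-known monotonic-deque technique. This is not the Linked-tree algorithm from the paper, whose area structure is not described in the abstract; it is a standard alternative shown only to make the reuse idea concrete, and the function name is hypothetical.

```python
from collections import deque


def sliding_window_max(stream, window_size):
    """Yield the maximum of each full window of the last window_size items."""
    candidates = deque()  # (index, value) pairs with strictly decreasing values
    for i, value in enumerate(stream):
        # Older items that are not larger than the new one can never be a maximum again.
        while candidates and candidates[-1][1] <= value:
            candidates.pop()
        candidates.append((i, value))
        # Drop the front candidate once it slides out of the window.
        if candidates[0][0] <= i - window_size:
            candidates.popleft()
        if i >= window_size - 1:
            yield candidates[0][1]
```

For example, `list(sliding_window_max([1, 3, -1, -3, 5, 3, 6, 7], 3))` gives `[3, 3, 5, 5, 6, 7]`. Each item is pushed and popped at most once, so the amortized cost per arrival is constant instead of one comparison per window element.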
A Granularity-Aware Parallel Aggregation Method for Data Streams
18
Authors: WANG Yong-li, XU Hong-bing, XU Li-zhen, QIAN Jiang-bo, LIU Xue-jun. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 133-137 (5 pages)
This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes an equal-depth histogram to implement partitioning. This method can avoid data skew and reduce communication cost. The experimental results on both synthetic data and actual data show that the proposed method is efficient, practical and suitable for processing time-varying data streams.
Keywords: data streams parallel processing linear regression aggregation data skew
An Indexed Non-Equijoin Algorithm Based on Sliding Windows over Data Streams
19
Authors: YU Ya-xin, YANG Xing-hua, YU Ge, WU Shan-shan. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 294-298 (5 pages)
Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. So far, many non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature. However, none of them takes non-equijoins into consideration. In many cases, non-equijoin queries occur frequently. Hence, it is worth discussing how to process non-equijoin queries effectively and efficiently. In this paper, we propose an indexed join algorithm for supporting non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
Keywords: non-equijoin data stream sliding window red-black indexing tree
Logistic Regression for Evolving Data Streams Classification
20
Authors: 尹志武, 黄上腾, 薛贵荣. Journal of Shanghai Jiaotong University (Science) (EI), 2007, No. 2, pp. 197-203 (7 pages)
Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called the evolutionary logistical regression classifier (ELRClass), to solve the classification of evolving data streams. This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep the classifier if its performance deteriorates because of bursty noise, or to construct a new classifier if a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of this algorithm.
Keywords: classification logistic regression data stream mining