An Efficient Modelling of Oversampling with Optimal Deep Learning Enabled Anomaly Detection in Streaming Data
1
Authors: R. Rajakumar, S. Sathiya Devi. China Communications (SCIE, CSCD), 2024, Issue 5, pp. 249-260, 12 pages.
Anomaly detection (AD) in streaming data has recently gained significant attention among research communities due to its applicability in finance, business, healthcare, education, and other domains. Recent developments in deep learning (DL) models have proved helpful for detecting and classifying anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model, whose aim is to recognize and classify anomalies in streaming data. The proposed OS-ODLSDC model first applies a preprocessing step. Since streaming data is unbalanced, the support vector machine (SVM)-Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for oversampling. The OS-ODLSDC model then employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for hyperparameter tuning of the BiLSTM model. To validate the performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed on three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
Keywords: anomaly detection; deep learning; hyperparameter optimization; oversampling; SMOTE; streaming data
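The oversampling step in the abstract above can be illustrated with a minimal sketch. Note the assumptions: this shows plain SMOTE interpolation between minority-class neighbours, not the SVM-guided variant the paper uses, and the function name `smote_samples`, the parameter `k`, and the toy points are hypothetical.

```python
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class points (two features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_samples(minority, n_new=6)
```

Every synthetic point lies on a segment between two real minority points, so the minority class is densified without copying records verbatim.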
A Novel Outlier Detection with Feature Selection Enabled Streaming Data Classification
2
Authors: R. Rajakumar, S. Sathiya Devi. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 2, pp. 2101-2116, 16 pages.
Due to advancements in information technologies, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) have become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data is pre-processed in two ways: categorical encoding and null value removal. Next, the Local Correlation Integral (LOCI) is used to detect and remove outliers. A red deer algorithm (RDA)-based FS approach is then employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. To assess the classification outcomes of the ODFST-SDC technique, a series of simulations was performed using three benchmark datasets. The experimental results showed promising outcomes for the ODFST-SDC technique compared with recent approaches.
Keywords: streaming data classification; outlier removal; feature selection; machine learning; metaheuristics
An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data
3
Authors: Romany F. Mansour, Shaha Al-Otaibi, Amal Al-Rasheed, Hanan Aljuaid, Irina V. Pustokhina, Denis A. Pustokhin. Computers, Materials & Continua (SCIE, EI), 2021, Issue 9, pp. 2843-2858, 16 pages.
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications suffers from two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has three operational stages: pre-processing, concept drift detection (CDD), and classification. The model overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). The Glowworm Swarm Optimization (GSO) algorithm is employed to determine the oversampling rates and neighboring point values of SMOTE. The Statistical Test of Equal Proportions (STEPD), a CDD technique, is also utilized. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and compute the optimum parameters of the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the presented model was evaluated using high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation confirmed the effectiveness of the MOMBD-CDD model. The proposed model attained accuracies of 97.45% and 94.23% on the KDDCup99 and ECUE spam datasets, respectively.
Keywords: streaming data; concept drift; classification model; deep learning; class imbalance data
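As a rough illustration of the STEPD idea mentioned above, the sketch below compares a classifier's accuracy on older examples with its accuracy on a recent window using a test of equal proportions with continuity correction. The function names, the one-sided handling, and the thresholds 0.05 (warning) and 0.003 (drift) are assumptions for illustration, not values taken from the paper.

```python
import math

def stepd_statistic(r_old, n_old, r_recent, n_recent):
    """Continuity-corrected statistic for equal proportions of correct
    predictions in the older data and in the recent window."""
    p_hat = (r_old + r_recent) / (n_old + n_recent)
    spread = p_hat * (1 - p_hat) * (1 / n_old + 1 / n_recent)
    if spread == 0:
        return 0.0
    diff = abs(r_old / n_old - r_recent / n_recent)
    correction = 0.5 * (1 / n_old + 1 / n_recent)
    return (diff - correction) / math.sqrt(spread)

def drift_level(r_old, n_old, r_recent, n_recent,
                alpha_warn=0.05, alpha_drift=0.003):
    """Return 'stable', 'warning' or 'drift' (thresholds are assumed)."""
    if r_recent / n_recent >= r_old / n_old:
        return "stable"  # accuracy did not drop, nothing to flag
    z = stepd_statistic(r_old, n_old, r_recent, n_recent)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    if p_value < alpha_drift:
        return "drift"
    if p_value < alpha_warn:
        return "warning"
    return "stable"

# 900 of 1000 older predictions correct versus a 30-example recent window.
status_small_drop = drift_level(900, 1000, 23, 30)
status_large_drop = drift_level(900, 1000, 15, 30)
```

A small accuracy drop in the recent window only raises a warning, while a large drop is flagged as drift, at which point a stream learner would typically retrain or replace its model.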
Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach
4
Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 2, pp. 2557-2573, 17 pages.
Data stream clustering is integral to contemporary big data applications. However, handling the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Using the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and the accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever further data arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that, upon arrival, new data points interact solely with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets and compare its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy and that its runtime depends almost linearly on the volume of data, making it more suitable for large-scale data stream analysis.
Keywords: data stream clustering; TEDA; KD-tree; scapegoat tree
Subspace Clustering in High-Dimensional Data Streams:A Systematic Literature Review
5
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 4649-4668, 20 pages.
Clustering high-dimensional data is challenging because increasing dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have reviewed data stream clustering, there is currently no specific review of subspace clustering algorithms for high-dimensional data streams. Therefore, this article systematically reviews the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focused on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters from outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering; subspace clustering; projected clustering; data stream; stream clustering; high dimensionality; evolving data stream; concept drift
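The time fading function mentioned in the review can be sketched as follows. This assumes the exponential form f(Δt) = 2^(-λ·Δt) that is common in density-based stream clustering; the class name `FadingMicroCluster` and the parameter `decay` (λ) are hypothetical, and the reviewed algorithms may use other forms.

```python
class FadingMicroCluster:
    """Keeps a decayed weight so that old points gradually stop counting."""

    def __init__(self, decay=0.5):
        self.decay = decay        # assumed fading rate (lambda)
        self.weight = 0.0
        self.last_update = None

    def fade_to(self, now):
        # Apply f(dt) = 2 ** (-decay * dt) for the time elapsed since the last update.
        if self.last_update is not None:
            self.weight *= 2.0 ** (-self.decay * (now - self.last_update))
        self.last_update = now

    def insert(self, now):
        # Fade the existing weight first, then count the new point with weight 1.
        self.fade_to(now)
        self.weight += 1.0

cluster = FadingMicroCluster(decay=1.0)
cluster.insert(0.0)
cluster.insert(1.0)      # weight becomes 1 * 0.5 + 1
cluster.fade_to(3.0)     # two more time units without new points
```

A micro-cluster whose faded weight falls below a chosen threshold can be treated as outdated or as an outlier candidate, which is how such functions let an algorithm adapt implicitly to an evolving stream.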
Sentiment Drift Detection and Analysis in Real Time Twitter Data Streams
6
Authors: E. Susi, A. P. Shanthi. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 6, pp. 3231-3246, 16 pages.
Handling sentiment drift in real-time Twitter data streams is a challenging task in sentiment classification, because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drift has led to the need for an adaptive approach that detects and handles this drift in real time. This work proposes an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting them to data streams using the Kafka tool. Experiments conducted on real-time and simulated tweets on sports, health care, and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87%, and 90%, respectively. Although results are provided for only a few topics as a proof of concept, the framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.
Keywords: Sentiment drift; sentiment classification; big data; BERT; real time data streams; Twitter
Fast wireless sensor for anomaly detection based on data stream in an edge-computing-enabled smart greenhouse (Cited by: 1)
7
Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan. Digital Communications and Networks (SCIE, CSCD), 2022, Issue 4, pp. 498-507, 10 pages.
Edge-computing-enabled smart greenhouses are a representative application of Internet of Things (IoT) technology: they monitor environmental information in real time and use it to support intelligent decision-making. In this process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms, originally designed for static data, do not properly consider the inherent characteristics of the data streams produced by wireless sensors, such as infiniteness, correlations, and concept drift. This poses a considerable challenge to stream-based anomaly detection and leads to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that it is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan it multiple times will run out of memory. Second, correlations exist among different data streams, and traditional algorithms hardly consider them. Third, the underlying data generation process or distribution may change over time, so traditional anomaly detection algorithms without model updates lose their effectiveness. Considering these issues, a novel method called DLSHiForest, based on Locality-Sensitive Hashing and the time window technique, is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of the approach. Experimental results show that the proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
Keywords: Anomaly detection; Data stream; DLSHiForest; Smart greenhouse; Edge computing
Impact of Distance Measures on the Performance of AIS Data Clustering
8
Authors: Marta Mieczyńska, Ireneusz Czarnowski. Computer Systems Science & Engineering (SCIE, EI), 2021, Issue 1, pp. 69-82, 14 pages.
Automatic Identification System (AIS) data stream analysis is based on AIS data describing the behaviours of different vessels, including the vessels' routes. When the AIS data contains outliers or noise, or is incomplete, analysis of a vessel's behaviour is impossible or limited. When the data contains outliers, it is not possible to automatically assign the AIS data to a particular vessel. In this paper, a clustering method is proposed to support AIS data analysis, to qualify noise and outliers with respect to their suitability, and finally to aid the reconstruction of the vessel's trajectory. Clustering results have been obtained using selected algorithms, including k-means, k-medoids, and fuzzy c-means. Based on the clustering results, it is possible to decide on the qualification of data with outliers and on their usefulness in the reconstruction of the vessel trajectory. The main aim of this paper is to answer how different distance measures used during clustering influence AIS data clustering quality. The core question is whether they have an impact on the reconstruction of vessel trajectories when the data are damaged. The computational experiments therefore examined whether distance measures influence AIS data clustering quality, and were carried out using original AIS data. In general, the experiments and results confirm the usefulness of cluster-based analysis when the data include outliers that are derived from the natural environment. It is also possible to monitor and analyse AIS data using clustering when the data include outliers. The computational experiment results confirm that k-means with Euclidean distance has the best performance.
Keywords: AIS; SAT-AIS; AIS data stream; clustering; maritime data analysis
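To make the role of the distance measure concrete, here is a minimal k-means sketch in which the distance function is a parameter. It is an illustration only: the toy points stand in for real AIS records, the names `kmeans`, `euclidean`, and `manhattan` are hypothetical, and the mean-based centroid update is kept even for the Manhattan distance (a stricter variant would use medians).

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, centroids, distance, iterations=10):
    """Lloyd-style k-means with a pluggable distance for the assignment step."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members
            else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
              for p in points]
    return centroids, labels

# Toy 2-D points standing in for position reports; fixed initial centroids.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [(0, 0), (10, 10)]
centers_e, labels_e = kmeans(points, start, euclidean)
centers_m, labels_m = kmeans(points, start, manhattan)
```

On well-separated data both measures give the same partition; differences between measures tend to show up when clusters overlap or when outliers pull points across cluster boundaries, which is the situation the paper studies.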
Incremental Learning Framework for Mining Big Data Stream
9
Authors: Alaa Eisa, Nora E.L-Rashidy, Mohammad Dahman Alshehri, Hazem M. El-bakry, Samir Abdelrazek. Computers, Materials & Continua (SCIE, EI), 2022, Issue 5, pp. 2901-2921, 21 pages.
Data stream classification currently plays a key role in big data analytics due to its enormous growth. Most existing classification methods use ensemble learning, which is trustworthy, but these methods are not effective at learning from imbalanced big data, and they assume that all data are pre-classified. Another weakness of current methods is the long evaluation time required when the target data stream contains a large number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model. The proposed model is implemented on the Spark architecture. For each data stream, the class output is computed at slave nodes by training a generative adversarial network with the back-propagation error based on fuzzy bound computation. This method overcomes the limitations of existing methods, as it can classify data streams that are partially or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art methods in terms of accuracy (0.861), precision (0.9328), and minimal MSE (0.0416).
Keywords: Ant lion optimization (ALO); big data stream; generative adversarial network (GAN); incremental learning; Renyi entropy
Approach to Anomaly Detection in Microservice System with Multi-Source Data Streams
10
Authors: ZHANG Qixun, HAN Jing, CHENG Li, ZHANG Baisheng, GONG Zican. ZTE Communications, 2022, Issue 3, pp. 85-92, 8 pages.
Microservices have become popular in enterprises because of their excellent scalability and timely update capabilities. However, while fine-grained modularity and service orientation decrease the complexity of system development, they greatly increase the complexity of system operation and maintenance. Multiple types of system failures occur frequently, and it is hard to detect and diagnose failures in time. Furthermore, microservices are updated frequently. Existing anomaly detection models depend on offline training and cannot adapt to the frequent updates of microservices. This paper proposes an anomaly detection approach for microservice systems with multi-source data streams. The approach realizes online model construction and online anomaly detection, and is capable of self-updating and self-adapting. Experimental results show that the approach can correctly identify 78.85% of faults of different types.
Keywords: anomaly detection; data stream; microservice; monitored indicator; system log
Analytical Engineering for Data Stream
11
Authors: Rogério Rossi, Kechi Hirama. Journal of Computer and Communications, 2022, Issue 7, pp. 13-34, 22 pages.
The capacity to analyze massive data has become increasingly necessary, given the high volume of data generated daily by different sources. The data sources are varied and can generate a huge amount of data, which can be processed in batch or stream settings. The stream setting corresponds to the treatment of a continuous sequence of data that arrives in a real-time flow and needs to be processed in real time. The models, tools, methods, and algorithms for generating intelligence from data streams culminate in the approaches of Data Stream Mining and Data Stream Learning. The activities of such approaches can be organized and structured according to Engineering principles, thus giving rise to the principles of Analytical Engineering or, more specifically, Analytical Engineering for Data Stream (AEDS). This article presents the AEDS conceptual framework, composed of four pillars (Data, Model, Tool, People) and three processes (Acquisition, Retention, Review). The four pillars are defined from the main components of the data stream setting, and the three processes are determined by the need to operationalize the activities of an Analytical Organization (AO) in its use of the four pillars. The AEDS framework supports the Analytical Projects (AP) carried out in an AO and favors the delivery of results, or Analytical Deliverables (AD), by the Analytical Teams (AT) in order to provide intelligence from stream data.
Keywords: Analytical Engineering; Analytical Organization; Data Stream; Analytics; Stream Mining
Drift Detection Method Using Distance Measures and Windowing Schemes for Sentiment Classification
12
Authors: Idris Rabiu, Naomie Salim, Maged Nasser, Aminu Da'u, Taiseer Abdalla Elfadil Eisa, Mhassen Elnour Elneel Dalam. Computers, Materials & Continua (SCIE, EI), 2023, Issue 3, pp. 6001-6017, 17 pages.
Textual data streams have been extensively used in practical applications where consumers of online products express their views about those products. Due to changes in data distribution, commonly referred to as concept drift, mining such data streams is a challenging problem for researchers. The majority of existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, there is a need to develop more intuitive detection techniques that can identify a large number of drifts in the data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection for opinion mining and sentiment classification. To improve classification performance, the approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before the new classifier is added, the total number of classifiers in the ensemble is checked; if the limit is exceeded, the classifier with the least weight is removed from the ensemble. To this end, a weighting mechanism is used to calculate the weight of each classifier, which determines its contribution to the final classification result. Several experiments were conducted on real-world datasets, and the results were evaluated on false positive rate, miss detection rate, and accuracy. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and PageHinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of the proposed method.
Keywords: data streams; sentiment analysis; concept drift; ensemble classification; adaptive window
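The ensemble maintenance rule described in the abstract (add a classifier on drift, evict the lowest-weight member when the ensemble is full, combine members by weighted vote) might look like the sketch below. The class name `WeightedEnsemble`, the capacity, and the numeric weights are hypothetical; the paper's own weighting mechanism is not reproduced here, so weights are simply passed in.

```python
from collections import defaultdict

class WeightedEnsemble:
    """Bounded ensemble: evicts the lowest-weight member when full."""

    def __init__(self, max_members=3):
        self.max_members = max_members
        self.members = []  # list of (classifier, weight) pairs

    def add(self, classifier, weight):
        # Called when a drift is detected and a new classifier has been trained.
        if len(self.members) >= self.max_members:
            weakest = min(range(len(self.members)),
                          key=lambda i: self.members[i][1])
            self.members.pop(weakest)
        self.members.append((classifier, weight))

    def predict(self, x):
        # Weighted vote: each member adds its weight to the label it predicts.
        votes = defaultdict(float)
        for classifier, weight in self.members:
            votes[classifier(x)] += weight
        return max(votes, key=votes.get)

# Stand-in classifiers: plain callables that always return one label.
ensemble = WeightedEnsemble(max_members=3)
ensemble.add(lambda text: "positive", 0.9)
ensemble.add(lambda text: "negative", 0.4)
ensemble.add(lambda text: "negative", 0.3)
before = ensemble.predict("great phone")   # positive 0.9 vs negative 0.7
ensemble.add(lambda text: "negative", 0.6)  # evicts the 0.3 member
after = ensemble.predict("great phone")    # negative 1.0 vs positive 0.9
```

Bounding the ensemble keeps memory and prediction cost constant on an unbounded stream, and evicting by weight removes the member that contributes least to the final decision.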
Combined Effect of Concept Drift and Class Imbalance on Model Performance During Stream Classification
13
Authors: Abdul Sattar Palli, Jafreezal Jaafar, Manzoor Ahmed Hashmani, Heitor Murilo Gomes, Aeshah Alsughayyir, Abdul Rehman Gilal. Computers, Materials & Continua (SCIE, EI), 2023, Issue 4, pp. 1827-1845, 19 pages.
Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this, the statistical properties of the data change over time, leading to class imbalance and concept drift issues. Both issues cause model performance degradation. Most current work has focused on developing an ensemble strategy that trains a new classifier on the latest data to resolve the issue. These techniques suffer when training the new classifier if the data is imbalanced. Also, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. Existing solutions for the combined issue of class imbalance and concept drift lack an understanding of the correlation between the two problems. This work studies the association between concept drift and class imbalance ratio and then demonstrates how changes in the class imbalance ratio, together with concept drift, affect a classifier's performance. We analyzed the effect of both issues on the minority and majority classes individually. To do this, we conducted experiments on benchmark datasets using state-of-the-art classifiers specifically designed for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when class imbalance and concept drift occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both minority and majority classes. The study findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams, such as concept drift and class imbalance.
Keywords: classification; data streams; class imbalance; concept drift; class imbalance ratio
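The four measures named in the abstract can be computed from a binary confusion matrix as in the sketch below. The function name `stream_metrics` and the example counts are hypothetical, and the geometric mean is taken here as the square root of recall times specificity, which is one common definition and is assumed rather than confirmed by the paper.

```python
import math

def stream_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and G-mean for the positive (minority) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "f1": f1, "g_mean": g_mean}

# Hypothetical imbalanced window: 50 minority and 950 majority examples.
scores = stream_metrics(tp=40, fp=10, fn=10, tn=940)
```

Unlike plain accuracy, the G-mean drops sharply when either class is poorly recognized, which is why it is often reported for imbalanced streams.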
Incremental Data Stream Classification with Adaptive Multi-Task Multi-View Learning
14
Authors: Jun Wang, Maiwang Shi, Xiao Zhang, Yan Li, Yunsheng Yuan, Chengei Yang, Dongxiao Yu. Big Data Mining and Analytics (EI, CSCD), 2024, Issue 1, pp. 87-106, 20 pages.
With the enhancement of data collection capabilities, massive streaming data have been accumulated in numerous application scenarios. Specifically, the problem of classifying data streams from mobile sensors can be formalized as a multi-task multi-view learning problem, in which a specific task comprises multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task and single-view, and so cannot learn shared representations between related tasks and views. To address these challenges, an adaptive multi-task multi-view incremental learning framework for data stream classification, called MTMVIS, is proposed. Specifically, the attention mechanism is first used to align the sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Experiments on two different datasets show that the proposed framework outperforms state-of-the-art baseline methods.
Keywords: data stream classification; mobile sensors; multi-task multi-view learning; incremental learning
Clustered Single-Board Devices with Docker Container Big Stream Processing Architecture
15
Authors: N. Penchalaiah, Abeer S. Al-Humaimeedy, Mashael Maashi, J. Chinna Babu, Osamah Ibrahim Khalaf, Theyazn H. H. Aldhyani. Computers, Materials & Continua (SCIE, EI), 2022, Issue 12, pp. 5349-5365, 17 pages.
The expanding amount of information created by Internet of Things (IoT) devices places a strain on cloud computing, which is often used for data analysis and storage. This paper investigates a different approach based on edge cloud applications, in which data is filtered and processed before being delivered to a backup cloud environment. The paper proposes designing and implementing a low-cost, low-power cluster of Single Board Computers (SBC) for this purpose, reducing the amount of data that must be transmitted elsewhere by using Big Data ideas and technology. An Apache Hadoop and Spark cluster, used to run a test application, was containerized and deployed on a Raspberry Pi cluster with Docker. To obtain system data and analyze the setup's performance, a Prometheus-based stack monitoring and alerting solution in the cloud-based market is employed. The paper assesses the system's complexity and demonstrates how containerization can improve fault tolerance and ease of maintenance, allowing the suggested solution to be used in industry. An evaluation of the overall performance is presented to highlight the capabilities and limitations of the suggested architecture, taking into consideration the solution's resource use with respect to device restrictions.
Keywords: Big data; edge cloud; cluster architecture; performance engineering; Raspberry Pi; Docker Swarm; container technology; data streaming
Clustering feature decision trees for semi-supervised classification from high-speed data streams (Cited by: 4)
16
作者 Wen-hua XU Zheng QIN Yang CHANG 《Journal of Zhejiang University-Science C(Computers and Electronics)》 SCIE EI 2011年第8期615-628,共14页
Most stream data classification algorithms apply the supervised learning strategy which requires massive labeled data.Such approaches are impractical since labeled data are usually hard to obtain in reality.In this pa... Most stream data classification algorithms apply the supervised learning strategy which requires massive labeled data.Such approaches are impractical since labeled data are usually hard to obtain in reality.In this paper,we build a clustering feature decision tree model,CFDT,from data streams having both unlabeled and a small number of labeled examples.CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction.Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property.Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while gener-ating high classification accuracy with high speed. 展开更多
Keywords: Clustering feature vector Decision tree Semi-supervised learning Stream data classification Very fast decision tree
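The abstract above mentions one-pass statistical summaries kept in micro-clusters. The following is a minimal sketch of that idea, assuming a BIRCH-style clustering feature vector (count, linear sum, squared sum); the class name `ClusteringFeature` and its methods are hypothetical and are not taken from the CFDT paper.

```python
import math


class ClusteringFeature:
    """Hypothetical BIRCH-style summary of one micro-cluster.

    Stores only the count n, the per-dimension linear sum ls, and the
    scalar squared sum ss, so each example is seen once and then discarded.
    """

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, x):
        """Absorb one example in O(dim) time and constant extra memory."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of the absorbed examples to the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors
```

Because the three quantities are additive, two summaries can be merged by adding their fields, which is what makes such vectors convenient for incremental tree induction.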
An ensemble method for data stream classification in the presence of concept drift (cited by: 3)
17
Authors: Omid ABBASZADEH, Ali AMIRI, Ali Reza KHANTEYMOORI. Frontiers of Information Technology & Electronic Engineering, SCIE EI CSCD, 2015, No. 12, pp. 1059-1068 (10 pages)
One recent area of interest in computer science is data stream management and processing. By 'data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. One issue for data streams is the classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers with two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is the removal of different numbers of base classifiers based on their quality. Applying a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data, and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.
Keywords: data stream Classification Ensemble classifiers Concept drift
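The abstract above describes weighting base classifiers at decision time and removing low-quality ones. As a rough illustration only, the sketch below uses the classic Weighted Majority scheme rather than the paper's own weighting functions, which the abstract does not specify; the function names and the `beta` and `threshold` parameters are assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among base classifiers."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def update_weights(weights, predictions, true_label, beta=0.5):
    """Multiply the weight of every classifier that erred by beta (0 < beta < 1)."""
    return [
        w * (1.0 if p == true_label else beta)
        for w, p in zip(weights, predictions)
    ]


def surviving_members(weights, threshold):
    """Indices of classifiers whose weight is still at or above the threshold."""
    return [i for i, w in enumerate(weights) if w >= threshold]
```

After a drift, members trained on the old concept tend to err repeatedly, so their weights shrink and they eventually fall below the pruning threshold.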
An online anomaly detection method for stream data using isolation principle and statistic histogram
18
Authors: Zhiguo Ding, Minrui Fei, Dajun Du. International Journal of Modeling, Simulation, and Scientific Computing, EI, 2015, No. 2, pp. 85-106 (22 pages)
Online anomaly detection for stream data has been explored recently, where the detector is expected to make an accurate and timely judgment on each incoming observation. However, due to the inherently complex characteristics of stream data, such as rapid generation, tremendous volume, and a dynamically evolving distribution, developing an effective online anomaly detection method is a challenge. The main objective of this paper is to propose an adaptive online anomaly detection method for stream data. This is achieved by combining the isolation principle with online ensemble learning, which is then optimized by a statistic histogram. Three main algorithms are developed: an online detector building algorithm, an anomaly detecting algorithm, and an adaptive detector updating algorithm. To evaluate the proposed method, four massive datasets from the UCI machine learning repository, recorded from real events, were adopted. Extensive simulations based on these datasets show that the method is effective and robust across different scenarios.
Keywords: Online anomaly detection stream data isolation principle ensemble learning statistic histogram
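The isolation principle named in the abstract above can be illustrated with a small sketch: points that are separated from the rest after only a few random splits are more likely to be anomalies. The code below is a simplified one-dimensional version over a fixed window, not the paper's detector; the function names, the window contents, and the `n_trees` and `max_depth` parameters are assumptions, and the histogram-based optimization is not reproduced.

```python
import random


def isolation_depth(x, points, rng, max_depth=20):
    """Number of random splits needed to separate x from the other points."""
    current = list(points)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # remaining points are identical and cannot be split
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains x.
        if x < split:
            current = [p for p in current if p < split]
        else:
            current = [p for p in current if p >= split]
        depth += 1
    return depth


def mean_isolation_depth(x, points, n_trees=200, seed=0):
    """Average depth over many random trials; lower values suggest an anomaly."""
    rng = random.Random(seed)
    total = sum(isolation_depth(x, points, rng) for _ in range(n_trees))
    return total / n_trees
```

For example, in a window holding the values 0 to 9 plus a single 100, the value 100 is usually isolated by the first split, while a value such as 5 needs several splits.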
IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization
19
Authors: Lu Yang, Xingshu Chen, Yonggang Luo, Xiao Lan, Wei Wang. Tsinghua Science and Technology, SCIE EI CAS CSCD, 2022, No. 1, pp. 127-140 (14 pages)
The prevalence of missing values in data streams collected in real environments makes them impossible to ignore in the privacy preservation of data streams. However, most privacy preservation methods were developed without considering missing values. A few studies allow missing values to participate in data anonymization but introduce considerable extra information loss. To balance the utility and privacy preservation of incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a slide-window-based processing framework is introduced to anonymize data streams continuously, in which each tuple can be output with clustering or anonymized clusters. We consider the dimensions of attribute and tuple as the similarity measurement, which enables clustering between incomplete and complete records and generates the cluster with minimal information loss. To avoid missing value pollution, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments conducted on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.
Keywords: ANONYMIZATION GENERALIZATION incomplete data streams privacy preservation UTILITY
A domain-independent methodology to analyze IoT data streams in real-time. A proof of concept implementation for anomaly detection from environmental data
20
Authors: Sergio Trilles, Òscar Belmonte, Sven Schade, Joaquìn Huerta. International Journal of Digital Earth, SCIE EI, 2017, No. 1, pp. 103-120 (18 pages)
Pushed by the Internet of Things (IoT) paradigm, modern sensor networks monitor a wide range of phenomena in areas such as environmental monitoring, health care, industrial processes, and smart cities. These networks provide a continuous pulse of the almost infinite activities happening in physical space and are thus key enablers for a Digital Earth Nervous System. Nevertheless, the rapid processing of these sensor data streams continues to challenge traditional data-handling solutions, and new approaches are being requested. We propose a generic answer to this challenge, which has the potential to support any form of distributed real-time analysis. This neutral methodology follows a brokering approach to work with different kinds of data sources and uses web-based standards to achieve interoperability. As a proof of concept, we implemented the methodology to detect anomalies in real time and applied it to the area of environmental monitoring. The developed system is capable of detecting anomalies, generating notifications, and displaying the recent situation to the user.
Keywords: Big data real-time analysis data streams sensor networks INTEROPERABILITY brokering approach