2,678 articles found
Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach
1
Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 2, pp. 2557-2573 (17 pages)
Data stream clustering is integral to contemporary big data applications. However, handling the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Using the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever further data arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that, upon arrival, new data points interact only with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
Keywords: data stream clustering; TEDA; KD-tree; scapegoat tree
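Illustrative sketch (not taken from the paper): the abstract describes TEDA as a recursive method that needs no prior knowledge. The Python below shows one commonly cited recursive form of the eccentricity test for a single cluster, assuming the usual update rules for the running mean and variance and an m-sigma style threshold. The class name `TedaCluster`, the function `is_eccentric`, and the default `m=3.0` are hypothetical choices for this example; the KD-Tree and Scapegoat Tree parts of NS-TEDA are not shown.

```python
class TedaCluster:
    """Running statistics for one cluster (hypothetical helper, assumed form)."""

    def __init__(self):
        self.k = 0        # number of points absorbed so far
        self.mean = []    # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb point x and update mean and variance recursively."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = list(x)
            return
        self.mean = [(k - 1) / k * m + xi / k for m, xi in zip(self.mean, x)]
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        self.var = (k - 1) / k * self.var + dist2 / (k - 1)

    def eccentricity(self, x):
        """Eccentricity of x; None while it is still undefined."""
        if self.k < 2 or self.var == 0.0:
            return None
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        return 1.0 / self.k + dist2 / (self.k * self.var)


def is_eccentric(cluster, x, m=3.0):
    """Assumed test: normalized eccentricity above (m^2 + 1) / (2k)."""
    ecc = cluster.eccentricity(x)
    if ecc is None:
        return False
    return ecc / 2.0 > (m * m + 1.0) / (2.0 * cluster.k)
```

In this sketch a point is first absorbed with `update` and then tested with `is_eccentric`; because every quantity is updated in constant time per point, no past data needs to be stored.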
Subspace Clustering in High-Dimensional Data Streams:A Systematic Literature Review
2
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 4649-4668 (20 pages)
Clustering high-dimensional data is challenging because data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering; subspace clustering; projected clustering; data stream; stream clustering; high dimensionality; evolving data stream; concept drift
Sentiment Drift Detection and Analysis in Real Time Twitter Data Streams
3
Authors: E. Susi, A. P. Shanthi. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 6, pp. 3231-3246 (16 pages)
Handling sentiment drifts in real-time Twitter data streams is a challenging task when performing sentiment classification, because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drifts has led to the need for an adaptive approach to detect and handle this drift in real time. This work proposes an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting them to data streams using the Kafka tool. The experiments conducted on real-time and simulated tweets on sports, health care and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87% and 90%, respectively. Though the results have been provided only for a few topics, as a proof of concept, this framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.
Keywords: sentiment drift; sentiment classification; big data; BERT; real-time data streams; Twitter
Dynamically Computing Approximate Frequency Counts in Sliding Window over Data Stream (cited by: 1)
4
Authors: NIE Guo-liang, LU Zheng-ding. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, Issue 1, pp. 283-288 (6 pages)
This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, that is, computing the frequency counts that exceed a user-specified threshold ε. The first algorithm constructs sub-windows and periodically deletes expired sub-windows in the sliding window, and each sub-window maintains a summary data structure. The first algorithm outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adapts a multi-level method to deal with the data stream. Once the sketch of the most recent N elements has been constructed, the second algorithm can provide the answers to frequency queries over the most recent n (n ≤ N) elements. The second algorithm outputs at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
Keywords: data stream; sliding window; approximation algorithms; frequency counts
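Illustrative sketch (not the paper's algorithm): the first algorithm is described as splitting the sliding window into sub-windows and periodically deleting expired ones. The Python below shows that idea under simplifying assumptions: each sub-window keeps an exact `Counter` rather than a compact summary, and the covered range is only approximately the most recent `window_size` elements. The class name `SubWindowCounter` and its parameters are hypothetical.

```python
from collections import Counter, deque


class SubWindowCounter:
    """Frequency counts over roughly the last window_size items (sketch)."""

    def __init__(self, window_size, sub_size):
        self.window_size = window_size
        self.sub_size = sub_size
        self.subs = deque()       # closed sub-windows, oldest first
        self.current = Counter()  # sub-window currently being filled
        self.current_len = 0

    def add(self, item):
        self.current[item] += 1
        self.current_len += 1
        if self.current_len == self.sub_size:
            # Close the full sub-window and start a new one.
            self.subs.append(self.current)
            self.current = Counter()
            self.current_len = 0
            # Expire sub-windows that no longer fit in the window.
            while len(self.subs) * self.sub_size > self.window_size:
                self.subs.popleft()

    def frequent(self, eps):
        """Items whose count exceeds eps times the number of covered items."""
        total = Counter(self.current)
        for sub in self.subs:
            total.update(sub)
        n = sum(total.values())
        return {item: cnt for item, cnt in total.items() if cnt > eps * n}
```

Expiring whole sub-windows keeps deletion cheap, at the cost of covering up to `sub_size - 1` extra items while the newest sub-window is filling.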
Fast wireless sensor for anomaly detection based on data stream in an edge-computing-enabled smart greenhouse (cited by: 1)
5
Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan. Digital Communications and Networks (SCIE, CSCD), 2022, Issue 4, pp. 498-507 (10 pages)
Edge-computing-enabled smart greenhouses are a representative application of Internet of Things (IoT) technology, which can monitor environmental information in real time and use that information to support intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms, originally designed for static data, do not properly consider the inherent characteristics of the data stream produced by wireless sensors, such as infiniteness, correlations, and concept drift. This poses a considerable challenge to anomaly detection on data streams and leads to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that the data stream is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times will run out of memory space. Second, there exist correlations among different data streams, and traditional algorithms hardly consider these correlations. Third, the underlying data generation process or distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effectiveness. Considering these issues, a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and the time window technique is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
Keywords: anomaly detection; data stream; DLSHiForest; smart greenhouse; edge computing
A Granularity-Aware Parallel Aggregation Method for Data Streams
6
Authors: WANG Yong-li, XU Hong-bing, XU Li-zhen, QIAN Jiang-bo, LIU Xue-jun. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, Issue 1, pp. 133-137 (5 pages)
This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and uses an equal-depth histogram to implement partitioning. This method can avoid data skew and reduce communication cost. The experimental results on both synthetic data and actual data show that the proposed method is efficient, practical and suitable for processing time-varying data streams.
Keywords: data streams; parallel processing; linear regression; aggregation; data skew
Linked-Tree: An Aggregate Query Algorithm Based on Sliding Window over Data Stream
7
Authors: YU Yaxin, WANG Guoren, SU Dong, ZHU Xinhua. Wuhan University Journal of Natural Sciences (CAS), 2006, Issue 5, pp. 1114-1119 (6 pages)
How to process aggregate queries over data streams efficiently and effectively has become a hot research topic in both the academic and industrial communities. To address this issue, a novel Linked-tree algorithm based on a sliding window is proposed in this paper. By introducing the concept of an area, the Linked-tree algorithm reuses many primary results from the last window and thus avoids many unnecessary repeated comparison operations between two successive windows. As a result, the execution efficiency of MAX queries is improved dramatically. In addition, since the memory size depends on the number of areas but not on the size of the sliding window, memory is greatly economized. Extensive experimental results show that the Linked-tree algorithm performs significantly better than the traditional SC (Simple Compared) algorithm and the Ranked-tree algorithm.
Keywords: data streams; sliding window; aggregate query; area; hop
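Illustrative sketch (not the Linked-tree algorithm): the abstract's central point is that a MAX query over successive windows can reuse results from the previous window instead of rescanning it. A standard technique with the same goal is a monotonic deque, shown below in Python. The function name `sliding_max` is hypothetical, and the "area" structure of the paper is not modeled.

```python
from collections import deque


def sliding_max(stream, window):
    """Yield the maximum of each full window of the given size."""
    candidates = deque()  # (index, value) pairs with strictly decreasing values
    for i, value in enumerate(stream):
        # Values no larger than the new one can never be a future maximum.
        while candidates and candidates[-1][1] <= value:
            candidates.pop()
        candidates.append((i, value))
        # Drop the front candidate once it has slid out of the window.
        if candidates[0][0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            yield candidates[0][1]
```

Each element is appended and removed at most once, so the amortized cost per arriving tuple is constant, and memory depends on how many candidates survive rather than on re-reading the whole window.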
An Indexed Non-Equijoin Algorithm Based on Sliding Windows over Data Streams
8
Authors: YU Ya-xin, YANG Xing-hua, YU Ge, WU Shan-shan. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, Issue 1, pp. 294-298 (5 pages)
Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding window constraints. So far, non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature. However, none of them takes non-equijoins into consideration. In many cases, non-equijoin queries occur frequently. Hence, it is worth discussing how to process non-equijoin queries effectively and efficiently. In this paper, we propose an indexed join algorithm for supporting non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
Keywords: non-equijoin; data stream; sliding window; red-black indexing tree
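Illustrative sketch (not the paper's algorithm): an ordered index lets a non-equijoin predicate such as `s.key > r.key` be answered by a range lookup instead of a scan of the whole window. The Python below uses a sorted list with `bisect` as a stand-in for the red-black tree named in the keywords; the class name `SortedWindow` and the count-based eviction policy are assumptions for this example.

```python
import bisect
from collections import deque


class SortedWindow:
    """Count-based sliding window with an ordered index on the join key."""

    def __init__(self, size):
        self.size = size
        self.arrival = deque()  # keys in arrival order, oldest first
        self.sorted_keys = []   # the same keys kept in sorted order

    def insert(self, key):
        self.arrival.append(key)
        bisect.insort(self.sorted_keys, key)
        if len(self.arrival) > self.size:
            # Evict the oldest key from both structures.
            old = self.arrival.popleft()
            del self.sorted_keys[bisect.bisect_left(self.sorted_keys, old)]

    def greater_than(self, key):
        """All window keys strictly greater than key (the join partners)."""
        return self.sorted_keys[bisect.bisect_right(self.sorted_keys, key):]
```

A balanced tree would make insertion and eviction logarithmic; the sorted list here keeps the lookup logarithmic but pays linear time for updates, which is acceptable only for a small illustrative window.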
Incremental Learning Framework for Mining Big Data Stream
9
Authors: Alaa Eisa, Nora E.L-Rashidy, Mohammad Dahman Alshehri, Hazem M. El-bakry, Samir Abdelrazek. Computers, Materials & Continua (SCIE, EI), 2022, Issue 5, pp. 2901-2921 (21 pages)
At present, data stream classification plays a key role in big data analytics because of the enormous growth of such data. Most existing classification methods use ensemble learning, which is trustworthy, but these methods are not effective at learning from imbalanced big data, and they also assume that all data are pre-classified. Another weakness of current methods is that they take a long evaluation time when the target data stream contains a high number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model. The proposed model is implemented on the Spark architecture. For each data stream, the class output is computed at slave nodes by training a generative adversarial network with the back-propagation error based on fuzzy bound computation. This method overcomes the limitations of existing methods, as it can classify data streams that are partly or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art methods in terms of accuracy (0.861), precision (0.9328) and minimal MSE (0.0416).
Keywords: ant lion optimization (ALO); big data stream; generative adversarial network (GAN); incremental learning; Renyi entropy
Approach to Anomaly Detection in Microservice System with Multi-Source Data Streams
10
Authors: ZHANG Qixun, HAN Jing, CHENG Li, ZHANG Baisheng, GONG Zican. ZTE Communications, 2022, Issue 3, pp. 85-92 (8 pages)
Microservices have become popular in enterprises because of their excellent scalability and timely update capabilities. However, while fine-grained modularity and service orientation decrease the complexity of system development, they greatly increase the complexity of system operation and maintenance. Multiple types of system failures occur frequently, and it is hard to detect and diagnose failures in time. Furthermore, microservices are updated frequently. Existing anomaly detection models depend on offline training and cannot adapt to the frequent updates of microservices. This paper proposes an anomaly detection approach for microservice systems with multi-source data streams. This approach realizes online model construction and online anomaly detection, and is capable of self-updating and self-adapting. Experimental results show that this approach can correctly identify 78.85% of faults of different types.
Keywords: anomaly detection; data stream; microservice; monitored indicator; system log
Analytical Engineering for Data Stream
11
Authors: Rogério Rossi, Kechi Hirama. Journal of Computer and Communications, 2022, Issue 7, pp. 13-34 (22 pages)
The capacity to analyze massive data has become increasingly necessary, given the high volume of data generated daily by different sources. The data sources are varied and can generate a huge amount of data, which can be processed in batch or stream settings. The stream setting corresponds to the treatment of a continuous sequence of data that arrives in a real-time flow and needs to be processed in real time. The models, tools, methods and algorithms for generating intelligence from data streams culminate in the approaches of Data Stream Mining and Data Stream Learning. The activities of such approaches can be organized and structured according to Engineering principles, giving rise to the principles of Analytical Engineering, or more specifically, Analytical Engineering for Data Stream (AEDS). Thus, this article presents the AEDS conceptual framework composed of four pillars (Data, Model, Tool, People) and three processes (Acquisition, Retention, Review). The four pillars are defined from the main components of the data stream setting, and the three processes are determined by the need to operationalize the activities of an Analytical Organization (AO) in its use of the four AEDS pillars. The AEDS framework supports the projects carried out in an AO, that is, its Analytical Projects (AP), and the delivery of their results, or Analytical Deliverables (AD), by the Analytical Teams (AT) in order to provide intelligence from stream data.
Keywords: Analytical Engineering; Analytical Organization; data stream analytics; stream mining
THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams (cited by: 1)
12
Authors: Jagannath E. Nalavade, T. Senthil Murugan. Journal of Central South University (SCIE, EI, CAS, CSCD), 2017, Issue 8, pp. 1789-1800 (12 pages)
The rapid developments in the fields of telecommunication, sensor data, financial applications, analysis of data streams, and so on, increase the rate of data arrival, for which data mining is considered a vital process. The data analysis process consists of different tasks, among which data stream classification approaches face more challenges than the other commonly used techniques. Even though classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundary between the classes. Hence, we design a novel fuzzy classifier known as THRFuzzy to classify new incoming data streams. Rough set theory along with the tangential holoentropy function helps in designing the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified using three datasets, namely skin segmentation, localization, and breast cancer datasets, and evaluated in terms of accuracy and time, comparing its performance with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier gives better classification results than the existing classifiers, achieving the maximum accuracy in the minimum time.
Keywords: fuzzy classifier; data stream analysis; rough set theory; data mining techniques; fuzzy method; k-NN classification; classification model; fuzzy c-means
SCMR: a semantic-based coherence micro-cluster recognition algorithm for hybrid web data stream (cited by: 2)
13
Authors: 王珉 (Wang Min), Wang Yongbin, Li Ying. High Technology Letters (EI, CAS), 2016, Issue 2, pp. 224-232 (9 pages)
Data aggregation from various web sources is very significant for the web data analysis domain. In addition, the recognition of coherence micro-clusters is one of the most interesting issues in the field of data aggregation. Until now, many algorithms have been proposed to work on this issue. However, the deficiency of these solutions is that they cannot recognize micro-clusters in the data stream accurately. A semantic-based coherence micro-cluster recognition algorithm for hybrid web data streams is proposed. Firstly, an objective function is proposed to recognize the coherence micro-cluster, and then the semantic-based coherence micro-cluster recognition algorithm for hybrid web data streams is presented. Finally, the effectiveness and efficiency of the algorithm are verified with extensive experiments on real music data sets from Baidu Inc. and Migu Inc. The experimental results show that the proposed algorithm has a better recall rate than the non-semantic micro-cluster recognition algorithm and the single-source data flow micro-cluster recognition algorithm.
Keywords: web data stream; recognition algorithm; Web; hybrid; clustering; semantics; coherence; data analysis
An Efficient Outlier Detection Approach on Weighted Data Stream Based on Minimal Rare Pattern Mining (cited by: 1)
14
Authors: Saihua Cai, Ruizhi Sun, Shangbo Hao, Sicong Li, Gang Yuan. China Communications (SCIE, CSCD), 2019, Issue 10, pp. 83-99 (17 pages)
The distance-based outlier detection method detects implied outliers by calculating the distances between the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider how frequently subsets occur, so the detected outliers do not fit the definition of outliers (i.e., rarely appearing). Pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in the outlier detection process, so the detected outliers cannot truly reflect the actual situation. To address these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weighted data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to quickly mine the minimal weighted rare patterns, and then two deviation factors are defined in the outlier detection phase to measure the abnormal degree of each transaction on the weighted data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and that the MWRPM approach performs well in weighted rare pattern mining.
Keywords: outlier detection; weighted data stream; minimal weighted rare pattern mining; deviation factors
A graph-based sliding window multi-join over data stream (cited by: 1)
15
Authors: ZHANG Liang, Byeong-Seob You, GE Jun-wei, LIU Zhao-hong, Hae-Young Bae. 重庆邮电大学学报(自然科学版) (Journal of Chongqing University of Posts and Telecommunications, Natural Science Edition), 2007, Issue 3, pp. 362-366 (5 pages)
The join operation is a critical problem when dealing with sliding windows over data streams. There have been many optimization strategies for sliding window joins in the literature, but a simple heuristic is always used for selecting the join sequence of many sliding windows, which is ineffective. A graph-based approach is proposed to address the problem. The sliding window join model is introduced first. In this model, a vertex represents a join operator and an edge indicates the join relationship among sliding windows. Vertex weight and edge weight represent the cost of a join and the reciprocity of join operators, respectively. A good query plan with minimal cost can then be found in the model. Thus a complete join algorithm, combining setting up the model, finding the optimal query plan and executing the query plan, is presented. Experiments show that the graph-based approach is feasible and works better in this environment.
Keywords: data stream; query optimization; graph theory; adjustable window
Big Data Stream Analytics for Near Real-Time Sentiment Analysis (cited by: 1)
16
Authors: Otto K. M. Cheng, Raymond Lau. Journal of Computer and Communications, 2015, Issue 5, pp. 189-195 (7 pages)
In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations' enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers' product preferences, and hence develop more effective marketing and production strategies.
Keywords: big data; data stream analytics; sentiment analysis; online review
Strategy for Data Stream Processing Based on Measurement Metadata: An Outpatient Monitoring Scenario (cited by: 1)
17
Authors: Mario Diván, Luis Olsina, Silvia Gordillo. Journal of Software Engineering and Applications, 2011, Issue 12, pp. 653-665 (13 pages)
In this work we discuss SDSPbMM, an integrated Strategy for Data Stream Processing based on Measurement Metadata, applied to an outpatient monitoring scenario. The measures associated with the attributes of the patient (entity) under monitoring come from heterogeneous data sources as data streams, together with metadata associated with the formal definition of a measurement and evaluation project. Such metadata supports patient analysis and monitoring in a more consistent way, facilitating for instance: i) the early detection of problems typical of data, such as missing values and outliers, among others; and ii) risk anticipation by means of on-line classification models adapted to the patient. We also performed a simulation using a prototype developed for outpatient monitoring, in order to analyze processing times and variable scalability empirically, which sheds light on the feasibility of applying the prototype to real situations. In addition, we statistically analyze the results of the simulation, in order to detect the components that contribute the most variability to the system.
Keywords: measurement; data stream processing; C-INCAMI; statistical analysis
Logistic Regression for Evolving Data Streams Classification
18
Authors: 尹志武 (Yin Zhiwu), 黄上腾 (Huang Shangteng), 薛贵荣 (Xue Guirong). Journal of Shanghai Jiaotong University (Science) (EI), 2007, Issue 2, pp. 197-203 (7 pages)
Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called evolutionary logistic regression classifier (ELRClass), to solve the classification of evolving data streams. This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep this classifier if its performance deteriorates because of bursty noise, or to construct a new classifier if a major concept drift is detected. The extensive experimental results demonstrate the effectiveness of this algorithm.
Keywords: classification; logistic regression; data stream mining; classifier
Load Shedding Strategy Based on Combined Feed-Forward Plus Feedback Control over Data Streams
19
Authors: Donghong Han, Yi Fang, Daqing Yi, Yifei Zhang, Xiang Tang, Guoren Wang. Journal of Beijing Institute of Technology (EI, CAS), 2019, Issue 3, pp. 437-446 (10 pages)
In data stream management systems (DSMSs), maintaining the quality of queries is a difficult problem because both the processing cost and data arrival rates are highly unpredictable. When the system is overloaded, quality degrades significantly and thus load shedding becomes necessary. The usual way of handling overload relies only on a feedback control (FB) loop to obtain good and stable performance over data streams. In contrast, a feedback plus feed-forward control (FFC) strategy is introduced in DSMSs, which gives good quality of service (QoS) in terms of miss ratio and processing delay. In this paper, a quality adaptation framework is proposed, in which control-theory-based techniques are leveraged to adjust the application behavior according to the current system status. Compared to previous solutions, the FFC strategy achieves good quality while wasting fewer resources.
Keywords: data stream management systems (DSMSs); load shedding; feedback control; feed-forward control; quality of service (QoS)
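Illustrative sketch (not the paper's controller): the abstract combines a feed-forward term, which reacts to the measured arrival rate before delay builds up, with a feedback term, which corrects the remaining error in observed delay. The Python below shows one simple way such a combination could set the fraction of tuples to shed. The function name `shed_fraction`, the proportional gain and the specific formulas are assumptions made for this example.

```python
def shed_fraction(arrival_rate, capacity, delay, target_delay, gain=0.5):
    """Fraction of incoming tuples to drop, clamped to [0, 1].

    arrival_rate and capacity are in tuples per second; delay and
    target_delay are in seconds. All formulas here are illustrative.
    """
    # Feed-forward: shed exactly the load that exceeds processing capacity.
    if arrival_rate > 0:
        feed_forward = max(0.0, 1.0 - capacity / arrival_rate)
    else:
        feed_forward = 0.0
    # Feedback: proportional correction on the relative delay error.
    feedback = gain * (delay - target_delay) / target_delay
    return min(1.0, max(0.0, feed_forward + feedback))
```

For example, with an arrival rate of 200 tuples per second, a capacity of 100 and delay exactly on target, the feed-forward term alone asks for half of the tuples to be shed; the feedback term only contributes when the observed delay drifts from the target.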
Finding Recently Frequent Items over Online Data Streams
20
Authors: 尹志武 (Yin Zhiwu), 黄上腾 (Huang Shangteng). Journal of Donghua University (English Edition) (EI, CAS), 2006, Issue 6, pp. 53-56 (4 pages)
In this paper, a new algorithm, HCOUNT+, is proposed to find frequent items over a data stream, based on the HCOUNT algorithm. The new algorithm adopts auxiliary measures to greatly improve the precision of HCOUNT. In addition, HCOUNT+ is applied to time-critical applications, and a novel sliding window-based algorithm, SL-HCOUNT+, is proposed to mine the most frequent items occurring recently. This algorithm uses limited memory (nB·(1+α)·(e/ε)·ln(−M/ln ρ) counters, α<1), requires constant processing time per packet (only (1+α)·ln(−M/ln ρ) counters are updated, α<1), makes only one pass over the streaming data, and is shown to work well in the experimental results.
Keywords: computer technology; network; online data; computation method