11 articles found
FPGA-Based Stream Processing for Frequent Itemset Mining with Incremental Multiple Hashes
1
Authors: Kasho Yamamoto, Masayuki Ikebe, Tetsuya Asai, Masato Motomura. Circuits and Systems, 2016, Issue 10, pp. 3299-3309, 11 pages
With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, then we improved its performance using an FPGA. FIsM algorithms are important and are basic data-mining techniques used to discover association rules from transactional databases. We improved on an approximate FIsM algorithm proposed recently so that it would fit onto hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.
Keywords: Data Mining Frequent Itemset Mining FPGA stream processing
Continuous Metadata in Continuous Integration, Stream Processing and Enterprise DataOps
2
Author: Mark Underwood. Data Intelligence (EI), 2023, Issue 1, pp. 275-287, 13 pages
Implementations of metadata tend to favor centralized, static metadata. This depiction is at variance with the past decade of focus on big data, cloud native architectures and streaming platforms. Big data velocity can demand a correspondingly dynamic view of metadata. These trends, which include DevOps, CI/CD, DataOps and data fabric, are surveyed. Several specific cloud native tools are reviewed and weaknesses in their current metadata use are identified. Implementations are suggested which better exploit capabilities of streaming platform paradigms, in which metadata is continuously collected in dynamic contexts. Future cloud native software features are identified which could enable streamed metadata to power real time data fusion or fine tune automated reasoning through real time ontology updates.
Keywords: METADATA real time data stream processing
Adaptive watermark generation mechanism based on time series prediction for stream processing
3
Authors: Yang SONG, Yunchun LI, Hailong YANG, Jun XU, Zerong LUAN, Wei LI. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 6, pp. 59-73, 15 pages
The data stream processing framework processes the stream data based on event-time to ensure that the request can be responded to in real-time. In reality, streaming data usually arrives out-of-order due to factors such as network delay. The data stream processing framework commonly adopts the watermark mechanism to address the data disorderedness. Watermark is a special kind of data inserted into the data stream with a timestamp, which helps the framework to decide whether the data received is late and thus should be discarded. Traditional watermark generation strategies are periodic; they cannot dynamically adjust the watermark distribution to balance the responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on the time series prediction model to address the above limitation. This mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered data ratio and other lateness properties of the data stream to improve the system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experiment results show that our mechanism is superior to the existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
Keywords: data stream processing WATERMARK time series based prediction dynamic adjustment
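The lateness decision described in this abstract can be illustrated with a minimal sketch. This is not the paper's adaptive mechanism and not Flink's API; it assumes a simple fixed-delay watermark (largest event time seen so far minus a constant `max_delay`), and the names `Event`, `split_by_watermark`, and `max_delay` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_time: int  # time at which the event occurred (arbitrary units)
    payload: str


def split_by_watermark(events, max_delay):
    """Classify events, in arrival order, as on-time or late.

    Assumption: the watermark is the largest event time seen so far
    minus a fixed max_delay; an event is late when its event time is
    strictly below the current watermark.
    """
    on_time, late = [], []
    max_seen = None  # largest event time observed so far
    for event in events:
        watermark = float("-inf") if max_seen is None else max_seen - max_delay
        if event.event_time < watermark:
            late.append(event)
        else:
            on_time.append(event)
        if max_seen is None or event.event_time > max_seen:
            max_seen = event.event_time
    return on_time, late


# Out-of-order arrivals: the event with time 3 arrives after time 10 was seen.
arrivals = [Event(t, f"e{t}") for t in (1, 5, 2, 10, 3, 9)]
on_time, late = split_by_watermark(arrivals, max_delay=3)
```

A larger `max_delay` admits more out-of-order events at the cost of slower results; an adaptive scheme such as the one the abstract proposes would adjust this trade-off at runtime instead of fixing it.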
A comprehensive study on fault tolerance in stream processing systems
4
Authors: Xiaotong WANG, Chunxi ZHANG, Junhua FANG, Rong ZHANG, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, Issue 2, pp. 80-97, 18 pages
Stream processing has emerged as a useful technology for applications which require continuous and low latency computation on infinite streaming data. Since stream processing systems (SPSs) usually require distributed deployment on clusters of servers in the face of large-scale data, it is especially common to encounter failures of processing nodes or communication networks, which must be handled seriously considering service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often have their own priorities on specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, there is a lack of a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs, which will become an obstacle for the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of the fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, demonstrate the experimental results on two representative open-sourced SPSs and exposit the possible disadvantages in current designs. Finally, we specify future research directions in this domain.
Keywords: fault tolerance performance evaluation stream processing
A distributed real-time data prediction framework for large-scale time-series data using stream processing
5
Authors: Kehe Wu, Yayun Zhu, Quan Li, Ziwei Wu. International Journal of Intelligent Computing and Cybernetics (EI), 2017, Issue 2, pp. 145-165, 21 pages
Purpose: The purpose of this paper is to propose a data prediction framework for scenarios which require forecasting demand for large-scale data sources, e.g., sensor networks, securities exchange, electric power secondary system, etc. Concretely, the proposed framework should handle several difficult requirements including the management of gigantic data sources, the need for a fast self-adaptive algorithm, the relatively accurate prediction of multiple time series, and the real-time demand. Design/methodology/approach: First, the autoregressive integrated moving average-based prediction algorithm is introduced. Second, the processing framework is designed, which includes a time-series data storage model based on the HBase, and a real-time distributed prediction platform based on Storm. Then, the work principle of this platform is described. Finally, a proof-of-concept testbed is illustrated to verify the proposed framework. Findings: Several tests based on Power Grid monitoring data are provided for the proposed framework. The experimental results indicate that prediction data are basically consistent with actual data, processing efficiency is relatively high, and resources consumption is reasonable. Originality/value: This paper provides a distributed real-time data prediction framework for large-scale time-series data, which can exactly achieve the requirement of the effective management, prediction efficiency, accuracy, and high concurrency for massive data sources.
Keywords: PREDICTION REAL-TIME Autoregressive integrated moving average STORM stream processing Time series
Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines
6
Authors: 张洲, 金培权, 谢希科, 王晓亮, 刘睿诚, 万寿红. Journal of Computer Science & Technology (SCIE, EI), 2024, Issue 1, pp. 116-138, 23 pages
Most distributed stream processing engines (DSPEs) do not support online task management and cannot adapt to time-varying data flows. Recently, some studies have proposed online task deployment algorithms to solve this problem. However, these approaches do not guarantee the Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations caused by the change of task deployments will impose an exorbitant cost. We study one of the most popular DSPEs, Apache Storm, and find out that when a task needs to be migrated, Storm has to stop the resource (implemented as a process of Worker in Storm) where the task is deployed. This will lead to the stop and restart of all tasks in the resource, resulting in the poor performance of task migrations. Aiming to solve this problem, in this paper, we propose N-Storm (Nonstop Storm), which is a task-resource decoupling DSPE. N-Storm allows tasks allocated to resources to be changed at runtime, which is implemented by a thread-level scheme for task migrations. Particularly, we add a local shared key/value store on each node to make resources aware of the changes in the allocation plan. Thus, each resource can manage its tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Differing from traditional task deployment algorithms that deploy all tasks at once without considering the cost of task migrations caused by a task re-deployment, OTD can gradually adjust the current task deployment to an optimized one based on the communication cost and the runtime states of resources. We demonstrate that OTD can adapt to different kinds of applications including computation- and communication-intensive applications. The experimental results on a real DSPE cluster show that N-Storm can avoid the system stop and save up to 87% of the performance degradation time, compared with Apache Storm and other state-of-the-art approaches. In addition, OTD can increase the average CPU usage by 51% for computation-intensive applications and reduce network communication costs by 88% for communication-intensive applications.
Keywords: distributed stream processing engine (DSPE) Apache Storm online task migration online task deployment
Runtime reconfiguration of data services for dealing with out-of-range stream fluctuation in cloud-edge environments
7
Authors: Shouli Zhang, Chen Liu, Xiaohong Li, Yanbo Han. Digital Communications and Networks (SCIE, CSCD), 2022, Issue 6, pp. 1014-1026, 13 pages
The integration of cloud and IoT edge devices is of significance in reducing the latency of IoT stream data processing by moving services closer to the edge-end. In this connection, a key issue is to determine when and where services should be deployed. Common service deployment strategies used to be static based on the rules defined at the design time. However, dynamically changing IoT environments bring about unexpected situations such as out-of-range stream fluctuation, where the static service deployment solutions are not efficient. In this paper, we propose a dynamic service deployment mechanism based on the prediction of upcoming stream data. To effectively predict upcoming workloads, we combine the online machine learning methods with an online optimization algorithm for service deployment. A simulation-based evaluation demonstrates that, compared with those state-of-the-art approaches, the approach proposed in this paper has a lower latency of stream processing.
Keywords: IoT stream processing Edge computing Out-of-Range stream fluctuation Dynamic service deployment
Applying Apache Spark on Streaming Big Data for Health Status Prediction
8
Authors: Ahmed Ismail Ebada, Ibrahim Elhenawy, Chang-Won Jeong, Yunyoung Nam, Hazem Elbakry, Samir Abdelrazek. Computers, Materials & Continua (SCIE, EI), 2022, Issue 2, pp. 3511-3527, 17 pages
Big data applications in healthcare have provided a variety of solutions to reduce costs, errors, and waste. This work aims to develop a real-time system based on big medical data processing in the cloud for the prediction of health issues. In the proposed scalable system, medical parameters are sent to Apache Spark to extract attributes from data and apply the proposed machine learning algorithm. In this way, healthcare risks can be predicted and sent as alerts and recommendations to users and healthcare providers. The proposed work also aims to provide an effective recommendation system by using streaming medical data, historical data on a user's profile, and a knowledge database to make the most appropriate real-time recommendations and alerts based on the sensor's measurements. This proposed scalable system works by tweeting the health status attributes of users. Their cloud profile receives the streaming healthcare data in real time by extracting the health attributes via a machine learning prediction algorithm to predict the users' health status. Subsequently, their status can be sent on demand to healthcare providers. Therefore, machine learning algorithms can be applied to stream health care data from wearables and provide users with insights into their health status. These algorithms can help healthcare providers and individuals focus on health risks and health status changes and consequently improve the quality of life.
Keywords: Big data streaming processing healthcare data machine learning IoT data processing Apache Spark
Which metrics drive macroinvertebrate drift in neotropical sky island streams?
9
Authors: Marcos Callisto, Diego M.P. Castro, Marden S. Linares, Laryssa K. Carvalho, Jose E.L. Barbosa, Robert M. Hughes. Water Biology and Security, 2023, Issue 1, pp. 1-8, 8 pages
Despite long-standing interest, the mechanisms driving aquatic macroinvertebrate drift in tropical streams remain poorly understood. Therefore, the objective of this study was to evaluate which environmental metrics drive macroinvertebrate drift in neotropical sky island streams. We evaluated whether altitude, the abundance of food resources, and variations in water quality influenced macroinvertebrate drift density, diversity, richness, and functional feeding groups. A hypothesis was developed to test whether increased altitude, lower food availability (particulate organic matter), and discharge would increase the density, taxonomic richness, and diversity of drifting invertebrates. Nine headwater stream sites were sampled in the rainy and dry seasons in the Espinhaço Meridional Mountain Range (EMMR) of southeast Brazil. Samples were collected using drift nets deployed from 5:00 p.m. to 8:00 p.m. The abundance of food resources was assessed through estimates of coarse (CPOM) and fine (FPOM) particulate organic matter, and primary producers. CPOM availability was an important explanatory variable for Gathering-Collectors and Scrapers, altitude was important for Shredders and Predators, and Filtering-Collectors were linked to water discharge, suggesting that functional group drift masses were linked to different ecosystem components. Water temperature, conductivity, dissolved oxygen, current velocity, FPOM biomass and microbasin elevation range exerted little influence on macroinvertebrate drift. Regarding taxa composition, this study also found that Baetidae and Leptohyphidae (Ephemeroptera) and Chironomidae and Simuliidae (Diptera) were the most abundant groups drifting.
Keywords: Biodiversity conservation stream processes Macroinvertebrate functional groups Water quality Serra do Espinhaço
Event detection and evolution in multi-lingual social streams
10
Authors: Yaopeng Liu, Hao Peng, Jianxin Li, Yangqiu Song, Xiong Li. Frontiers of Computer Science (SCIE, EI, CSCD), 2020, Issue 5, pp. 213-227, 15 pages
Real-life events are emerging and evolving in social and news streams. Recent methods have succeeded in capturing designed features of monolingual events, but lack interpretability and multi-lingual considerations. To this end, we propose a multi-lingual event mining model, namely MLEM, to automatically detect events and generate evolution graphs in multilingual hybrid-length text streams including English, Chinese, French, German, Russian and Japanese. Specifically, we merge the same entities and similar phrases and present multiple similarity measures by incremental word2vec model. We propose an 8-tuple to describe an event for correlation analysis and evolution graph generation. We evaluate the MLEM model using a massive human-generated dataset containing real world events. Experimental results show that our new model MLEM outperforms the baseline method both in efficiency and effectiveness.
Keywords: event detection event evolution stream processing multi-lingual anomaly detection
Conversion of fish processing wastewater into fish feed ingredients through submerged cultivation of Aspergillus oryzae
11
Authors: Taner Sar, Jorge A. Ferreira, Mohammad J. Taherzadeh. Systems Microbiology and Biomanufacturing, 2021, Issue 1, pp. 100-110, 11 pages
Fish processing towards production of fillet gives rise to wastewater streams that are ultimately directed to biogas production and/or wastewater treatment. However, these wastewater streams are rich in minerals, fat, and proteins that can be converted to protein-rich feed ingredients through submerged cultivation of edible filamentous fungi. In this study, the origin of wastewater stream, initial pH, cultivation time, and extent of washing during sieving, were found to influence the amount of recovered material from the wastewater streams and its protein content, following cultivation with Aspergillus oryzae. Through cultivation of the filamentous fungus in sludge, 330 kg of material per ton of COD were recovered by sieving, corresponding to 121 kg protein per ton of COD, while through its cultivation in salt brine, 210 kg of material were recovered per ton of COD, corresponding to 128 kg protein per ton of COD. Removal ranges of 12-43%, 39-92%, and 32-66% for COD, total solids, and nitrogen, respectively, were obtained after A. oryzae growth and harvesting in the wastewater streams. Therefore, the present study shows the versatility that the integration of fungal cultivation provides to fish processing industries, and should be complemented by economic, environmental, and feeding studies, in order to reveal the most promising valorization strategy.
Keywords: Aspergillus oryzae Fish processing wastewater streams Protein sources Waste management