Abstract: With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, the extraction of useful knowledge from huge amounts of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and then improved its performance using an FPGA. FIsM algorithms are important, basic data-mining techniques used to discover association rules from transactional databases. We improved a recently proposed approximate FIsM algorithm so that it would fit onto a hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we achieved a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20-fold speed improvement compared to the CPU version.
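The abstract does not name the approximate algorithm it builds on. As a rough illustration of what approximate frequency counting over a stream can look like, the sketch below uses Lossy Counting on single items rather than itemsets. This is a swapped-in, well-known technique, not the authors' method; the function name `lossy_count` and the parameter `epsilon` are assumptions made for this example.

```python
from math import ceil


def lossy_count(stream, epsilon):
    """Approximate item frequencies with Lossy Counting (illustrative only).

    Each surviving entry maps item -> (count, max_undercount). The true
    frequency of an item lies between count and count + max_undercount.
    """
    width = ceil(1 / epsilon)   # number of items per bucket
    counts = {}
    n = 0                       # items seen so far
    for item in stream:
        n += 1
        bucket = ceil(n / width)
        if item in counts:
            count, err = counts[item]
            counts[item] = (count + 1, err)
        else:
            # A new item may have been seen and pruned in earlier buckets.
            counts[item] = (1, bucket - 1)
        if n % width == 0:
            # At each bucket boundary, prune entries that cannot be frequent.
            counts = {k: (c, e) for k, (c, e) in counts.items() if c + e > bucket}
    return counts, n


counts, n = lossy_count(["a", "a", "b", "a", "c", "a", "d", "a"], epsilon=0.25)
print(counts, n)
```

Memory stays bounded because infrequent entries are discarded at every bucket boundary, which is the kind of fixed-footprint behavior that tends to suit hardware implementations.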
Abstract: Implementations of metadata tend to favor centralized, static metadata. This depiction is at variance with the past decade of focus on big data, cloud-native architectures and streaming platforms. Big data velocity can demand a correspondingly dynamic view of metadata. These trends, which include DevOps, CI/CD, DataOps and data fabric, are surveyed. Several specific cloud-native tools are reviewed and weaknesses in their current metadata use are identified. Implementations are suggested which better exploit capabilities of streaming platform paradigms, in which metadata is continuously collected in dynamic contexts. Future cloud-native software features are identified which could enable streamed metadata to power real-time data fusion or fine-tune automated reasoning through real-time ontology updates.
Funding: This work was supported by the National Key Research and Development Program of China (2020YFB1506703) and the National Natural Science Foundation of China (Grant No. 62072018).
Abstract: Data stream processing frameworks process stream data based on event time to ensure that requests can be answered in real time. In practice, streaming data usually arrives out of order due to factors such as network delay. Data stream processing frameworks commonly adopt the watermark mechanism to address this disorder. A watermark is a special kind of data, carrying a timestamp, that is inserted into the data stream; it helps the framework decide whether received data is late and should thus be discarded. Traditional watermark generation strategies are periodic; they cannot dynamically adjust the watermark distribution to balance responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on a time series prediction model to address this limitation. The mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered data ratio and other lateness properties of the data stream, improving system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experimental results show that our mechanism is superior to the existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
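To make the baseline concrete, here is a minimal sketch of a periodic watermark with a fixed allowed delay, which is the kind of strategy the abstract describes as traditional. It is plain Python, not Flink's API and not the adaptive mechanism proposed in the paper; the function name `split_late_events` and the parameters `max_delay` and `period` are assumptions for illustration, and the period is counted in events rather than wall-clock time to keep the example deterministic.

```python
def split_late_events(event_times, max_delay, period):
    """Classify events as on time or late under a periodic watermark.

    The watermark is refreshed every `period` events to
    (largest event time seen so far) - max_delay and never moves backwards.
    An event whose timestamp is at or below the current watermark is late.
    """
    watermark = float("-inf")
    max_seen = float("-inf")
    on_time, late = [], []
    for i, t in enumerate(event_times, start=1):
        if t <= watermark:
            late.append(t)      # would be discarded by the framework
        else:
            on_time.append(t)
        max_seen = max(max_seen, t)
        if i % period == 0:     # periodic watermark emission
            watermark = max(watermark, max_seen - max_delay)
    return on_time, late


events = [1, 2, 3, 10, 4, 11, 9, 12]
print(split_late_events(events, max_delay=2, period=2))
print(split_late_events(events, max_delay=7, period=2))
```

With a small `max_delay` the watermark advances quickly and the out-of-order events 4 and 9 are dropped; with a larger one nothing is dropped, but results would have to wait longer. This is the responsiveness versus accuracy trade-off that a fixed setting cannot adjust at runtime.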
Funding: The work was supported by the National Key Research and Development Plan Project (2018YFB1003404).
Abstract: Stream processing has emerged as a useful technology for applications that require continuous, low-latency computation on infinite streaming data. Since stream processing systems (SPSs) usually require distributed deployment on clusters of servers to cope with large-scale data, failures of processing nodes or communication networks are especially common and must be handled carefully to preserve service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often have their own priorities regarding specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, there is a lack of a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs, which will become an obstacle to the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, demonstrate the experimental results on two representative open-source SPSs, and expose the possible disadvantages in current designs. Finally, we specify future research directions in this domain.
Funding: Supported by the Fundamental Research Funds for the Central Universities (2015XS72).
Abstract: Purpose: The purpose of this paper is to propose a data prediction framework for scenarios that require forecasting demand for large-scale data sources, e.g., sensor networks, securities exchanges, and electric power secondary systems. Concretely, the proposed framework should handle several difficult requirements, including the management of gigantic data sources, the need for a fast self-adaptive algorithm, the relatively accurate prediction of multiple time series, and the real-time demand. Design/methodology/approach: First, the autoregressive integrated moving average-based prediction algorithm is introduced. Second, the processing framework is designed, which includes a time-series data storage model based on HBase and a real-time distributed prediction platform based on Storm. Then, the working principle of this platform is described. Finally, a proof-of-concept testbed is presented to verify the proposed framework. Findings: Several tests based on Power Grid monitoring data are provided for the proposed framework. The experimental results indicate that the predicted data are basically consistent with the actual data, processing efficiency is relatively high, and resource consumption is reasonable. Originality/value: This paper provides a distributed real-time data prediction framework for large-scale time-series data, which meets the requirements of effective management, prediction efficiency, accuracy, and high concurrency for massive data sources.
基金The work was supported by the National Natural Science Foundation of China under Grant Nos.62072419 and 61672479.
Abstract: Most distributed stream processing engines (DSPEs) do not support online task management and cannot adapt to time-varying data flows. Recently, some studies have proposed online task deployment algorithms to solve this problem. However, these approaches do not guarantee the Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations caused by changes in task deployment impose an exorbitant cost. We study one of the most popular DSPEs, Apache Storm, and find that when a task needs to be migrated, Storm has to stop the resource (implemented as a Worker process in Storm) where the task is deployed. This leads to the stop and restart of all tasks in the resource, resulting in poor task migration performance. To solve this problem, in this paper we propose N-Storm (Nonstop Storm), a task-resource decoupling DSPE. N-Storm allows the tasks allocated to resources to be changed at runtime, which is implemented by a thread-level scheme for task migrations. In particular, we add a local shared key/value store on each node to make resources aware of changes in the allocation plan. Thus, each resource can manage its tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Unlike traditional task deployment algorithms, which deploy all tasks at once without considering the cost of task migrations caused by a task re-deployment, OTD can gradually adjust the current task deployment to an optimized one based on the communication cost and the runtime states of resources. We demonstrate that OTD can adapt to different kinds of applications, including computation-intensive and communication-intensive applications. The experimental results on a real DSPE cluster show that N-Storm can avoid the system stop and save up to 87% of the performance degradation time, compared with Apache Storm and other state-of-the-art approaches. In addition, OTD can increase the average CPU usage by 51% for computation-intensive applications and reduce network communication costs by 88% for communication-intensive applications.
Funding: Supported by the General Program of the National Natural Science Foundation of China: Analytical Method Research of Loop and Recursion (No. 61872262/F020106) and the Key Project of the National Natural Science Foundation of China: Research on Big Service Theory and Methods in Big Data Environment (No. 61832004).
Abstract: The integration of cloud and IoT edge devices is significant in reducing the latency of IoT stream data processing by moving services closer to the edge. In this connection, a key issue is to determine when and where services should be deployed. Common service deployment strategies have been static, based on rules defined at design time. However, dynamically changing IoT environments bring about unexpected situations, such as out-of-range stream fluctuation, in which static service deployment solutions are not efficient. In this paper, we propose a dynamic service deployment mechanism based on the prediction of upcoming stream data. To effectively predict upcoming workloads, we combine online machine learning methods with an online optimization algorithm for service deployment. A simulation-based evaluation demonstrates that, compared with state-of-the-art approaches, the approach proposed in this paper achieves a lower stream processing latency.
Funding: This study was financially supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health and Welfare (HI18C1216), and the Soonchunhyang University Research Fund.
Abstract: Big data applications in healthcare have provided a variety of solutions to reduce costs, errors, and waste. This work aims to develop a real-time system based on big medical data processing in the cloud for the prediction of health issues. In the proposed scalable system, medical parameters are sent to Apache Spark to extract attributes from data and apply the proposed machine learning algorithm. In this way, healthcare risks can be predicted and sent as alerts and recommendations to users and healthcare providers. The proposed work also aims to provide an effective recommendation system by using streaming medical data, historical data on a user's profile, and a knowledge database to make the most appropriate real-time recommendations and alerts based on the sensor's measurements. This proposed scalable system works by tweeting the health status attributes of users. Their cloud profile receives the streaming healthcare data in real time by extracting the health attributes via a machine learning prediction algorithm to predict the users' health status. Subsequently, their status can be sent on demand to healthcare providers. Therefore, machine learning algorithms can be applied to stream healthcare data from wearables and provide users with insights into their health status. These algorithms can help healthcare providers and individuals focus on health risks and health status changes and consequently improve the quality of life.
Funding: Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001. MC was awarded Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) research productivity grant 304,060/2020-8 and Fundação de Amparo à Pesquisa de Minas Gerais (FAPEMIG) research grant PPM 00104-18. DMPC received a postdoctoral scholarship from P&D Aneel-Cemig GT-611. MSL received a postdoctoral scholarship from P&D Aneel-Cemig GT-599. RMH received a Fulbright Brasil grant. This work was partially supported by the CNPq through funding for the Long-Term Ecological Research "PELD Campos Rupestres da Serra do Cipó" (grant number 442694/2020-2). The authors have no financial or proprietary interests in any material discussed in this article. The authors are grateful to the colleagues of the Laboratório de Ecologia de Bentos (ICB-UFMG) for field and laboratory assistance.
Abstract: Despite long-standing interest, the mechanisms driving aquatic macroinvertebrate drift in tropical streams remain poorly understood. Therefore, the objective of this study was to evaluate which environmental metrics drive macroinvertebrate drift in neotropical sky island streams. We evaluated whether altitude, the abundance of food resources, and variations in water quality influenced macroinvertebrate drift density, diversity, richness, and functional feeding groups. A hypothesis was developed to test whether increased altitude, lower food availability (particulate organic matter), and discharge would increase the density, taxonomic richness, and diversity of drifting invertebrates. Nine headwater stream sites were sampled in the rainy and dry seasons in the Espinhaço Meridional Mountain Range (EMMR) of southeast Brazil. Samples were collected using drift nets deployed from 5:00 p.m. to 8:00 p.m. The abundance of food resources was assessed through estimates of coarse (CPOM) and fine (FPOM) particulate organic matter, and primary producers. CPOM availability was an important explanatory variable for Gathering-Collectors and Scrapers, altitude was important for Shredders and Predators, and Filtering-Collectors were linked to water discharge, suggesting that functional group drift masses were linked to different ecosystem components. Water temperature, conductivity, dissolved oxygen, current velocity, FPOM biomass and microbasin elevation range exerted little influence on macroinvertebrate drift. Regarding taxa composition, this study also found that Baetidae and Leptohyphidae (Ephemeroptera) and Chironomidae and Simuliidae (Diptera) were the most abundant drifting groups.
Funding: This work was supported by the NSFC program (Grant Nos. 61872022, 61421003, U1636123), SKLSDE-2018ZX-16, and partly by the Beijing Advanced Innovation Center for Big Data and Brain Computing.
Abstract: Real-life events emerge and evolve in social and news streams. Recent methods have succeeded in capturing designed features of monolingual events, but lack interpretability and multilingual considerations. To this end, we propose a multilingual event mining model, namely MLEM, to automatically detect events and generate evolution graphs in multilingual hybrid-length text streams including English, Chinese, French, German, Russian and Japanese. Specifically, we merge identical entities and similar phrases and present multiple similarity measures based on an incremental word2vec model. We propose an 8-tuple to describe events for correlation analysis and evolution graph generation. We evaluate the MLEM model using a massive human-generated dataset containing real-world events. Experimental results show that MLEM outperforms the baseline method in both efficiency and effectiveness.
Funding: This work was supported by the Swedish Agency for Economic and Regional Growth (Tillväxtverket) through a European Regional Development Fund.
Abstract: Fish processing for fillet production gives rise to wastewater streams that are ultimately directed to biogas production and/or wastewater treatment. However, these wastewater streams are rich in minerals, fat, and proteins that can be converted to protein-rich feed ingredients through submerged cultivation of edible filamentous fungi. In this study, the origin of the wastewater stream, initial pH, cultivation time, and extent of washing during sieving were found to influence the amount of material recovered from the wastewater streams and its protein content, following cultivation with Aspergillus oryzae. Through cultivation of the filamentous fungus in sludge, 330 kg of material per ton of COD were recovered by sieving, corresponding to 121 kg protein per ton of COD, while through its cultivation in salt brine, 210 kg of material were recovered per ton of COD, corresponding to 128 kg protein per ton of COD. Removal ranges of 12-43%, 39-92%, and 32-66% for COD, total solids, and nitrogen, respectively, were obtained after A. oryzae growth and harvesting in the wastewater streams. Therefore, the present study shows the versatility that the integration of fungal cultivation provides to fish processing industries, and it should be complemented by economic, environmental, and feeding studies in order to reveal the most promising valorization strategy.