Scaling Notebooks as Re-configurable Cloud Workflows
Authors: Yuandou Wang, Spiros Koulouzis, Riccardo Bianchi, Na Li, Yifang Shi, Joris Timmermans, W. Daniel Kissling, Zhiming Zhao. Data Intelligence (EI), 2022, No. 2, pp. 409-425 (17 pages).
Literate computing environments, such as Jupyter (i.e., Jupyter Notebooks, JupyterLab, and JupyterHub), have been widely used in scientific studies; they allow users to interactively develop scientific code, test algorithms, and describe the scientific narratives of their experiments in an integrated document. To scale up scientific analyses, many implemented Jupyter environment architectures encapsulate entire Jupyter notebooks as reproducible units and autoscale them on dedicated remote infrastructures (e.g., high-performance computing and cloud computing environments). The existing solutions are still limited in many ways, e.g., 1) the workflow (or pipeline) is implicit in a notebook: some steps could be reused generically by different code and executed in parallel, but because of the tight cell structure, all steps in a Jupyter notebook have to be executed sequentially, and the core code fragments cannot be flexibly reused; and 2) there are performance bottlenecks, so parallelism and scalability need to be improved when handling extensive input data and complex computation. In this work, we focus on how to manage the workflow in a notebook seamlessly. We 1) encapsulate the reusable cells as RESTful services and containerize them as portal components, 2) provide a composition tool for describing the workflow logic of those reusable components, and 3) automate the execution on remote cloud infrastructure. Empirically, we validate the solution's usability via a use case from the Ecology and Earth Science domain, illustrating the processing of massive Light Detection and Ranging (LiDAR) data. The demonstration and analysis show that our method is feasible, but that it needs further improvement, especially in integrating distributed workflow scheduling, automatic deployment, and execution, before it can become a mature approach.
Keywords: Scientific experiments; Jupyter Notebooks; Workflow management; Ecosystem structure data products; Cloud; Scalability
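The abstract contrasts a notebook's strictly sequential cells with reusable steps whose dependencies are made explicit, so that independent steps can run in parallel. The block below is a minimal Python sketch of that idea under stated assumptions. Plain functions stand in for the containerized RESTful cell services, and a local thread pool stands in for remote cloud execution. The step names (`load`, `square`, `double`, `combine`), the `WORKFLOW` dictionary, and `run_workflow` are all hypothetical illustrations, not the authors' composition tool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for reusable notebook cells (in the paper's setting these
# would be containerized RESTful services). Each step receives a dict of the
# outputs of its upstream steps, keyed by step name.
def load(_inputs):
    return list(range(1, 9))

def square(inputs):
    return [x * x for x in inputs["load"]]

def double(inputs):
    return [2 * x for x in inputs["load"]]

def combine(inputs):
    return sum(inputs["square"]) + sum(inputs["double"])

# Hypothetical workflow description: step name -> (function, upstream step names).
WORKFLOW = {
    "load": (load, []),
    "square": (square, ["load"]),
    "double": (double, ["load"]),
    "combine": (combine, ["square", "double"]),
}

def run_workflow(workflow, max_workers=4):
    """Run steps round by round; all steps whose dependencies are done run in parallel."""
    results = {}
    pending = dict(workflow)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Collect every step whose upstream outputs are already available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in workflow")
            # Submit all ready steps at once so independent steps overlap.
            futures = {}
            for name in ready:
                fn, deps = pending.pop(name)
                futures[name] = pool.submit(fn, {d: results[d] for d in deps})
            # Wait for this round to finish before scheduling the next one.
            for name, fut in futures.items():
                results[name] = fut.result()
    return results
```

Here `square` and `double` depend only on `load`, so they are submitted in the same round instead of one after the other, and `combine` waits for both; `run_workflow(WORKFLOW)["combine"]` returns 276. In a cloud setup like the one the abstract describes, each local call would presumably become a request to a deployed service, with scheduling handled by a distributed workflow engine.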