Abstract: Data-intensive computing is expected to be the next-generation IT computing paradigm, and data-intensive workflows in clouds are becoming increasingly popular. Scheduling such workflows efficiently has become a key issue. In this paper, we first build a directed hypergraph model for data-intensive workflows, since hypergraphs can model communication volume more accurately and better represent asymmetric problems, and the cut metric of hypergraphs is well suited for minimizing the total volume of communication. Second, we propose the concept of data supportive ability to help represent data-intensive workflow applications, and we detail the merge operation that takes data supportive ability into account. Third, we present an optimized multi-level hypergraph partitioning algorithm. Finally, we introduce HEFT-P, a data-reduced scheduling policy for data-intensive workflows. Through simulation, we compare HEFT-P with three typical workflow scheduling policies. The results indicate that HEFT-P can achieve data-reduced scheduling and reduce the makespan of executing data-intensive
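As a rough illustration of why a hypergraph cut metric tracks communication volume, the sketch below computes the standard connectivity-minus-one cut of a partition. It is a minimal sketch under stated assumptions, not the directed model or the HEFT-P policy from the paper: the function name `connectivity_cut`, the undirected hyperedges, and the toy workflow are all hypothetical. Each hyperedge stands for one data item, weighted by its size, and connects the task that produces it with every task that consumes it.

```python
from typing import Dict, List, Tuple

Hyperedge = Tuple[int, List[str]]  # (data size, tasks that touch the data item)


def connectivity_cut(hyperedges: List[Hyperedge], part_of: Dict[str, int]) -> int:
    """Sum of weight * (number of parts spanned - 1) over all hyperedges.

    A data item whose tasks all sit in one part costs nothing; an item
    spanning k parts is counted as k - 1 transfers of its size.
    """
    total = 0
    for weight, pins in hyperedges:
        parts_spanned = {part_of[task] for task in pins}
        total += weight * (len(parts_spanned) - 1)
    return total


# Hypothetical four-task workflow (illustrative data only).
hyperedges = [
    (10, ["t1", "t2", "t3"]),  # output of t1, read by t2 and t3
    (4, ["t2", "t4"]),         # output of t2, read by t4
    (6, ["t3", "t4"]),         # output of t3, read by t4
]

split = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
together = {"t1": 0, "t2": 0, "t3": 0, "t4": 0}

volume_split = connectivity_cut(hyperedges, split)
volume_together = connectivity_cut(hyperedges, together)
```

A plain graph model would charge the 10-unit item once per consumer edge that crosses the cut, whereas the hyperedge charges it once per extra part reached, which is closer to how often the data actually has to move.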
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60970038 and 61272148; the Science and Technology Plan Project of Hunan Province of China under Grant No. 2012GK3075; and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 13B015.
Abstract: With the development of cloud computing, more and more data-intensive workflows have been deployed on virtualized datacenters. As a result, the energy spent on massive data access is growing rapidly. In this paper, an energy-aware scheduling algorithm is proposed. It introduces a novel heuristic, called Minimal Data-Accessing Energy Path, for scheduling data-intensive workflows with the aim of reducing the energy consumed by intensive data access. Extensive experiments based on both synthetic and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic can significantly reduce the energy consumed in storing and retrieving the intermediate data generated during the execution of data-intensive workflows. In addition, it is more robust than existing algorithms when cloud systems are in the presence of I/O-intensive workloads.
Funding: This paper is a phased achievement of “A Study on Quantitative Linguistics: Contemporary Chinese Language” (11&ZD188), a major project sponsored by the National Social Science Fund of China and implemented by Zhejiang University’s “Big Data + Language Laws and Cognition” innovation team under the auspices of the Fundamental Research Funds for the Central Universities.
Abstract: This paper presents the methodology and trends of linguistic research in the era of big data. We begin with a discussion of the role of linguists in the information society and illustrate the opportunities and challenges linguists are currently facing. After highlighting the significance of authentic data for linguistic research, we argue that language is a complex adaptive system driven by humans. Then, from the perspective of the philosophy of science, we introduce the research paradigm of quantitative linguistics through several cases. Finally, we discuss how China’s linguistic research will benefit from the data-intensive approach in terms of scientification and internationalization.
Abstract: Digital Earth has seen great progress during the last 19 years. When it entered the era of big data, Digital Earth developed into a new stage, one characterized by ‘Big Earth Data’, confronting new challenges and opportunities. In this paper we give an overview of the development of Digital Earth by summarizing research achievements and marking the milestones of Digital Earth’s development. Then, the opportunities and challenges that Big Earth Data faces are discussed. As a data-intensive scientific research approach, Big Earth Data provides a new vision and methodology to Earth sciences, and the paper identifies the advantages of Big Earth Data for scientific research, especially in knowledge discovery and global change research. We believe that Big Earth Data will advance and promote the development of Digital Earth.
Funding: Supported by the National Key R&D Program of China (No. 2017YFB1003000); the National Natural Science Foundation of China (Nos. 61872079, 61572129, 61602112, 61502097, 61702096, 61320106007, 61632008, and 61702097); the Natural Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689); the Fundamental Research Funds for the Central Universities (No. 2242018k1G019); the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201); and the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9). Partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. Because of their distributed nature, the applications consist of tasks that each involve a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics and analyze it using the Lagrange multiplier method and the KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with a real-world traffic trace. Communication stages complete up to 1.4× faster on average, task utilities increase up to 2.26×, and the fairness of tasks improves up to 8.66× using Tasch in comparison to per-flow mechanisms.
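To make the Nash bargaining step concrete, the sketch below works through the simplest textbook case; it is an illustration under stated assumptions, not the model or the algorithms used in Tasch. It assumes a single link of capacity C, a utility equal to the allocated bandwidth, a zero disagreement point, and hypothetical bargaining weights w_i. Maximizing the weighted log Nash product, sum of w_i * log(x_i) subject to sum of x_i = C, the KKT stationarity condition w_i / x_i = lambda gives x_i = C * w_i / sum(w). The names `nash_bargaining_shares` and `log_nash_product` are made up for this example.

```python
import math
from typing import Dict


def nash_bargaining_shares(weights: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Closed-form solution of max sum(w_i * log(x_i)) s.t. sum(x_i) = capacity.

    From the KKT condition w_i / x_i = lambda, every x_i is proportional
    to its weight, so the shares are capacity * w_i / sum(w).
    """
    total_weight = sum(weights.values())
    return {task: capacity * w / total_weight for task, w in weights.items()}


def log_nash_product(weights: Dict[str, float], shares: Dict[str, float]) -> float:
    """Objective value: the weighted sum of log-utilities."""
    return sum(w * math.log(shares[task]) for task, w in weights.items())


# Hypothetical tasks and weights (illustrative values only).
weights = {"a": 2.0, "b": 1.0, "c": 1.0}
shares = nash_bargaining_shares(weights, capacity=8.0)

# Any other feasible split of the same capacity scores lower.
perturbed = {"a": 3.0, "b": 3.0, "c": 2.0}
best = log_nash_product(weights, shares)
worse = log_nash_product(weights, perturbed)
```

With these assumptions the bargaining solution coincides with weighted proportional fairness; heterogeneous, non-linear utilities such as those described in the abstract generally do not admit such a closed form.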
Funding: This work was supported by the regional innovation cooperation between Sichuan and Guangxi Provinces (Grant No. 2020YFQ0019) and the National Natural Science Foundation of China (Grant No. 32070671).
Abstract: With the progression of modern information techniques, such as next generation sequencing (NGS), Internet of Everything (IoE) based smart sensors, and artificial intelligence algorithms, data-intensive research and applications are emerging as the fourth paradigm for scientific discovery. However, we face many challenges to the practical application of this paradigm. In this article, 10 challenges to data-intensive discovery and applications in precision medicine and healthcare are summarized, and future perspectives on next generation medicine are discussed.
Abstract: Big Earth Data analysis is a complex task requiring the integration of many skills and technologies. This paper provides a comprehensive review of the technology and terminology within the Big Earth Data problem space and presents examples of state-of-the-art projects in each major branch of Big Earth Data research. Current issues within Big Earth Data research are highlighted, and potential future solutions are identified.
Funding: This work is supported in part by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357; in part by the Exascale Computing Project under Grant No. 17-SC-20-SC, a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative; and in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.
Abstract: Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms drive the development of new data services that provide high performance on these new platforms, provide capable and productive interfaces and abstractions for a variety of applications, and are readily adapted when new technologies are deployed. The Mochi framework enables composition of specialized distributed data services from a collection of connectable modules and subservices. Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration, Mochi allows each application to use a data service specialized to its needs and access patterns. This paper introduces the Mochi framework and methodology. The Mochi core components and microservices are described. Examples of the application of the Mochi methodology to the development of four specialized services are detailed. Finally, a performance evaluation of a Mochi core component, a Mochi microservice, and a composed service providing an object model is performed. The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.