Funding: Supported by the National Natural Science Foundation of China (Nos. 61502043 and 61132001), the Beijing Natural Science Foundation (No. 4162042), and the Beijing Talents Fund (No. 2015000020124G082).
Abstract: With the growing popularity of data-intensive services on the Internet, the traditional process-centric model for business processes faces challenges due to its inability to describe data semantics and dependencies, which makes process design and implementation inflexible. This paper proposes a novel data-aware business process model able to describe both explicit control flow and implicit data flow. A data model with dependencies formulated in Linear-time Temporal Logic (LTL) is presented, and their satisfiability is validated by an automaton-based model checking algorithm. Data dependencies are fully considered in the modeling phase, which helps to improve the efficiency and reliability of programming during the development phase. Finally, a prototype system for data-aware workflow based on jBPM is designed using this model and has been deployed in the Beijing Kingfore heating management system, validating the flexibility, efficacy, and convenience of our approach for large-scale coding and system management in practice.
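As an illustration of the kind of dependency such a model expresses, the sketch below checks one hypothetical LTL-style dependency, G(produce(d) -> F(consume(d))), over a finite execution trace. It is a simplified finite-trace check for a single pattern, not the paper's automaton-based model checking algorithm; the event names and trace format are assumptions.

# Simplified finite-trace check of one LTL-style data dependency:
# G(produce(d) -> F(consume(d))) -- "every produced data item is eventually consumed".
# Illustrative sketch only; not the paper's automaton-based algorithm.

def satisfies_produce_eventually_consume(trace, data_item):
    """trace: list of (event, item) tuples in execution order."""
    pending = False
    for event, item in trace:
        if item != data_item:
            continue
        if event == "produce":
            pending = True          # obligation: item must be consumed later
        elif event == "consume":
            pending = False         # obligation discharged
    return not pending              # no unconsumed production remains

if __name__ == "__main__":
    trace = [("produce", "order"), ("produce", "invoice"),
             ("consume", "order")]
    print(satisfies_produce_eventually_consume(trace, "order"))    # True
    print(satisfies_produce_eventually_consume(trace, "invoice"))  # False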
Abstract: Data-intensive computing is expected to be the next-generation IT computing paradigm, and data-intensive workflows in clouds are becoming increasingly popular, so scheduling them efficiently has become a key issue. In this paper, first, we build a directed hypergraph model for data-intensive workflows, since hypergraphs can more accurately model communication volume and better represent asymmetric problems, and the hypergraph cut metric is well suited to minimizing the total volume of communication. Second, we propose the concept of data supportive ability to help represent data-intensive workflow applications, and we detail the merge operation that takes data supportive ability into account. Third, we present an optimized hypergraph multi-level partitioning algorithm. Finally, we introduce HEFT-P, a data-reduced scheduling policy for data-intensive workflows. Through simulation, we compare HEFT-P with three typical workflow scheduling policies. The results indicate that HEFT-P reduces data scheduling and shortens the makespan of executing data-intensive workflows.
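To make the cut metric concrete, the sketch below computes the standard connectivity-minus-one cut of a hypergraph partition, the quantity commonly minimized to model total communication volume. The tiny example hypergraph and the two-way partition are assumptions for illustration, not data from the paper.

# Connectivity-1 cut metric of a hypergraph partition: for each hyperedge (net),
# the cost is (number of parts it spans - 1) * net weight. Hypergraph partitioners
# minimize this quantity to model total communication volume.

def cut_metric(nets, partition):
    """nets: dict net_id -> (weight, set_of_vertices); partition: dict vertex -> part id."""
    total = 0
    for weight, vertices in nets.values():
        parts_spanned = {partition[v] for v in vertices}
        total += (len(parts_spanned) - 1) * weight
    return total

if __name__ == "__main__":
    # Hypothetical workflow: each net groups a task with the tasks consuming its output.
    nets = {"t1_out": (3, {"t1", "t2", "t3"}),   # 3 units of data sent to t2 and t3
            "t2_out": (1, {"t2", "t4"}),
            "t3_out": (2, {"t3", "t4"})}
    partition = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
    print(cut_metric(nets, partition))  # (2-1)*3 + (2-1)*1 + (1-1)*2 = 4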
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60970038 and 61272148, the Science and Technology Plan Project of Hunan Province of China under Grant No. 2012GK3075, and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 13B015.
Abstract: With the development of cloud computing, more and more data-intensive workflows have been deployed on virtualized datacenters, and as a result the energy spent on massive data accesses grows rapidly. In this paper, an energy-aware scheduling algorithm is proposed that introduces a novel heuristic, the Minimal Data-Accessing Energy Path, for scheduling data-intensive workflows with the aim of reducing the energy consumed by intensive data accesses. Extensive experiments based on both synthetic and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic scheduling can significantly reduce the energy consumed in storing and retrieving intermediate data generated during the execution of data-intensive workflows. In addition, it exhibits better robustness than existing algorithms when cloud systems face I/O-intensive workloads.
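The abstract does not spell out the heuristic, so purely as an illustration of an energy-minimizing placement decision, the sketch below picks for a task the host with the lowest estimated data-access energy. The per-byte energy model, host names, and numbers are assumptions, not the paper's algorithm.

# Illustrative energy-minimizing placement (NOT the paper's Minimal Data-Accessing
# Energy Path algorithm): choose the host with the lowest estimated access energy,
# where energy = bytes_read * read_cost + bytes_written * write_cost (assumed model).

def pick_min_energy_host(task, hosts):
    """task: dict with 'mb_in' and 'mb_out'; hosts: dict name -> (read_J_per_MB, write_J_per_MB)."""
    def energy(costs):
        read_cost, write_cost = costs
        return task["mb_in"] * read_cost + task["mb_out"] * write_cost
    return min(hosts, key=lambda h: energy(hosts[h]))

if __name__ == "__main__":
    hosts = {"local_ssd": (0.02, 0.05), "remote_store": (0.08, 0.12)}
    task = {"mb_in": 500, "mb_out": 200}
    print(pick_min_energy_host(task, hosts))  # local_ssd (20 J vs 64 J)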
Funding: This paper is a phased achievement of "A Study on Quantitative Linguistics: Contemporary Chinese Language" (11&ZD188), a major project sponsored by the National Social Science Fund of China and implemented by Zhejiang University's "Big Data + Language Laws and Cognition" innovation team under the auspices of the Fundamental Research Funds for the Central Universities.
Abstract: This paper presents the methodology and trends of linguistic research in the era of big data. We begin with a discussion of the role of linguists in the information society and illustrate the opportunities and challenges linguists currently face. After highlighting the significance of authentic data for linguistic research, we argue that language is a complex adaptive system driven by humans. Then, from the perspective of the philosophy of science, we introduce the research paradigm of quantitative linguistics through several cases. Finally, we discuss how China's linguistic research will benefit from the data-intensive approach in terms of scientification and internationalization.
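As a toy example of what a quantitative-linguistics case looks like (my own illustration, not one of the paper's cases), the sketch below ranks word frequencies in a short text and compares them with Zipf's law, f(r) ≈ f(1)/r.

# Toy quantitative-linguistics example: compare observed word-frequency ranks
# with the Zipf prediction f(r) = f(1)/r. Text and prediction are illustrative only.
from collections import Counter

text = ("the cat sat on the mat and the dog sat on the rug "
        "the cat and the dog sat together")
freqs = sorted(Counter(text.split()).values(), reverse=True)
for rank, freq in enumerate(freqs, start=1):
    predicted = freqs[0] / rank          # Zipf prediction from the top frequency
    print(f"rank {rank}: observed {freq}, Zipf-predicted {predicted:.1f}")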
Funding: Supported by the Academic Divisions of the Chinese Academy of Sciences Forum on Frontiers of Science and Technology for Big Earth Data from Space.
Abstract: Big data is a strategic highland in the era of knowledge-driven economies, and it is also a new type of strategic resource for all nations. Big data collected from space for Earth observation, so-called Big Earth Data, is creating new opportunities for the Earth sciences and revolutionizing the innovation of methodologies and thought patterns. It has the potential to advance the in-depth development of the Earth sciences and bring more exciting scientific discoveries. The Academic Divisions of the Chinese Academy of Sciences Forum on Frontiers of Science and Technology for Big Earth Data from Space was held in Beijing in June 2015. The forum analyzed the development of Earth observation technology and big data, explored the concepts and scientific connotations of Big Earth Data from space, discussed the correlation between Big Earth Data and Digital Earth, and dissected the potential of Big Earth Data from space to promote scientific discovery in the Earth sciences, especially concerning global change.
Abstract: Digital Earth has seen great progress during the last 19 years. On entering the era of big data, Digital Earth developed into a new stage, one characterized by 'Big Earth Data', confronting new challenges and opportunities. In this paper we give an overview of the development of Digital Earth by summarizing research achievements and marking the milestones of its development. Then, the opportunities and challenges that Big Earth Data faces are discussed. As a data-intensive scientific research approach, Big Earth Data provides a new vision and methodology for the Earth sciences, and the paper identifies the advantages of Big Earth Data for scientific research, especially in knowledge discovery and global change research. We believe that Big Earth Data will advance and promote the development of Digital Earth.
Abstract: Increasingly powerful computational technology has caused enormous data growth in both size and complexity. A key issue is how to organize the data to meet the challenges of data analysis. This paper carries ideas from the Internet of Things (IOT) into the digital world and organizes data entities to form a network, the Internet of Data (IOD), which has huge potential in data-intensive applications. In the IOD, data hiding technology is utilized to embed into every data entity in the system a virtual tag that records all the activities of that entity since its creation. The IOD aims to organize data into an interconnected network and to collect useful information for data identification, data tracing, data vitalization, and further data analysis.
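As a conceptual illustration of the virtual-tag idea, the sketch below attaches a provenance tag to a data entity and appends a record for every operation performed on it. The tag fields and operations are assumptions for illustration; the paper embeds the tag inside the data itself via data hiding rather than attaching it externally.

# Conceptual IOD-style virtual tag: every data entity carries a record of its
# activities since creation. Here the tag is a plain attached list, for illustration.
import time

class DataEntity:
    def __init__(self, entity_id, payload, creator):
        self.entity_id = entity_id
        self.payload = payload
        self.tag = []                      # the "virtual tag": activity records
        self._record("created", creator)

    def _record(self, activity, actor):
        self.tag.append({"activity": activity, "actor": actor, "time": time.time()})

    def read(self, actor):
        self._record("read", actor)
        return self.payload

    def transform(self, actor, func):
        self.payload = func(self.payload)
        self._record("transformed", actor)

if __name__ == "__main__":
    d = DataEntity("sensor-42", [1, 2, 3], creator="ingest-service")
    d.read(actor="analytics-job")
    d.transform(actor="cleaning-job", func=lambda xs: [x * 10 for x in xs])
    for rec in d.tag:                      # trace the entity's full history
        print(rec["activity"], rec["actor"])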
Funding: This work was supported by the regional innovation cooperation between Sichuan and Guangxi Provinces (Grant No. 2020YFQ0019) and the National Natural Science Foundation of China (Grant No. 32070671).
Abstract: With the progression of modern information techniques, such as next-generation sequencing (NGS), Internet of Everything (IoE)-based smart sensors, and artificial intelligence algorithms, data-intensive research and applications are emerging as the fourth paradigm for scientific discovery. However, we face many challenges in the practical application of this paradigm. In this article, 10 challenges to data-intensive discovery and applications in precision medicine and healthcare are summarized, and future perspectives on next-generation medicine are discussed.
Funding: Supported by the National Key R&D Program of China (No. 2017YFB1003000), the National Natural Science Foundation of China (Nos. 61872079, 61572129, 61602112, 61502097, 61702096, 61320106007, 61632008, and 61702097), the Natural Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689), the Fundamental Research Funds for the Central Universities (No. 2242018k1G019), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), and the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9); partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. Because of their distributed nature, the applications consist of tasks that each involve a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics, and analyze it using the Lagrange multiplier method and KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with real-world traffic traces. Compared with per-flow mechanisms, communication stages complete up to 1.4× faster on average, task utilities increase by up to 2.26×, and the fairness of tasks improves by up to 8.66× using Tasch.
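As a worked illustration of bargaining-based bandwidth allocation, the sketch below uses the standard result that maximizing the weighted Nash product of linear task utilities (equivalently, the sum of weighted log-bandwidths) over a single link of capacity C yields weighted proportional sharing. The single-link setting, weights, and capacity are assumptions; the paper's model and its two algorithms handle richer utilities and constraints.

# Asymmetric Nash bargaining over one link: maximize sum_i w_i*log(x_i) subject to
# sum_i x_i <= C, whose optimum is x_i = C * w_i / sum(w). Simplified sketch only,
# not the paper's utility-aware allocation algorithms.
import math

def nash_bargaining_allocation(weights, capacity):
    """weights: dict task -> bargaining weight; returns dict task -> bandwidth."""
    total_weight = sum(weights.values())
    return {task: capacity * w / total_weight for task, w in weights.items()}

if __name__ == "__main__":
    weights = {"query-task": 3.0, "backup-task": 1.0, "analytics-task": 2.0}
    alloc = nash_bargaining_allocation(weights, capacity=10.0)  # capacity in Gbps, assumed
    print(alloc)  # query-task 5.0, backup-task ~1.67, analytics-task ~3.33
    objective = sum(w * math.log(alloc[t]) for t, w in weights.items())
    print(round(objective, 3))  # aggregate weighted log-utility at the optimum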
Abstract: Big Earth Data analysis is a complex task requiring the integration of many skills and technologies. This paper provides a comprehensive review of the technology and terminology within the Big Earth Data problem space and presents examples of state-of-the-art projects in each major branch of Big Earth Data research. Current issues within Big Earth Data research are highlighted and potential future solutions identified.
Funding: This work is in part supported by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357; in part by the Exascale Computing Project under Grant No. 17-SC-20-SC, a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative; and in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.
Abstract: Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms drive the development of new data services that provide high performance on these new platforms, provide capable and productive interfaces and abstractions for a variety of applications, and are readily adapted when new technologies are deployed. The Mochi framework enables the composition of specialized distributed data services from a collection of connectable modules and subservices. Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration, Mochi allows each application to use a data service specialized to its needs and access patterns. This paper introduces the Mochi framework and methodology. The Mochi core components and microservices are described. Examples of the application of the Mochi methodology to the development of four specialized services are detailed. Finally, a performance evaluation of a Mochi core component, a Mochi microservice, and a composed service providing an object model is performed. The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.
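To illustrate the composition idea in the abstract (an object model built from smaller subservices), the sketch below composes a key-value module and a blob module into an object service. This is not Mochi's actual interface; every class and method name here is an assumption made for illustration.

# Conceptual sketch of composing a specialized data service from connectable modules,
# in the spirit described above. NOT the Mochi API; names are illustrative only.

class KeyValueModule:
    def __init__(self):
        self.store = {}
    def put(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)

class BlobModule:
    def __init__(self):
        self.blobs = {}
    def write(self, blob_id, data):
        self.blobs[blob_id] = bytes(data)
    def read(self, blob_id):
        return self.blobs[blob_id]

class ObjectService:
    """Composed service providing an object model: metadata goes to the key-value
    module, object payloads go to the blob module."""
    def __init__(self, kv, blobs):
        self.kv, self.blobs = kv, blobs
    def put_object(self, name, data, metadata):
        self.blobs.write(name, data)
        self.kv.put(name, metadata)
    def get_object(self, name):
        return self.blobs.read(name), self.kv.get(name)

if __name__ == "__main__":
    svc = ObjectService(KeyValueModule(), BlobModule())
    svc.put_object("checkpoint-001", b"payload bytes", {"step": 100, "rank": 0})
    data, meta = svc.get_object("checkpoint-001")
    print(len(data), meta)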