Abstract: Cloud computing technology is changing the development and usage patterns of IT infrastructure and applications. Virtualized and distributed systems, together with unified management and scheduling, have greatly improved computing and storage. Management has become easier, and OAM costs have been significantly reduced. Cloud desktop technology is developing rapidly. With this technology, users can flexibly and dynamically use virtual machine resources, companies' efficiency in using and allocating resources is greatly improved, and information security is ensured. In most existing virtual cloud desktop solutions, computing and storage are bound together, and data is stored as image files. This limits the flexibility and expandability of systems and is insufficient for meeting customers' requirements in different scenarios.
Abstract: The rapid development of 5G mobile broadband (5G), the Internet of Things (IoT), Big Data Analytics (Big Data), Cloud Computing (Cloud), and Software Defined Networks (SDN) has seen these technologies emerge one after another and has created strong interdependence among them. For example, IoT applications that generate small data with large volume and fast velocity will need 5G, with its high data rate and low latency, to transmit such data faster and more cheaply. Those data also need Cloud for processing and storage and, furthermore, SDN to provide a scalable network infrastructure that transports this large volume of data in an optimal way. This article explores the technical relationships among the development of IoT, Big Data, Cloud, and SDN in the coming 5G era and illustrates several ongoing programs and applications at National Chiao Tung University that are based on the convergence of those technologies.
Abstract: Big Data applications are pervading more and more aspects of our lives, encompassing commercial and scientific uses at increasing rates as we move towards exascale analytics. Examples of Big Data applications include storing and accessing user data in commercial clouds, mining of social data, and analysis of large-scale simulations and experiments such as the Large Hadron Collider. An increasing number of such data-intensive applications and services rely on clouds to process and manage the enormous amounts of data required for continuous operation. It can be difficult to decide which of the many options for cloud processing is suitable for a given application; the aim of this paper is therefore to provide an interested user with an overview of the most important concepts of cloud computing as it relates to the processing of Big Data.
Abstract: Cloud computing, as a disruptive technology, provides a dynamic, elastic, and promising computing environment to tackle the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open-source frameworks in cloud computing for storing and processing big data in a scalable fashion. Spark is the latest parallel computing engine working together with Hadoop, and it exceeds MapReduce performance through its in-memory computing and high-level programming features. In this paper, we present our design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates that simplify the programming effort. We have conducted experiments on its productivity and performance with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management and the complexity of parallelism. The cloud platform generates a complete data processing application based on the user's kernel program and simple configurations, allocates resources, and executes it in parallel on top of Spark and Hadoop.
Funding: Supported by the Research Fund of Tencent Computer System Co. Ltd. under Grant No. 170125.
Abstract: With the growth of distributed computing systems, modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they first come into contact with Big Data platforms. In this paper, we discuss the design principles and research directions of modern Big Data platforms by presenting research on modern Big Data products. We provide a detailed review and comparison of several state-of-the-art frameworks and distill them into a typical structure with five horizontal and one vertical. According to this structure, this paper presents the components and modern optimization technologies developed for Big Data, which helps readers choose the most suitable components and architecture from various Big Data technologies based on their requirements.
Funding: Supported in part by the National Basic Research Program (973 Program, No. 2015CB352400), NSFC under grant U1401258, and U.S. NSF under grant CCF-1016966.
Abstract: This paper describes the fundamentals of cloud computing and current key big-data technologies. We categorize big-data processing as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based according to the processing technique. We highlight the strengths and weaknesses of various big-data cloud processing techniques in order to help the big-data community select the appropriate processing technique. We also present big-data research challenges and future directions with respect to transportation management systems.
Funding: Supported in part by the National Key Research and Development Program under Grant No. 2016YFC0803206 and the China Postdoctoral Science Foundation under Grant No. 2016M600972.
Abstract: Intellectualization has become a new trend for the telecom industry, driven by intelligent technologies including cloud computing, big data, and the Internet of Things. To satisfy the service demands of intelligent logistics, this paper designs an intelligent logistics platform containing main applications such as e-commerce, self-service transceiver, big data analysis, path location, and distribution optimization. The intelligent logistics service platform is built on cloud computing to collect, store, and handle multi-source heterogeneous mass data from sensors, RFID electronic tags, vehicle terminals, and apps, so that open-access cloud services, including distribution, positioning, navigation, scheduling, and other data services, can be provided for logistics distribution applications. The architecture of the intelligent logistics cloud platform, containing a software layer (SaaS), a platform layer (PaaS), and an infrastructure layer (IaaS), is then constructed in accordance with core technologies related to high-concurrency processing, heterogeneous terminal data access, encapsulation, and data mining. The intelligent logistics cloud platform can therefore be implemented in a service mode to accelerate the construction of a symbiotic, win-win logistics ecosystem and the healthy development of the ICT industry amid the trend of intellectualization in China.
Abstract: Digital data have become a torrent engulfing every area of business, science, and engineering, gushing into every economy, every organization, and every user of digital technology. In the age of big data, deriving value and insights from big data using rich analytics becomes important for achieving competitiveness, success, and leadership in every field. The Internet of Things (IoT) is causing an increasing number and variety of products to emit data at an unprecedented rate. Heterogeneity, scale, timeliness, complexity, and privacy problems with large data impede progress at all phases of the pipeline that can create value from data. With the push of such massive data, we are entering a new era of computing driven by novel and groundbreaking research innovation on elastic parallelism, partitioning, and scalability. Designing a scalable system for analysing, processing, and mining huge real-world datasets has become one of the challenging problems facing both systems researchers and data management researchers. In this paper, we give an overview of computing infrastructure for IoT data processing, focusing on the architectural and other major challenges of massive data. We also briefly discuss emerging computing infrastructure and technologies that are promising for improving massive data management.
Funding: Supported by the National Basic Research Program of China (973 Program) (2012CB720000), the National Natural Science Foundation of China (61225015, 61273128), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61321002), the Ph.D. Programs Foundation of the Ministry of Education of China (20111101110012), and the CAST Foundation (CAST201210).
Abstract: The huge volume of structured and unstructured data known as big data now provides opportunities for companies, especially those that use electronic commerce (e-commerce). The data is collected from customer’s internal processes, vendors, markets, and the business environment. This paper presents a data mining (DM) process for e-commerce that includes three common algorithms: association, clustering, and prediction. It also highlights some of the benefits of DM to e-commerce companies in terms of merchandise planning, sales forecasting, basket analysis, customer relationship management, and market segmentation, which can be achieved with the three data mining algorithms. The main aim of this paper is to review the application of data mining in e-commerce by focusing on structured and unstructured data collected through various resources and cloud computing services, in order to justify the importance of data mining. Moreover, this study evaluates certain challenges of data mining, such as spider identification, data transformations, and making the data model comprehensible to business users. Other challenges, namely supporting the slowly changing dimensions of data and making data transformation and model building accessible to business users, are also evaluated. The paper also provides a clear guide for e-commerce companies sitting on huge volumes of data to manipulate that data easily for business improvement, which in turn will make them highly competitive.
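As a rough illustration of the "association" algorithm mentioned in the abstract above in the context of basket analysis, the following minimal sketch computes the support and confidence of a rule over a handful of baskets. The abstract does not specify any particular algorithm or data set; the function names and the sample baskets here are hypothetical and chosen only for illustration.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(b) for b in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent | antecedent) from basket counts."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    joint = support(baskets, set(antecedent) | set(consequent))
    return joint / base


# Made-up sample data (not from the paper).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule such as "bread implies milk" would typically be kept only if both values exceed thresholds chosen by the analyst; a full miner such as Apriori would enumerate candidate item sets rather than test a single rule.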
Abstract: Task duplication has been widely adopted to mitigate the impact of stragglers that run much longer than normal tasks. However, task duplication in the data pipelining case would generate excessive traffic over datacenter networks. In this paper, we study how to minimize the traffic cost of data pipelining task replications and design a controller that chooses the data generated by the first finished task and discards data generated later by other replications belonging to the same task. Each task replication communicates with the controller when it finishes a data processing step, which causes additional network overhead. Hence, we try to reduce the network overhead and make a trade-off between the delay of data blocks and the network overhead. Finally, extensive simulation results demonstrate that our proposal can minimize network traffic cost in the data pipelining case.
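The controller behaviour described in the abstract above (accept the data from the first replication to finish, discard later copies) can be sketched roughly as follows. This is a minimal in-memory sketch, not the authors' design: the class name, the `report` method, and the identifiers are hypothetical, and the `messages` counter is only a stand-in for the replica-to-controller network overhead the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaController:
    """Records, per (task, block), which replication finished first."""

    winners: dict = field(default_factory=dict)
    messages: int = 0  # number of replica-to-controller notifications

    def report(self, task_id: str, block_id: int, replica_id: str) -> bool:
        """Return True if this replica's block should be forwarded.

        The first report for a (task, block) pair wins; every later
        report for the same pair is told to discard its data.
        """
        self.messages += 1
        key = (task_id, block_id)
        if key in self.winners:
            return False
        self.winners[key] = replica_id
        return True


controller = ReplicaController()
first = controller.report("task-1", 0, "replica-a")   # first to finish block 0
second = controller.report("task-1", 0, "replica-b")  # duplicate of block 0
third = controller.report("task-1", 1, "replica-b")   # first to finish block 1
```

Reporting once per block keeps delay low but costs one message per block per replication; reporting once per batch of blocks would cut messages at the price of forwarding data later, which is the kind of trade-off the abstract refers to.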