Abstract: Digital transformation has been a cornerstone of business innovation in the last decade, and these innovations have dramatically changed the definition and boundaries of enterprise business applications. The introduction of new products and services, version management of existing products and services, management of customer and partner connections, management of multi-channel service delivery (web, social media, etc.), mergers and acquisitions of new businesses, and adoption of new innovations and technologies all drive data growth in business applications. These datasets exist in different share-nothing business applications at different locations and in various forms. To make sense of this information and derive insight, it is essential to break down data silos, streamline data retrieval, and simplify information access across the entire organization. The information access framework must support just-in-time processing capabilities to bring together data from multiple sources, be fast and powerful enough to transform and process huge amounts of data quickly, and be agile enough to accommodate new data sources as user needs evolve. This paper discusses SAP HANA Smart Data Access, a data-virtualization technology that enables unified access to heterogeneous data across the organization and real-time analysis of huge volumes of data on the SAP HANA in-memory platform.
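The core idea of data virtualization described above, querying and joining silos at request time without copying data into a central store, can be illustrated with a minimal sketch. This uses SQLite's ATTACH mechanism as a stand-in for the remote sources a virtualization layer would expose as virtual tables; the table names and schemas are invented for illustration and are not from the paper.

```python
import sqlite3

# Two independent "silos" (stand-ins for the remote sources that a
# virtualization layer such as Smart Data Access would expose).
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS crm")
con.execute("ATTACH DATABASE ':memory:' AS erp")
con.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE erp.orders (cust_id INTEGER, total REAL)")
con.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])
con.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 75.0)])

# Federated, query-time join: no data is materialized centrally first.
rows = con.execute("""
    SELECT c.name, SUM(o.total)
    FROM crm.customers c JOIN erp.orders o ON o.cust_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)
```

The join is evaluated on demand, which is the essence of the "just-in-time processing" the abstract calls for; a production system adds pushdown, caching, and source adapters on top of this pattern.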
Abstract: ETL (Extract-Transform-Load) usually includes three phases: extraction, transformation, and loading. In building a data warehouse, ETL plays the role of data injection and is the most time-consuming activity, so improving its performance is essential. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach applies virtual tables to carry out the transformation stage before the extraction and loading stages, eliminating the data staging area or staging database that would otherwise store raw data extracted from each of the disparate source systems. The TEL approach reduces the data transmission load and improves query performance at the access layers. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
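The transform-before-extract idea can be sketched with SQLite views: the transformation is defined as a view (a virtual table) over the source, so extraction reads already-transformed rows and loads them directly into the warehouse, with no staging area. The schema and aggregation here are illustrative assumptions, not the paper's benchmark.

```python
import sqlite3

# Source system with raw operational data (illustrative schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, region TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1250, "east"), (2, 990, "west"), (3, 4300, "east")])

# Transform FIRST: a view acts as a virtual table; no raw rows are staged.
src.execute("""
    CREATE VIEW v_orders AS
    SELECT region, SUM(amount_cents) / 100.0 AS revenue
    FROM orders GROUP BY region
""")

# Extract-Load: read already-transformed rows straight into the warehouse.
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_revenue (region TEXT, revenue REAL)")
rows = src.execute("SELECT region, revenue FROM v_orders").fetchall()
dwh.executemany("INSERT INTO fact_revenue VALUES (?, ?)", rows)

print(sorted(dwh.execute("SELECT * FROM fact_revenue").fetchall()))
```

Only the aggregated rows cross the wire, which is where the claimed reduction in data transmission load comes from.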
Funding: Supported in part by the following funding agencies of China: the National Natural Science Foundation under Grants 61602050 and U1534201, and the National Key Research and Development Program of China under Grant 2016QY01W0200.
Abstract: A virtual data center is a new form of the cloud computing concept applied to data centers. As one of the most important challenges, the virtual data center embedding problem has attracted much attention from researchers. Energy is a critical issue in data centers: data center energy consumption has increased by dozens of times in the last decade. In this paper, we study the cost-aware multi-domain virtual data center embedding problem. To solve it, we first formulate an energy consumption model, covering both virtual machine nodes and virtual switch nodes, to quantify the energy consumed during the embedding process. Based on this model, we present a heuristic algorithm for cost-aware multi-domain virtual data center embedding. The algorithm consists of two steps: inter-domain embedding and intra-domain embedding. Inter-domain embedding divides virtual data center requests into several slices and selects an appropriate single data center for each; intra-domain embedding maps the requests within each data center. We first propose an inter-domain embedding algorithm based on label propagation to select the appropriate data center, and then a cost-aware algorithm to perform the intra-domain embedding. Extensive simulation results show that our proposed algorithm effectively reduces energy consumption while maintaining the embedding success ratio.
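The label-propagation step mentioned above can be sketched as follows: nodes of a request graph pinned to a data center act as seeds, and unpinned nodes repeatedly adopt the majority label of their neighbours until the labelling stabilizes. This is a generic sketch of label propagation, with an invented toy graph, not the paper's exact slicing algorithm.

```python
import random
from collections import Counter

def label_propagation(adj, seeds, rounds=20, seed=42):
    """Propagate domain labels over a virtual-data-center request graph.

    adj:   node -> list of neighbour nodes
    seeds: node -> fixed initial domain label (nodes pinned to a data center)
    Unlabelled nodes adopt the most frequent label among their neighbours.
    """
    rng = random.Random(seed)
    labels = dict(seeds)
    free = [n for n in adj if n not in seeds]   # seeds never change
    for _ in range(rounds):
        rng.shuffle(free)
        changed = False
        for n in free:
            votes = Counter(labels[m] for m in adj[n] if m in labels)
            if votes:
                best = votes.most_common(1)[0][0]
                if labels.get(n) != best:
                    labels[n] = best
                    changed = True
        if not changed:
            break
    return labels

# Toy request graph: v1 pinned to domain A, v5 to domain B.
adj = {"v1": ["v2"], "v2": ["v1", "v3"], "v3": ["v2", "v4"],
       "v4": ["v3", "v5"], "v5": ["v4"]}
labels = label_propagation(adj, {"v1": "A", "v5": "B"})
print(labels)
```

Each resulting label group becomes one slice of the request, to be embedded within a single data center by the intra-domain step.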
Abstract: From the viewpoint of systems science, this article takes the Xiaosha River artificial wetland, currently under planning and construction, as the object of study, with the completed and operating Xinxuehe artificial wetland project as a reference. Virtual data for the quantity and quality of inflow and the quality of outflow of the Xiaosha River wetland are constructed from the operating experience, forecasting model, and theoretical methods of the reference project, together with a comparative analysis of the similarities and differences between the two projects. The virtual data are then used to study the building of a BP neural network forecasting model for the Xiaosha River artificial wetland.
Abstract: The natural mortality coefficient (M) was estimated from fish abundance (N) and catch (C) data using a Virtual Population Analysis (VPA) model. Monte Carlo simulations were used to evaluate the impact of different error distributions in the simulated data on the estimates of M. Among the four error structures (normal, lognormal, Poisson, and gamma), simulations with normally distributed errors produced the most viable estimates of M, with the lowest relative estimation errors (REEs) and median mean absolute deviations (MADs) for the ratio of the true to the estimated M; the lognormal distribution had the largest REE values. Errors with different coefficients of variation (CV) were added to N and C. In general, when CVs in the data were less than 10%, reliable estimates of M were obtained. For normal and lognormal distributions, the estimates of M were more sensitive to the CVs in N than in C; when only C contained error, the estimates were close to the true values. For Poisson and gamma distributions, the opposite held: the estimates were more sensitive to the CVs in C than in N, with the largest REE arising when error was present only in C. Two scenarios with high and low fishing mortality coefficients (F) were generated, and the simulation results showed that the method performed better under low F. The method was also applied to published data for the anchovy (Engraulis japonicus) of the Yellow Sea. Viable estimates of M were obtained for the young age groups, which may be explained by the fact that the great uncertainties in N and C observed for older Yellow Sea anchovy introduced large variation in the corresponding estimates of M.
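The error-addition step of such a Monte Carlo study can be sketched as below: simulated N or C values are perturbed so the noise has a chosen coefficient of variation under a chosen error structure. Only the normal and lognormal cases are shown, and the abundance values are invented for illustration; the CV parameterization of the lognormal is the standard one, not necessarily the paper's exact setup.

```python
import math
import random

def add_error(values, cv, dist="normal", seed=0):
    """Perturb simulated abundance/catch data to a given coefficient of variation.

    normal:    x' = x * (1 + cv * z),          z ~ N(0, 1)
    lognormal: x' = x * exp(s*z - s^2/2), with s = sqrt(ln(1 + cv^2)),
               so that E[x'] = x and CV(x') = cv.
    """
    rng = random.Random(seed)
    out = []
    for x in values:
        z = rng.gauss(0.0, 1.0)
        if dist == "normal":
            out.append(x * (1.0 + cv * z))
        elif dist == "lognormal":
            s = math.sqrt(math.log(1.0 + cv * cv))
            out.append(x * math.exp(s * z - 0.5 * s * s))
        else:
            raise ValueError(dist)
    return out

N = [1000.0, 800.0, 600.0]          # hypothetical abundance-at-age
noisy = add_error(N, cv=0.10)        # the <10% CV regime the study found reliable
print(noisy)
```

Repeating the VPA estimation over many such perturbed datasets yields the distribution of M estimates from which REE and MAD are computed.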
Abstract: This paper addresses the problem of selecting a route for every pair of communicating nodes in a virtual circuit data network so as to minimize the average delay encountered by messages. The problem was previously modeled as a network of M/M/1 queues. A genetic algorithm for solving this problem is presented, and extensive computational results across a variety of networks are reported. These results indicate that the presented solution procedure outperforms the other methods in the literature and is effective for a wide range of traffic loads.
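A minimal sketch of this formulation: a chromosome picks one candidate route per node pair, and fitness is the mean network delay under the M/M/1 model, T = (1/γ) Σ_l λ_l / (C_l − λ_l), where λ_l is the flow on link l, C_l its capacity, and γ the total offered traffic. The toy topology, capacities, and GA parameters are assumptions for illustration, not the paper's instances.

```python
import random

# Candidate routes per commodity (src-dst pair); a route is a list of link ids.
ROUTES = {
    "AB": [["l1"], ["l2", "l3"]],
    "AC": [["l2"], ["l1", "l3"]],
}
CAPACITY = {"l1": 10.0, "l2": 10.0, "l3": 10.0}   # link capacities (illustrative)
DEMAND = {"AB": 4.0, "AC": 4.0}                    # offered load per commodity

def avg_delay(chromosome):
    """Mean M/M/1 network delay: T = (1/gamma) * sum_l lambda_l / (C_l - lambda_l)."""
    load = {l: 0.0 for l in CAPACITY}
    for i, pair in enumerate(ROUTES):
        for link in ROUTES[pair][chromosome[i]]:
            load[link] += DEMAND[pair]
    if any(load[l] >= CAPACITY[l] for l in load):
        return float("inf")                        # infeasible: queue blows up
    gamma = sum(DEMAND.values())
    return sum(load[l] / (CAPACITY[l] - load[l]) for l in load) / gamma

def genetic_search(pop=20, gens=30, seed=1):
    rng = random.Random(seed)
    choices = [len(v) for v in ROUTES.values()]
    n = len(choices)
    popl = [[rng.randrange(c) for c in choices] for _ in range(pop)]
    for _ in range(gens):
        popl.sort(key=avg_delay)                   # elitist truncation selection
        parents = popl[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]              # one-point crossover
            if rng.random() < 0.2:                 # mutation: re-pick one route
                i = rng.randrange(n)
                child[i] = rng.randrange(choices[i])
            children.append(child)
        popl = parents + children
    return min(popl, key=avg_delay)

best = genetic_search()
print(best, avg_delay(best))
```

On this toy instance the optimum routes both commodities over disjoint links (delay 1/6); real instances make the fitness landscape much richer, which is where the GA earns its keep.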
Funding: Supported by the Collaboration Research on Key Techniques of Future Network between China, Japan and Korea (2010DFB13470).
Abstract: This paper proposes a virtual router cluster system based on the separation of the control plane and the data plane, examined from multiple perspectives such as architecture, key technologies, scenarios, and standardization. To some extent, the virtual cluster simplifies network topology and management, achieves automatic configuration, and conserves IP addresses. It is a low-cost way to expand the port density of aggregation equipment.
Funding: The financial support of the Fraunhofer Cluster of Excellence (CCIT).
Abstract: The growth of data generated in industry requires new, efficient big data integration approaches that give end-users uniform data access for better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on the fly against the original data sources without any prior materialization. Existing approaches use, by design, a fixed model, e.g., TABULAR, as the only virtual data model: a uniform schema built on the fly to load, transform, and join the relevant data. Yet other data models, such as GRAPH or DOCUMENT, are more flexible and can be more suitable for common query types such as join or nested queries. The best model is hard to predict, because it depends on many criteria, such as the query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries over large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) uses a deep learning method to predict the optimal virtual data model from features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages the state-of-the-art big data technologies Apache Spark and GraphX; it implements two virtual data models, GRAPH and TABULAR, and supports five source data models out of the box: property graph, document-based, wide-columnar, relational, and tabular, stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, yielding a reduction in query execution time of over 40% when the tabular model is selected and over 30% when the graph model is selected.
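The feature-extraction step that feeds such a predictor can be sketched naively: count triple patterns, infer joins from shared variables, and count filters. This regex-based sketch is an illustration only; the actual OPTIMA feature set is not specified here, and a real system would use a proper SPARQL parser.

```python
import re

def sparql_features(query):
    """Naive feature extraction from a SPARQL query string (illustrative;
    a production system would parse the query properly)."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    triples = [t.strip() for t in body.split(" .")
               if t.strip() and not t.strip().upper().startswith("FILTER")]
    vars_per_triple = [set(re.findall(r"\?\w+", t)) for t in triples]
    # Two triple patterns sharing a variable imply a join between them.
    joins = sum(1
                for i in range(len(vars_per_triple))
                for j in range(i + 1, len(vars_per_triple))
                if vars_per_triple[i] & vars_per_triple[j])
    return {
        "triple_patterns": len(triples),
        "joins": joins,
        "filters": len(re.findall(r"\bFILTER\b", query, re.I)),
        "distinct": bool(re.search(r"\bDISTINCT\b", query, re.I)),
    }

q = """SELECT DISTINCT ?name WHERE {
  ?p <http://xmlns.com/foaf/0.1/name> ?name .
  ?p <http://xmlns.com/foaf/0.1/knows> ?q .
  FILTER (?name != "")
}"""
print(sparql_features(q))
```

A feature vector like this (join count, pattern count, filter count, modifiers) is exactly the kind of input a learned model can map to "GRAPH will be faster" or "TABULAR will be faster" for a given query.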
Funding: This work was supported in part by the Natural Science Foundation of the Education Department of Henan Province (Grant 22A520025), the National Natural Science Foundation of China (Grant 61975053), and the National Key Research and Development of Quality Information Control Technology for Multi-Modal Grain Transportation Efficient Connection (2022YFD2100202).
Abstract: Cloud computing has gained significant recognition due to its ability to provide a broad range of online services and applications. Existing commercial cloud computing models, however, concentrate computational assets, such as storage and server infrastructure, in a limited number of large-scale worldwide data facilities. Optimizing the deployment of virtual machines (VMs) is crucial in this scenario to ensure system dependability, performance, and minimal latency. A significant barrier is load distribution, particularly when striving for improved energy consumption in a hypothetical grid computing framework. This design employs load-balancing techniques to allocate different user workloads across several virtual machines. To address this challenge, we propose the twin-fold moth flame technique, a highly effective optimization method. The twin-fold moth flame method is designed to account for various constraints, including energy efficiency, lifespan analysis, and resource expenditure, and provides a thorough approach to evaluating total cost in the cloud computing environment. In assessing the efficacy of the proposed strategy, the study analyzes key metrics such as energy efficiency, lifespan, and resource expenditure. This investigation aims to advance cloud computing techniques by developing a new optimization algorithm that considers multiple factors for effective virtual machine placement and load balancing. The proposed work demonstrates notable improvements of 12.15%, 10.68%, 8.70%, 13.29%, 18.46%, and 33.39% for 40-node data against the artificial bee colony-bat algorithm, ant colony optimization, crow search algorithm, krill herd, whale optimization genetic algorithm, and improved Lévy-based whale optimization algorithm, respectively.
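For orientation, the basic moth-flame optimization loop that the twin-fold variant builds on can be sketched as follows: candidate solutions (moths) spiral toward the best solutions found so far (flames), with the flame count shrinking over iterations. This is a simplified sketch of the standard algorithm on a generic cost function standing in for a VM-placement cost model; it is not the paper's twin-fold variant, and the parameters are assumptions.

```python
import math
import random

def mfo(cost, dim, bounds, n_moths=20, iters=100, seed=3):
    """Basic moth-flame optimization (spiral update toward ranked flames)."""
    rng = random.Random(seed)
    lo, hi = bounds
    moths = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_moths)]
    best, best_cost = None, float("inf")
    for it in range(iters):
        ranked = sorted(moths, key=cost)
        if cost(ranked[0]) < best_cost:
            best, best_cost = ranked[0][:], cost(ranked[0])
        # Flame count decreases linearly: exploration -> exploitation.
        n_flames = max(1, round(n_moths - it * (n_moths - 1) / iters))
        flames = ranked[:n_flames]
        a = -1.0 - it / iters                      # spiral parameter, -1 -> -2
        new = []
        for i, m in enumerate(moths):
            f = flames[min(i, n_flames - 1)]       # moth i follows flame i (capped)
            pos = []
            for d in range(dim):
                t = (a - 1.0) * rng.random() + 1.0     # t in [a, 1]
                dist = abs(f[d] - m[d])
                x = dist * math.exp(t) * math.cos(2.0 * math.pi * t) + f[d]
                pos.append(min(hi, max(lo, x)))        # clamp to search bounds
            new.append(pos)
        moths = new
    return best, best_cost

# Illustrative stand-in cost: squared deviation of per-VM loads from balance.
sphere = lambda x: sum(v * v for v in x)
best, c = mfo(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, c)
```

In a placement setting, each dimension would encode a VM's assignment or load share, and `cost` would combine the paper's energy, lifespan, and resource-expenditure terms.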
Abstract: This article proposes a feature-data correlation method for network security auditing based on multiple feature elements. The method uses three feature elements as correlation factors: the International Mobile Equipment Identity (IMEI), the International Mobile Subscriber Identification (IMSI), and the mobile terminal MAC address (TERMINAL_MAC). By continuously updating and refining the feature information string, it effectively solves the problem of constructing a unique virtual profile for a mobile terminal user even when the MAC address of a terminal accessing the network may change periodically.
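The correlation idea can be sketched as follows: audit records are linked into one profile per user via the stable identifiers (IMEI, IMSI), so a periodically rotating MAC address still maps to the same profile and its history is retained. This is an illustrative sketch with an invented record format, not the paper's exact feature-string scheme; matching on MAC alone is a known weak link if an address is reused by another device.

```python
class ProfileStore:
    """Correlate audit records into one virtual profile per mobile terminal user."""

    def __init__(self):
        self.profiles = []   # each profile: {"imei", "imsi", "macs": set}

    def ingest(self, imei=None, imsi=None, mac=None):
        for p in self.profiles:
            # Any shared identifier links the record to an existing profile;
            # IMEI/IMSI are the stable anchors, MAC is the rotating one.
            if ((imei and p["imei"] == imei)
                    or (imsi and p["imsi"] == imsi)
                    or (mac and mac in p["macs"])):
                p["imei"] = p["imei"] or imei      # refine the profile over time
                p["imsi"] = p["imsi"] or imsi
                if mac:
                    p["macs"].add(mac)             # keep the MAC rotation history
                return p
        p = {"imei": imei, "imsi": imsi, "macs": {mac} if mac else set()}
        self.profiles.append(p)
        return p

store = ProfileStore()
store.ingest(imei="35-209900-176148", mac="aa:bb:cc:00:00:01")
store.ingest(imei="35-209900-176148", mac="aa:bb:cc:00:00:02")  # MAC rotated
print(len(store.profiles), sorted(store.profiles[0]["macs"]))
```

Despite the MAC change between the two records, both land in a single profile because the IMEI anchors the match.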