Abstract: The emergence of "Big Data" has been a dramatic development in recent years. Alongside it, a lesser-known but equally important set of concepts and practices has also come into being: "Smart Data." This paper shares the author's understanding of what, why, how, who, where, and which data in relation to Smart Data and digital humanities. It concludes that challenges and opportunities coexist, but that Smart Data, the ability to achieve big insights from trusted, contextualized, relevant, cognitive, predictive, and consumable data at any scale, will certainly continue to have extraordinary value in digital humanities.
Abstract: Digital transformation has been a cornerstone of business innovation in the last decade, and these innovations have dramatically changed the definition and boundaries of enterprise business applications. The introduction of new products and services, version management of existing products and services, management of customer and partner connections, management of multi-channel service delivery (web, social media, etc.), mergers and acquisitions of new businesses, and adoption of new innovations and technologies will drive data growth in business applications. These datasets exist in different shared-nothing business applications, at different locations and in various forms. To make sense of this information and derive insight, it is essential to break down the data silos, streamline data retrieval, and simplify information access across the entire organization. The information access framework must support just-in-time processing to bring in data from multiple sources, be fast and powerful enough to transform and process huge amounts of data quickly, and be agile enough to accommodate new data sources as users need them. This paper discusses the SAP HANA Smart Data Access data-virtualization technology, which enables unified access to heterogeneous data across the organization and real-time analysis of huge volumes of data using the SAP HANA in-memory platform.
Abstract: Globally, digital technology and the digital economy have propelled technological revolution and industrial change, and they have become one of the main grounds of international industrial competition. It was estimated that the scale of China’s digital economy would reach 50 trillion yuan in 2022, accounting for more than 40% of GDP, presenting great market potential and room for growth. With the rapid development of the digital economy, the state attaches great importance to the construction of digital infrastructure and has introduced a series of policies to promote its systematic development and large-scale deployment. In 2022 the Chinese government planned to build 8 computing power hubs and 10 national data center clusters nationwide. To proactively address the future demand for AI across various scenarios, there is a need for a well-structured computing power infrastructure. The data center, serving as the pivotal hub for computing power, has evolved from the conventional cloud center into a more intelligent computing center, allowing for a diversified convergence of computing power supply. In addition, the data center accommodates a diverse array of computing power business forms from customers, reflecting the multi-industry development trend. The computing power service platform is steadily broadening its scope, with ongoing optimization and innovation in the process design of machine rooms. The widespread application of immersion phase-change liquid cooling and cold plate cooling introduces a series of new challenges to the construction of digital infrastructure. This paper examines the design objectives, industry considerations, layout, and other dimensions of a smart computing center and proposes a new-generation data center solution that is “flexible, resilient, green, and low-carbon.”
Funding: Sponsored by Projects of International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 51561135003) and the Key Project of the National Natural Science Foundation of China (Grant No. 51338003)
Abstract: Metro systems have expanded rapidly worldwide over the past decades. However, few studies have paid attention to how system usage evolves as the network expands. The paper's main objectives are to analyze passenger flow characteristics and evaluate travel time reliability for the Nanjing Metro network by visualizing the smart card data of April 2014, April 2015, and April 2016. We applied visualization techniques and comparative analyses to examine the changes in system usage before and after the system expansion. Workdays, holidays, and weekends were analyzed as separate segments. Results showed that workdays had obvious morning and evening peak hours due to daily commuting, while weekends and holidays had no obvious peak hours and daily traffic was evenly distributed. In addition, some metro stations had a serious directional imbalance, especially during the morning and evening peak hours of workdays. Serious unreliability occurred in morning peaks on workdays, and the reliability of new lines was relatively low; meanwhile, new stations had negative effects on existing stations in terms of reliability. Monitoring the evolution of system usage over the years makes it possible to assess system performance and can serve as an input for improving metro system quality.
Funding: The National Natural Science Foundation of China (No. 51338003, 71801041)
Abstract: As an essential component of bus dwell time, passenger boarding time has a significant impact on bus running reliability and service quality. To understand the passenger boarding process and reduce boarding time, a regression analysis framework is proposed to capture the differences in, and the factors influencing, boarding time for adult and elderly passengers, based on smart card data from Changzhou. The boarding gap, the time difference between two consecutive smart card tapping records, is calculated to approximate passenger boarding time. Analysis of variance is applied to determine whether the difference in boarding time between adults and seniors is statistically significant. A multivariate regression model is used to analyze the influence of passenger type, the marginal effect of each additional boarding passenger, and bus floor type on the total boarding time at each stop. Results show that a constant difference in boarding time exists between adults and seniors even without considering specific bus characteristics. The average passenger boarding time decreases as the number of passengers increases. The presence of two entrance steps delays the boarding process, especially for elderly passengers.
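The boarding gap defined in this abstract can be illustrated with a minimal sketch. The function name `boarding_gaps`, the `HH:MM:SS` timestamp format, and the sample taps are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from datetime import datetime


def boarding_gaps(tap_times):
    """Return gaps in seconds between consecutive card taps at one stop.

    tap_times: list of "HH:MM:SS" strings (hypothetical format) recorded
    for a single bus at a single stop.
    """
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in tap_times)
    # Each gap is the time difference between two consecutive tap records.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Hypothetical taps for four passengers boarding at one stop.
taps = ["08:01:05", "08:01:08", "08:01:10", "08:01:16"]
gaps = boarding_gaps(taps)
print(gaps)                    # [3.0, 2.0, 6.0]
print(sum(gaps) / len(gaps))   # mean boarding gap in seconds
```

Four taps yield three gaps, so the first passenger at a stop has no gap of their own; how the paper treats that first passenger is not stated in the abstract.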
Abstract: Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The "utilitarian nature" and "historical and traditional views" of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research-worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, little attention has been directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.
Funding: This study is supported through Taif University Researchers Supporting Project Number (TURSP-2020/150), Taif University, Taif, Saudi Arabia.
Abstract: Today, the Internet of Things (IoT) is a technology paradigm that attracts many researchers seeking high-performance packet delivery in IoT applications such as smart cities. Interconnecting various physical devices such as sensors or actuators with the Internet may impose constraints on network resources, affecting the packet delivery ratio, energy efficiency, end-to-end delay, etc. However, traditional scheduling methodologies in large-scale environments such as big data smart cities cannot meet the requirements for high-performance network metrics. Big data smart city applications that need fast packet transmission, such as sending priority packets to hospitals in an emergency, require an efficient scheduling mechanism, which is the main concern of this paper. We address the shortcomings of the traditional scheduling algorithms used in big data smart city emergency applications. Transmission information about the priority packets is exchanged between the source nodes (i.e., people with emergency cases) and the destination nodes (i.e., hospitals) before the packets are sent, in order to reserve transmission channels and prepare the transmission sequence of these priority packets between the two parties. In the proposed mechanism, Software Defined Networking (SDN) with a centralized communication controller is responsible for determining the scheduling and processing sequences for priority packets in big data smart city environments. We compare the proposed Priority Packets Deadline First (PPDF) scheduling scheme with existing and traditional scheduling algorithms that can be used in urgent smart city applications, in order to illustrate the network performance of our scheme in terms of average waiting time, packet loss rate, priority packet end-to-end delay, and energy consumption.
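The abstract does not give the details of PPDF, so the following is not the authors' scheme. It is a generic sketch of the underlying idea, serving packets by priority class first and by earliest deadline within a class, using a binary heap. The `Packet` fields, the convention that class 0 is the most urgent, and the sample packets are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    priority: int                      # assumed: 0 = emergency, larger = less urgent
    deadline_ms: float                 # assumed: absolute deadline in milliseconds
    name: str = field(compare=False)   # label only, excluded from ordering


def schedule(packets):
    """Return packet names in service order: priority class, then deadline."""
    heap = list(packets)
    heapq.heapify(heap)                # orders by (priority, deadline_ms)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


# Hypothetical queue at a central controller.
queue = [
    Packet(1, 50.0, "video"),
    Packet(0, 30.0, "ecg"),
    Packet(0, 10.0, "alarm"),
    Packet(1, 20.0, "sensor"),
]
print(schedule(queue))  # ['alarm', 'ecg', 'sensor', 'video']
```

A strict priority ordering like this can starve low-priority traffic under sustained emergency load; whether and how PPDF handles that case is not stated in the abstract.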
Abstract: The development of the Global Navigation System and wireless networking technologies has changed the way we live, communicate, share information, and even collect geospatial data in the field. Along with wireless networking technologies, the improved computational power of handheld devices such as smartphones, tablet PCs, ultra-mobile personal computers (UMPCs), and netbook computers allows field users to connect to a web server and to store and stream large amounts of geospatial data from it. As a result, geospatial data collection is now more flexible and timely. In this paper we discuss field data collection using a smartphone and a web-based GIS system that collects, integrates, visualizes, and analyzes the collected data in real time. We built a web-GIS system for creating a user account, acquiring coordinates from GPS-embedded devices or wireless access points, and providing a user-friendly survey form. The collected data can be visualized and analyzed through thematic mapping, labeling, symbolizing, querying, and summary report generation. We tested this system on a university campus management system, in which we collected information on illegal disposal sites and parking events within the university campus.
Abstract: At present, the DL/T 645-2007 communication protocol is used to collect data from smart meters. However, this protocol was not originally designed to be secure; only function and reliability were taken into account. The protocol transmits data in plaintext, so attackers can easily sniff the traffic and cause information leakage. In this paper, a man-in-the-middle attack was used to verify that the smart meter data acquisition process is vulnerable to third-party attacks, which can result in data eavesdropping. To resist such risks and prevent eavesdropping, a real ammeter communication experimental environment was built that realizes two-way identity authentication between the data acquisition center and the ammeter data center. At the same time, RSA (Rivest-Shamir-Adleman) was used to encrypt the meter data during collection and storage, ensuring the confidentiality and integrity of meter data transmission. Compared with other methods, this method had obvious advantages. The analysis showed that it can effectively prevent smart meter data from being eavesdropped.
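To make the RSA step concrete, here is a toy textbook-RSA round trip on a single integer reading. It is a teaching sketch only: the primes are tiny, there is no padding, and nothing here reflects the key sizes, message framing, or authentication exchange used in the paper. The function names and the sample reading are hypothetical, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def make_keys(p, q, e=17):
    """Build a toy RSA key pair from two small primes (insecure, demo only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse of e modulo phi
    return (e, n), (d, n)        # (public key, private key)


def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)    # c = m^e mod n


def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)  # m = c^d mod n


public, private = make_keys(61, 53)   # n = 3233, so readings must be < 3233
reading = 1234                        # hypothetical meter reading
cipher = encrypt(reading, public)
assert decrypt(cipher, private) == reading
```

A real deployment would rely on a vetted cryptographic library with padding such as OAEP and keys of at least 2048 bits rather than hand-rolled modular arithmetic.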
Abstract: Technological evolution is giving rise to a unified (Industrial) Internet of Things network, in which loosely coupled smart manufacturing devices form smart manufacturing systems and enable comprehensive collaboration possibilities that increase the dynamics and volatility of their ecosystems. On the one hand, this evolution opens a huge field for exploitation; on the other hand, it increases complexity, bringing new challenges and requirements that demand new approaches in several areas. One challenge is the analysis of such systems, which continuously generate huge amounts of data that potentially contain valuable information for several use cases, such as knowledge generation, key performance indicator (KPI) optimization, diagnosis, prediction, feedback to design, or decision support. This work presents a review of Big Data analysis in smart manufacturing systems. It covers the status quo in research, innovation, and development, upcoming challenges, and a comprehensive list of potential use cases and exploitation possibilities.
Abstract: The storage space and cost for Smart Grid datasets have been growing exponentially due to the high data rate of sensor readings from Automated Metering Infrastructure (AMI) and Phasor Measurement Units (PMUs). The paper focuses on Phasor Data Concentrators (PDCs), which aggregate data from PMUs. PMUs measure real-time voltage, current, and frequency parameters across the electrical grid. A typical PDC can process data from anywhere from ten to forty PMUs. The paper addresses the need for appropriate security and data compression simultaneously. As a result, an optimal compression method, ER1c, is investigated for efficient storage of IREG and C37.118 timestamped PDC data sets. We expect that our approach can reduce the storage cost requirements of commercially available PDCs (SEL 3373, GE Multilin P30) by 80%. For example, 2 years of PDC data storage space can be easily replaced with only 10 days of storage space. In addition, our approach in combination with AES 256 encryption can protect PDC data to a larger degree, as per National Institute of Standards and Technology (NIST) standards.
Abstract: To reduce the burden of data transmission and storage for power quality (PQ) in smart distribution systems and to support PQ analysis, a multichannel data compression method based on an iterative PCA (principal component analysis) algorithm is introduced. The proposed method uses PCA to reduce data redundancy and thereby compress the data. To improve calculation speed, an iterative method is proposed to compute the principal components of the covariance matrix. The correctness and feasibility of the proposed method are verified by tests on field PQ data. Compared with the discrete wavelet transform (DWT) method, the proposed method performs well in both compression ratio and reconstruction accuracy.
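The abstract does not say which iterative method the authors use. As one common way to compute a principal component iteratively, the sketch below applies power iteration to the covariance matrix and keeps only the first component as the compressed representation. The function names, the single-component choice, and the synthetic three-channel data are assumptions made for illustration.

```python
def mean_center(rows):
    """Subtract each channel's mean; return centered rows and the means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=200):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def compress(rows):
    """Keep one score per sample plus the component and channel means."""
    centered, means = mean_center(rows)
    v = top_component(covariance(centered))
    scores = [sum(r[j] * v[j] for j in range(len(v))) for r in centered]
    return scores, v, means


def reconstruct(scores, v, means):
    return [[s * vj + m for vj, m in zip(v, means)] for s in scores]


# Synthetic, perfectly correlated channels: one component captures everything.
data = [[float(t), 2.0 * t, 3.0 * t] for t in range(5)]
scores, v, means = compress(data)
restored = reconstruct(scores, v, means)
error = max(abs(a - b) for ra, rb in zip(data, restored) for a, b in zip(ra, rb))
print(error < 1e-9)  # True
```

Here 15 values are stored as 5 scores plus a 3-element component and 3 means; on real PQ data the channels are only partially correlated, so more components would be kept and reconstruction would be approximate.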