Funding: Supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2021B0909060002), the National Natural Science Foundation of China (Grant Nos. 62204219 and 62204140), and the Major Program of the Natural Science Foundation of Zhejiang Province (Grant No. LDT23F0401). Thanks to Professor Zhang Yishu from Zhejiang University, Professor Gao Xu from Soochow University, and Professor Zhong Shuai from the Guangdong Institute of Intelligence Science and Technology for their support.
Abstract: Embedded memory, which heavily relies on the manufacturing process, has been widely adopted in various industrial applications. As the field of embedded memory continues to evolve, innovative strategies are emerging to enhance performance. Among them, resistive random access memory (RRAM) has gained significant attention due to its numerous advantages over traditional memory devices, including high speed (<1 ns), high density (4F^(2)·n^(-1)), high scalability (~nm), and low power consumption (~pJ). This review focuses on the recent progress of embedded RRAM in industrial manufacturing and its potential applications. It provides a brief introduction to the concepts and advantages of RRAM, discusses the key factors that impact its industrial manufacturing, and presents the commercial progress driven by cutting-edge nanotechnology, which has been pursued by many semiconductor giants. Additionally, it highlights the adoption of embedded RRAM in emerging applications within the realm of the Internet of Things and future intelligent computing, with a particular emphasis on its role in neuromorphic computing. Finally, the review discusses the current challenges and provides insights into the prospects of embedded RRAM in the era of big data and artificial intelligence.
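A frequently cited reason RRAM suits neuromorphic computing is that a crossbar of programmable conductances performs an analog vector-matrix multiply in one step. The following Python sketch is a generic, idealised illustration of that idea (not drawn from the review itself); all values are placeholders, and non-idealities such as wire resistance and device variation are ignored.

```python
import numpy as np

def rram_crossbar_vmm(voltages, conductances):
    """Analog vector-matrix multiply on an idealised RRAM crossbar.

    Each cell stores a synaptic weight as a conductance G[i][j]; applying
    input voltages V on the rows and summing the column currents computes
    I_j = sum_i V_i * G_ij in a single step (Ohm's + Kirchhoff's laws).
    Idealised sketch: no wire resistance, device variation, or ADC effects.
    """
    return np.asarray(voltages) @ np.asarray(conductances)

# Toy example: a 3x2 array of conductances (siemens) and a 3-element input.
G = np.array([[1e-6, 5e-6],
              [2e-6, 1e-6],
              [4e-6, 3e-6]])
V = np.array([0.2, 0.1, 0.3])      # read voltages in volts
print(rram_crossbar_vmm(V, G))     # column currents in amperes
```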
Abstract: The 3D reconstruction pipeline uses the Bundle Adjustment algorithm to refine the camera and point parameters. The Bundle Adjustment algorithm is compute intensive, and many researchers have improved its performance by implementing the algorithm on GPUs. In the previous research work, “Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm using GPUs,” the authors first demonstrated an accuracy improvement of the Bundle Adjustment algorithm, reducing the mean square error by using an additional radial distortion parameter and explicitly computed analytical derivatives, and then reduced the computational burden of the algorithm using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we present the optimization of the Bundle Adjustment CUDA code on GPUs to achieve a higher speedup. We propose a new data memory layout for the parameters in the Bundle Adjustment algorithm, resulting in contiguous memory access. We demonstrate that it improves memory throughput on the GPUs, thereby improving the overall performance. We also demonstrate an increase in the computational throughput of the algorithm by optimizing the CUDA kernels to utilize the GPU resources effectively. A comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead is presented. In the previous work, the Bundle Adjustment algorithm failed to converge for certain datasets because several camera block matrices in the augmented normal equation were rank deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the BA algorithm. Our optimized CUDA implementation achieves convergence of the Bundle Adjustment algorithm in around 22 seconds for the largest dataset, compared to 654 seconds for the sequential implementation, resulting in a speedup of 30×. The optimized CUDA implementation presented in this paper achieves a 3× speedup for the largest dataset compared to the previous naïve CUDA implementation.
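The preprocessing idea can be illustrated with a small sketch. The following Python fragment is a hypothetical illustration, not the authors' CUDA code: it accumulates each camera's block of the normal equation from per-projection camera Jacobians and flags cameras whose block is rank deficient, so their observations can be handled before running Bundle Adjustment. The function name, block size, and toy data are assumptions.

```python
import numpy as np

def find_rank_deficient_cameras(jacobians, cam_indices, n_cameras, block_dim=9):
    """Flag cameras whose J_c^T J_c block of the normal equation is rank
    deficient (a hypothetical preprocessing check, not the paper's code).

    jacobians   : list of (2, block_dim) camera Jacobians, one per projection
    cam_indices : camera index of each projection
    """
    blocks = np.zeros((n_cameras, block_dim, block_dim))
    for J, c in zip(jacobians, cam_indices):
        blocks[c] += J.T @ J          # accumulate the per-camera normal block
    return [c for c in range(n_cameras)
            if np.linalg.matrix_rank(blocks[c]) < block_dim]

# Toy usage: camera 0 has ten projections, camera 1 only one, so its
# 9x9 block has rank 2 and the camera is reported as rank deficient.
rng = np.random.default_rng(0)
jacs = [rng.standard_normal((2, 9)) for _ in range(11)]
cams = [0] * 10 + [1]
print(find_rank_deficient_cameras(jacs, cams, n_cameras=2))   # -> [1]
```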
Funding: Supported by the National Key Research and Development Program of China (No. 2017YFC0212100) and the National High-tech R&D Program of China (No. 2015AA015308).
Abstract: With the rapid development of big data and artificial intelligence (AI), the cloud platform architecture system is constantly developing, optimizing, and improving. As such, new applications, like deep computing and high-performance computing, require enhanced computing power. To meet this requirement, a non-uniform memory access (NUMA) configuration method is proposed for the cloud computing system according to the affinity, adaptability, and availability of the NUMA architecture processor platform. The proposed method is verified based on the test environment of a domestic central processing unit (CPU).
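As a hedged illustration of NUMA-aware placement (a generic technique, not the specific configuration method proposed here), the sketch below launches a workload with its CPUs and memory bound to a single NUMA node using the standard numactl utility, keeping threads and their allocations local to that node. The workload path and thread count are placeholders.

```python
import subprocess

def run_on_numa_node(cmd, node=0):
    """Run `cmd` with CPUs and memory bound to one NUMA node via numactl.

    Keeping threads and their allocations on the same node avoids the
    remote-memory access penalty of NUMA-unaware placement.
    """
    numactl_cmd = [
        "numactl",
        f"--cpunodebind={node}",   # schedule only on CPUs of this node
        f"--membind={node}",       # allocate memory only from this node
    ] + cmd
    return subprocess.run(numactl_cmd, check=True)

if __name__ == "__main__":
    # Placeholder workload; replace with the actual benchmark binary.
    run_on_numa_node(["./hpc_workload", "--threads", "8"], node=0)
```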
Funding: Supported by the National Natural Science Foundation of China under Grant No. 51202196, the National Aerospace Science Foundation of China under Grant No. 2013ZF53067, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant No. 2014JQ6204, the Fundamental Research Funds for the Central Universities under Grant No. 3102014JCQ01032, and the 111 Project under Grant No. B08040.
Abstract: Cu/HfOx/n^+Si devices are fabricated to investigate the influence of technological parameters, including film thickness and Ar/O2 ratio, on the resistive switching (RS) characteristics of HfOx films, in terms of switching ratio, endurance properties, retention time, and multilevel storage. It is revealed that the RS characteristics show a strong dependence on the technological parameters, mainly by altering the defects (oxygen vacancies) in the film. The sample with a film thickness of 20 nm and an Ar/O2 ratio of 12:3 exhibits the best RS behavior, with the potential for multilevel storage. The conduction mechanism of all the films is interpreted based on the filamentary model.
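For readers unfamiliar with how a switching ratio is typically extracted, the following Python sketch is a hypothetical post-processing helper, not the authors' measurement procedure: it evaluates the device resistance near a fixed read voltage in both resistance states and reports their ratio. The read voltage, tolerance, and toy sweep values are assumptions.

```python
import numpy as np

def switching_ratio(voltage, current, v_read=0.1):
    """Estimate the HRS/LRS switching ratio from a bipolar I-V sweep.

    Hypothetical helper: take all samples near the read voltage and compare
    the largest resistance (high-resistance state, HRS) with the smallest
    one (low-resistance state, LRS).
    """
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    near_read = np.isclose(voltage, v_read, atol=0.02)
    resistance = voltage[near_read] / current[near_read]
    return resistance.max() / resistance.min()

# Toy sweep: the device is read at +0.1 V once in the HRS and once in the LRS.
v = [0.1, 0.5, 0.1, -0.5]
i = [1e-9, 1e-4, 1e-4, -1e-3]
print(switching_ratio(v, i))   # ~1e5
```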
Funding: Supported in part by Thailand Science Research and Innovation (TSRI) and the National Research Council of Thailand (NRCT) via the International Research Network Program (IRN61W0006), Thailand; by Khon Kaen University, Thailand; and by Duy Tan University, Vietnam.
Abstract: In this paper, we study the system performance of mobile edge computing (MEC) wireless sensor networks (WSNs) using a multiantenna access point (AP) and two sensor clusters based on uplink nonorthogonal multiple access (NOMA). Due to limited computation and energy resources, the cluster heads (CHs) offload their tasks to the multiantenna AP over Nakagami-m fading. We propose a combination protocol for NOMA-MEC-WSNs in which the AP selects either selection combining (SC) or maximal ratio combining (MRC) and each cluster selects a CH to participate in the communication process by employing sensor node (SN) selection. We derive closed-form exact expressions of the successful computation probability (SCP) to evaluate the system performance under the latency and energy consumption constraints of the considered WSN. Numerical results are provided to gain insight into the system performance in terms of the SCP based on system parameters such as the number of AP antennas, the number of SNs in each cluster, task length, working frequency, offloading ratio, and transmit power allocation. Furthermore, to determine the optimal resource parameters, i.e., the offloading ratio, the power allocation of the two CHs, and the MEC AP resources, we propose two algorithms to achieve the best system performance. Our approach reveals that the optimal parameters under different schemes significantly improve the SCP compared to similar studies. We use Monte Carlo simulations to confirm the validity of our analysis.
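To make the SC versus MRC comparison concrete, the sketch below is a simplified single-link Monte Carlo estimate of the probability that an offloaded task can be delivered at a required rate over Nakagami-m fading. It is not the paper's closed-form NOMA analysis and omits the latency and energy model; all parameter values are placeholders.

```python
import numpy as np

def successful_delivery_prob(m=2.0, n_antennas=4, snr_db=10.0, rate_req=1.0,
                             combining="MRC", n_trials=100_000, rng=None):
    """Monte Carlo probability that the achievable rate meets a requirement
    over Nakagami-m fading with SC or MRC at a multiantenna receiver.

    Simplified single-link sketch (placeholder parameters). Nakagami-m
    amplitude fading means the channel power gain is Gamma(m, 1/m).
    """
    rng = rng or np.random.default_rng(0)
    snr = 10 ** (snr_db / 10)
    # Per-antenna channel power gains with unit average power.
    gains = rng.gamma(shape=m, scale=1.0 / m, size=(n_trials, n_antennas))
    if combining == "MRC":
        eff_gain = gains.sum(axis=1)   # maximal ratio combining adds branch SNRs
    else:
        eff_gain = gains.max(axis=1)   # selection combining keeps the best branch
    achievable_rate = np.log2(1.0 + snr * eff_gain)
    return np.mean(achievable_rate >= rate_req)

print("MRC:", successful_delivery_prob(combining="MRC"))
print("SC :", successful_delivery_prob(combining="SC"))
```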
Abstract: Digital transformation has been a cornerstone of business innovation in the last decade, and these innovations have dramatically changed the definition and boundaries of enterprise business applications. The introduction of new products/services, version management of existing products/services, management of customer/partner connections, management of multi-channel service delivery (web, social media, etc.), mergers/acquisitions of new businesses, and adoption of new innovations/technologies will drive data growth in business applications. These datasets exist in different shared-nothing business applications at different locations and in various forms. So, to make sense of this information and derive insight, it is essential to break the data silos, streamline data retrieval, and simplify information access across the entire organization. The information access framework must support just-in-time processing capabilities to bring data from multiple sources, be fast and powerful enough to transform and process huge amounts of data quickly, and be agile enough to accommodate new data sources per user needs. This paper discusses the SAP HANA Smart Data Access data-virtualization technology to enable unified access to heterogeneous data across the organization and analysis of huge volumes of data in real time using the SAP HANA in-memory platform.
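As a hypothetical illustration of the data-virtualization workflow, the sketch below assumes SAP's hdbcli Python driver and simplified Smart Data Access DDL: it registers a remote database as a remote source, exposes one of its tables as a virtual table, and joins it with a local in-memory table. The host, port, credentials, adapter, and object names are placeholders, and the exact DDL options depend on the adapter and HANA version.

```python
from hdbcli import dbapi   # SAP-provided Python driver for SAP HANA

# Placeholder connection details for the HANA system.
conn = dbapi.connect(address="hana.example.com", port=30015,
                     user="DEMO_USER", password="***")
cur = conn.cursor()

# Register a remote database as a Smart Data Access source (simplified DDL;
# the adapter name and configuration string depend on the remote system).
cur.execute("""
    CREATE REMOTE SOURCE "SALES_DB" ADAPTER "odbc"
    CONFIGURATION 'DSN=SALES_DSN'
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=remote_user;password=***'
""")

# Expose a remote table as a local virtual table: queries against it are
# federated to the remote source at runtime, so no data is copied into HANA.
cur.execute("""
    CREATE VIRTUAL TABLE "DEMO_USER"."VT_ORDERS"
    AT "SALES_DB"."<NULL>"."SALES_SCHEMA"."ORDERS"
""")

# The virtual table can now be joined with local in-memory tables.
cur.execute("""
    SELECT c.REGION, SUM(o.AMOUNT)
    FROM "DEMO_USER"."CUSTOMERS" AS c
    JOIN "DEMO_USER"."VT_ORDERS" AS o ON o.CUSTOMER_ID = c.ID
    GROUP BY c.REGION
""")
print(cur.fetchall())
conn.close()
```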