The conventional computing architecture faces substantial challenges, including high latency and energy consumption caused by data transfer between memory and processing units. In response, in-memory computing has emerged as a promising alternative architecture, enabling computing operations within memory arrays to overcome these limitations. Memristive devices have gained significant attention as key components for in-memory computing due to their high-density arrays, rapid response times, and ability to emulate biological synapses. Among these devices, two-dimensional (2D) material-based memristor and memtransistor arrays have emerged as particularly promising candidates for next-generation in-memory computing, thanks to their exceptional performance driven by the unique properties of 2D materials, such as layered structures, mechanical flexibility, and the capability to form heterojunctions. This review examines state-of-the-art research on 2D material-based memristive arrays, covering critical aspects such as material selection, device performance metrics, array structures, and potential applications. Furthermore, it provides a comprehensive overview of the current challenges and limitations associated with these arrays, along with potential solutions. The primary objective of this review is to serve as a significant milestone toward realizing next-generation in-memory computing with 2D materials and to bridge the gap from single-device characterization to array-level and system-level implementations of neuromorphic computing, leveraging the potential of 2D material-based memristive devices.
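The phrase "computing operations within memory arrays" usually refers to analog vector-matrix multiplication in a crossbar. Each row voltage drives current through a memristor according to Ohm's law, and each column sums those currents by Kirchhoff's current law. The minimal sketch below assumes ideal devices, with no wire resistance, sneak paths, or device variation. The function name `crossbar_vmm` is hypothetical and not taken from the review.

```python
from typing import List


def crossbar_vmm(voltages: List[float], conductances: List[List[float]]) -> List[float]:
    """Ideal crossbar read-out: column current I_j = sum_i V_i * G_ij.

    voltages:     one input voltage per row (volts)
    conductances: G[i][j] is the conductance (siemens) of the device at row i, column j
    Returns the current (amperes) collected on each column.
    """
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage per row is required")
    cols = len(conductances[0])
    currents = [0.0] * cols
    for i, v in enumerate(voltages):
        for j in range(cols):
            # Ohm's law per device; Kirchhoff's current law sums along the column.
            currents[j] += v * conductances[i][j]
    return currents


if __name__ == "__main__":
    G = [[1e-6, 2e-6], [3e-6, 4e-6]]
    print(crossbar_vmm([0.1, 0.2], G))
```

The whole multiply-accumulate happens in one read step, so no weights move between memory and a processor. This is the data-movement saving the abstract refers to.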
The “memory wall” of traditional von Neumann computing systems severely restricts the efficiency of data-intensive task execution. The in-memory computing (IMC) architecture is a promising approach to breaking this bottleneck. Variations and instability in ultra-scaled memory cells seriously degrade calculation accuracy in IMC architectures, but stochastic computing (SC) can compensate for these shortcomings owing to its low sensitivity to cell disturbances. Furthermore, massive parallel computation can be exploited to improve the speed and efficiency of the system. In this paper, SC in IMC is realized for image edge detection by designing logic functions in NOR flash arrays. The approach demonstrates ultra-low computational complexity and power consumption (25.5 fJ/pixel at a 2-bit sequence length). More impressively, the noise immunity is 6 times higher than that of the traditional binary method, showing good tolerance to cell variation and reliability degradation when implementing massive parallel computation in the array.
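As a rough software illustration of how stochastic computing can perform edge detection, the sketch below evaluates a Roberts-cross operator on a 2×2 patch. It uses a common textbook SC construction, not the authors' NOR-flash circuit:

- Pixel intensities in [0, 1] are encoded as unipolar bitstreams.
- An XOR of maximally correlated streams (generated from one shared random sequence) yields |a − b|.
- A multiplexer driven by an independent p = 0.5 select stream yields the scaled sum (a + b)/2.

The function names and the default stream length are assumptions for illustration.

```python
import random
from typing import List


def to_stream(p: float, rand: List[float]) -> List[int]:
    """Unipolar SC encoding: a bit is 1 when the (shared) random number is below p."""
    return [1 if r < p else 0 for r in rand]


def roberts_cross_sc(patch: List[List[float]], length: int = 1024, seed: int = 0) -> float:
    """Estimate 0.5 * (|x00 - x11| + |x01 - x10|) for a 2x2 patch of values in [0, 1]."""
    rng = random.Random(seed)
    # One shared random sequence makes the four input streams maximally correlated,
    # so XOR computes |a - b| rather than a + b - 2ab.
    shared = [rng.random() for _ in range(length)]
    (x00, x01), (x10, x11) = patch
    s00, s01, s10, s11 = (to_stream(v, shared) for v in (x00, x01, x10, x11))
    # Independent p = 0.5 select stream: the MUX output encodes (d1 + d2) / 2.
    select = [1 if rng.random() < 0.5 else 0 for _ in range(length)]
    ones = 0
    for k in range(length):
        d1 = s00[k] ^ s11[k]  # diagonal difference
        d2 = s01[k] ^ s10[k]  # anti-diagonal difference
        ones += d1 if select[k] else d2
    return ones / length  # decode: fraction of 1s in the output stream


if __name__ == "__main__":
    print(roberts_cross_sc([[0.9, 0.2], [0.3, 0.1]], length=4096))
```

A single flipped bit changes the decoded value by only 1/length. This is the intuition behind SC's tolerance to cell disturbances, although the noise-immunity figure in the abstract comes from the authors' hardware, not from this sketch.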
Ferroelectrics have great potential for nonvolatile memory because their polarization states can be programmed by an external electric field in a nonvolatile manner. However, two thorny issues have always hindered the practical application of ferroelectric memory devices: complementary metal-oxide-semiconductor (CMOS) compatibility and the uniformity of ferroelectric performance after size scaling. The emerging ferroelectricity of wurtzite-structure nitrides offers an opportunity to circumvent this dilemma. This review covers the mechanism of ferroelectricity and domain dynamics in ferroelectric AlScN films. It summarizes the performance optimization of AlScN films grown by different techniques and illustrates their applications in memories and emerging in-memory computing. Finally, the challenges and perspectives regarding the commercial path of ferroelectric AlScN are discussed.
In-memory systems with erasure coding (EC) enabled are widely used to achieve high performance and data availability. However, as clusters grow in scale, the server-level fail-slow problem is becoming increasingly frequent and can create long tail latency. The influence of long tail latency is further amplified in EC-based systems due to the synchronous nature of multiple EC sub-operations. In this paper, we propose an EC-enabled in-memory storage system called ShortTail, which achieves consistent performance and low latency for both reads and writes. ShortTail works in three steps:

- It uses a lightweight request monitor to track the performance of each memory node and identify any fail-slow node.
- It selectively performs degraded reads and redirected writes to avoid accessing fail-slow nodes.
- It adopts an adaptive write strategy to reduce write amplification for small writes.

We implement ShortTail on top of Memcached and compare it with two baseline systems. The experimental results show that ShortTail can reduce the P99 tail latency by up to 63.77%. It also brings significant improvements in median and average latency.
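The abstract does not describe how ShortTail's monitor decides that a node is fail-slow, or how it chooses chunks for a degraded read. The sketch below shows one plausible approach under explicit assumptions:

- Per-node latency is tracked with an exponentially weighted moving average (EWMA).
- A node is flagged when its average exceeds a multiple of the cluster median.
- A read of a k-of-n stripe takes the first k healthy nodes. Data nodes are assumed to be listed before parity nodes, so a parity chunk substitutes for a slow data chunk.

`alpha`, `slow_factor`, and all names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Set


@dataclass
class LatencyMonitor:
    alpha: float = 0.2        # EWMA smoothing factor (assumed)
    slow_factor: float = 3.0  # flag nodes slower than 3x the median (assumed)
    ewma: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, latency_us: float) -> None:
        """Fold one observed request latency into the node's moving average."""
        prev = self.ewma.get(node)
        if prev is None:
            self.ewma[node] = latency_us
        else:
            self.ewma[node] = (1 - self.alpha) * prev + self.alpha * latency_us

    def slow_nodes(self) -> Set[str]:
        """Nodes whose average latency is far above the cluster's typical latency."""
        if not self.ewma:
            return set()
        typical = median(self.ewma.values())
        return {n for n, v in self.ewma.items() if v > self.slow_factor * typical}


def pick_read_nodes(stripe_nodes: List[str], k: int, slow: Set[str]) -> List[str]:
    """Choose k of the n nodes holding a stripe (data nodes listed first, then parity).

    Skipping a slow data node means a parity chunk is read instead (a degraded read
    that needs decoding). If too few nodes are healthy, fall back to slow ones.
    """
    healthy = [n for n in stripe_nodes if n not in slow]
    if len(healthy) >= k:
        return healthy[:k]
    return (healthy + [n for n in stripe_nodes if n in slow])[:k]
```

Comparing against the median keeps the detector relative, so a cluster-wide slowdown does not mark every node as fail-slow.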
In this paper, a Distributed In-Memory Database (DIMDB) system is proposed to improve processing efficiency in mass data applications. The system uses an enhanced language similar to Structured Query Language (SQL) with a key-value storage schema. The design goals of the DIMDB system are described and its system architecture is discussed. The operation flow and the enhanced SQL-like language are also discussed, and experimental results are used to test the validity of the system.
Artificial intelligence (AI) processes data-centric applications with minimal effort. However, it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications because computation and storage are separated. The emergence of computing in-memory (CIM) is significant for circumventing the von Neumann bottleneck. Static random-access memory (SRAM) is a commercialized memory architecture that is fast, robust, low-power, and compatible with state-of-the-art technology. This study reviews the research progress of SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.
The computing demands of the Internet of Things (IoT) and artificial intelligence (AI) make the cost of moving data between the central processing unit (CPU) and memory the key problem. Meeting them will require chips featuring flexible structural units, ultra-low power consumption, and massive parallelism. In-memory computing, a non-von Neumann architecture that fuses memory units and computing units, can eliminate data-transfer time and energy consumption while performing massive parallel computations. Prototype in-memory computing schemes adapted from different memory technologies have shown orders-of-magnitude improvements in computing efficiency, and in-memory computing is therefore regarded as the ultimate computing paradigm. Here we review state-of-the-art memory device technologies with potential for in-memory computing. We summarize their versatile applications in neural networks, stochastic generation, and hybrid-precision digital computing, which offer promising solutions for unprecedented computing tasks. We also discuss the challenges of stability and integration for general in-memory computing.
Combining the logic function and memory characteristics of transistors is an ideal strategy for enhancing the computational efficiency of transistor devices. Here, we rationally design a tri-gate two-dimensional (2D) ferroelectric van der Waals heterostructure device based on copper indium thiophosphate (CuInP_(2)S_(6)) and few-layer tungsten disulfide (WS_(2)). We demonstrate its multifunctional applications in multi-valued data states, nonvolatile storage, and logic operation. By co-regulating the input signals across the tri-gate, we show that the device can switch functions flexibly at a low supply voltage of 6 V. It achieves an ultra-high current switching ratio of 10^(7) and a low subthreshold swing of 53.9 mV/dec. These findings offer perspectives for designing smart 2D devices with excellent functions based on ferroelectric van der Waals heterostructures.
Memristive stateful logic is one of the most promising candidates for implementing an in-memory computing system that computes within the storage unit. It can eliminate the cost of data movement in the traditional von Neumann system. However, memristor instability is inevitable given the limitations of current fabrication technology, which poses a great challenge to the reliability of memristive stateful logic. In this paper, we simulate the impact of device instability on the reliability of logic operations and derive the mathematical relationship between logic reliability and redundancy. By combining this relationship with vector-matrix multiplication in a memristive crossbar array, we propose a high-throughput logic error-correction scheme. Moreover, we put forward a universal design paradigm for complex logic and present the circuit schematic and flow of the scheme. Finally, we simulate and mathematically evaluate a 1-bit full adder (FA) based on NOR and NOT logic. The results demonstrate that the scheme can significantly improve logic reliability. The scheme is suitable for all kinds of R–R and V–R logic, and it has the best universality and throughput among the four other error-correction methods compared. Two other approaches also require additional complementary metal–oxide–semiconductor (CMOS) circuits; compared with them, the scheme needs fewer transistors and cycles for error correction.
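The abstract does not state the exact reliability–redundancy relationship the authors derive. A standard model that captures the idea runs n independent redundant copies of a logic operation, each correct with probability p, and takes a majority vote. The majority is correct with probability

R(p, n) = Σ_{k=⌈(n+1)/2⌉}^{n} C(n, k) p^k (1 − p)^(n−k).

The sketch below implements this assumed model. It also shows the vote as a sum followed by a threshold, which is the kind of accumulation a crossbar column can perform. Both function names are hypothetical.

```python
from math import comb
from typing import Sequence


def majority_vote(bits: Sequence[int]) -> int:
    """Return 1 if more than half of the redundant outputs are 1.

    The sum plays the role of an accumulated column current (an assumption about how
    a crossbar could realize it); comparing with len/2 is the threshold step.
    """
    return 1 if 2 * sum(bits) > len(bits) else 0


def majority_reliability(p: float, n: int) -> float:
    """Probability that a majority of n independent copies (n odd) is correct,
    when each copy is correct with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of copies so a strict majority exists")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, majority_reliability(0.9, n))
```

Under this model, reliability rises quickly with n whenever p > 0.5. This is why the extra hardware cost of redundancy can be kept small.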
Sparse coding is a prevalent method for image inpainting and feature extraction. It can repair corrupted images or improve data-processing efficiency and has numerous applications in computer vision and signal processing. Recently, several memristor-based in-memory computing systems have been proposed that remarkably enhance the efficiency of sparse coding. However, device variations and low precision deteriorate the dictionary, causing inevitable degradation in the accuracy and reliability of the application. In this work, we propose a digital-analog hybrid memristive sparse coding system that uses a multilevel Pt/Al_(2)O_(3)/AlO_(x)/W memristor and employs the forward stagewise regression algorithm. The approximate cosine distance calculation is conducted in the analog part to speed up computation, and high-precision coefficient updates follow in the digital part. We determine that four states of this memristor are sufficient for processing natural images. Furthermore, through dynamic adjustment of the mapping ratio, the precision requirement for the digital-to-analog converters can be reduced to 4 bits. Compared to the previous system, our system achieves higher image reconstruction quality, with a peak signal-to-noise ratio of 38 dB. Moreover, in image inpainting, images with 50% missing pixels can be restored with a root-mean-squared reconstruction error of 0.0424.
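A minimal pure-Python sketch of incremental forward stagewise regression follows, with comments marking which step the abstract assigns to the analog part and which to the digital part. With unit-norm dictionary atoms, picking the atom with the largest |⟨d_j, r⟩| is equivalent to picking the smallest cosine distance to the residual r. The step size `eps`, the stopping rule, and the function names are assumptions, and device quantization (the four memristor states) is not modeled.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a (non-zero) atom to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def forward_stagewise(x: Sequence[float], atoms: Sequence[Sequence[float]],
                      eps: float = 0.01, steps: int = 2000, tol: float = 1e-3) -> List[float]:
    """Approximate x as a sparse combination of dictionary atoms."""
    dic = [normalize(a) for a in atoms]
    coef = [0.0] * len(dic)
    r = list(x)  # residual
    for _ in range(steps):
        # Analog-friendly step: correlate the residual with every atom at once
        # (one vector-matrix multiplication on a crossbar storing the dictionary).
        corr = [sum(d_i * r_i for d_i, r_i in zip(d, r)) for d in dic]
        j = max(range(len(dic)), key=lambda i: abs(corr[i]))
        if abs(corr[j]) < tol:
            break
        step = eps if corr[j] > 0 else -eps
        # Digital step: small, high-precision update of one coefficient and the residual.
        coef[j] += step
        r = [r_i - step * d_i for r_i, d_i in zip(r, dic[j])]
    return coef
```

Only the argmax of the correlations must be right at each step, so a low-precision analog estimate can be tolerated while the accumulated coefficients stay accurate in digital form.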
Efficient cache management plays a vital role in in-memory data-parallel systems such as Spark, Tez, Storm, and HANA. Recent research, notably on the Least Reference Count (LRC) and Most Reference Distance (MRD) policies, has shown that dependency-aware cache management practices that consider the application’s directed acyclic graph (DAG) perform well in Spark. However, these practices ignore further relationships between RDDs and cache some redundant RDDs that share the same child RDDs, which degrades memory performance. Hence, in memory-constrained situations, systems may encounter a performance bottleneck due to frequent data block replacement. In addition, the prefetch mechanisms in some cache management policies, such as MRD, are hard to trigger. In this paper, we propose a new cache management method called RDE (Redundant Data Eviction) that fully utilizes the application’s DAG information to optimize cache management. By considering both RDD dependencies and the reference sequence, RDE effectively evicts RDDs with redundant features and frees memory for incoming data blocks. Experiments show that RDE improves performance by an average of 55% compared to LRU and by up to 48% and 20% compared to LRC and MRD, respectively. RDE is also less sensitive to memory bottlenecks, which means better availability in memory-constrained environments.
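The abstract does not give enough detail to reproduce RDE itself. To make "dependency-aware caching that considers the DAG" concrete, the sketch below implements the simpler LRC baseline it mentions. For every cached RDD, LRC counts how many not-yet-computed child RDDs will still read it, and it evicts the RDD with the fewest remaining references. The DAG encoding (child mapped to its parent RDDs), the alphabetical tie-breaking, and the function names are assumptions.

```python
from typing import Dict, List, Set


def remaining_refs(dag: Dict[str, List[str]], done: Set[str]) -> Dict[str, int]:
    """dag maps each RDD to the parent RDDs it reads.

    Returns, for every parent, how many not-yet-computed children still reference it.
    """
    counts: Dict[str, int] = {}
    for child, parents in dag.items():
        if child in done:
            continue  # finished children no longer need their inputs
        for p in parents:
            counts[p] = counts.get(p, 0) + 1
    return counts


def lrc_victim(cached: Set[str], dag: Dict[str, List[str]], done: Set[str]) -> str:
    """Least Reference Count: evict the cached RDD with the fewest future references.

    Ties are broken alphabetically so the choice is deterministic.
    """
    counts = remaining_refs(dag, done)
    return min(sorted(cached), key=lambda rdd: counts.get(rdd, 0))


if __name__ == "__main__":
    dag = {"B": ["A"], "C": ["A"], "D": ["A", "B"], "E": ["C"]}
    print(lrc_victim({"A", "B"}, dag, done={"B"}))
```

RDE, as described above, goes further by also recognizing cached RDDs that are redundant because they feed the same child RDDs. That logic is not modeled here.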
Similarity search, that is, finding similar items in massive data, is a fundamental computing problem in many fields such as data mining and information retrieval. However, for large-scale and high-dimensional data, it suffers from high computational complexity and requires tremendous computational resources. Here, based on low-power self-selective memristors, we propose for the first time an in-memory search (IMS) system with two innovative designs. First, by exploiting the natural distribution of device resistance, we design a hardware locality-sensitive hashing encoder that transforms real-valued vectors into more efficient binary codes. Second, we develop a compact memristive ternary content-addressable memory that calculates the Hamming distances between binary codes in parallel. Our IMS system demonstrated a 168× energy-efficiency improvement over all-transistor counterparts in clustering and classification tasks while achieving software-comparable accuracy. It thus provides a low-complexity and low-power solution for in-memory data-mining applications.
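To illustrate the two stages described above in software, the sketch below uses random-hyperplane locality-sensitive hashing followed by a Hamming-distance search:

- Each bit of the code is the sign of a projection onto a random Gaussian vector. In the proposed hardware, this randomness comes from the device resistance distribution.
- The nearest stored code is found by Hamming distance, the comparison a TCAM evaluates for all rows in parallel.

This is the standard SimHash-style construction, assumed here for illustration, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian projection vectors (stand-ins for device-to-device resistance spread)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_encode(x: Sequence[float], planes: List[List[float]]) -> int:
    """Bit i is 1 when x lies on the non-negative side of hyperplane i."""
    code = 0
    for i, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def nearest(query_code: int, db_codes: List[int]) -> int:
    """Index of the stored code closest in Hamming distance (a TCAM does this in parallel)."""
    return min(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=8, bits=16, seed=1)
    x = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 0.9, -0.4]
    y = [0.2, -1.0, 0.6, 1.8, -0.5, 0.0, 1.0, -0.3]
    print(hamming(lsh_encode(x, planes), lsh_encode(y, planes)))
```

For random hyperplanes, the expected fraction of differing bits equals the angle between two vectors divided by π. As a result, Hamming distance on short binary codes approximates cosine similarity on the original real-valued vectors.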
Relational database management systems are usually deployed on single-node machines and have strict limitations in terms of data structure. This means they do not work well with big data, and NoSQL has been proposed as a solution. To make data querying more efficient, NoSQL databases use indexes and memory cache techniques. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data-storage system called HMIBase. HMIBase has hierarchical indexes for non-primary keys in tables, which makes data querying more efficient. HMIBase uses HBase as the underlying data storage and creates a memory cache for more efficient data transmission. HMIBase supports coprocessors to process update requests. It also provides a client with query and update APIs and a server that handles RPCs from the client and completes jobs. To improve the cache hit ratio, we propose a memory cache replacement strategy for HMIBase called the Hot Score algorithm. The experimental results show that the Hot Score algorithm outperforms other cache-replacement strategies.
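To show why an index on non-primary keys speeds up queries in a key-value store, the toy sketch below keeps rows in a dictionary keyed by row key and maintains an in-memory inverted index (column value → row keys) for selected columns. With the index, a query such as "all rows where city = X" avoids a full scan. This is a single-level illustration only:

- A Python dictionary stands in for HBase.
- The class name and API are hypothetical.
- Neither HMIBase's hierarchical index structure nor its Hot Score cache replacement is modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SecondaryIndexedTable:
    """Toy row store keyed by primary (row) key, with indexes on chosen non-primary columns."""

    def __init__(self, indexed_columns: List[str]) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}
        # column name -> column value -> set of row keys holding that value
        self.index: Dict[str, Dict[str, Set[str]]] = {c: defaultdict(set) for c in indexed_columns}

    def put(self, row_key: str, row: Dict[str, str]) -> None:
        """Insert or overwrite a row, keeping every secondary index consistent."""
        old = self.rows.get(row_key)
        if old is not None:
            for col, idx in self.index.items():
                if col in old:
                    idx[old[col]].discard(row_key)  # drop stale index entries
        self.rows[row_key] = dict(row)
        for col, idx in self.index.items():
            if col in row:
                idx[row[col]].add(row_key)

    def find(self, column: str, value: str) -> List[Dict[str, str]]:
        """Look up rows by an indexed non-primary column without scanning the table."""
        keys = self.index[column].get(value, set())
        return [self.rows[k] for k in sorted(keys)]


if __name__ == "__main__":
    t = SecondaryIndexedTable(["city"])
    t.put("u1", {"name": "Ann", "city": "Oslo"})
    t.put("u2", {"name": "Bo", "city": "Rome"})
    print(t.find("city", "Rome"))
```

The update path is the main cost: every write must also fix the index entries. This is the kind of work the abstract delegates to HBase coprocessors.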
Informed decision-making, better communication, and faster response to business situations are the key differences between leaders and followers in this competitive global marketplace. A data-driven organization can analyze patterns and anomalies to make sense of the current situation and be ready for future opportunities. Organizations no longer face a “lack of data”. Their problem is getting “actionable data” at the right time to act on, direct, and influence their business decisions. The data exist in different transactional systems and/or data warehouse systems, which take significant time to retrieve and process relevant information, shrinking the time window for out-maneuvering the competition. To solve the problem of “actionable data”, enterprises can take advantage of the SAP HANA [1] in-memory platform, which enables rapid processing and analysis of huge volumes of data in real time. This paper discusses how SAP HANA virtual data models can be used for on-the-fly analysis of live transactional data, in real time and without persisted aggregates. Such models can be used to derive insight, perform what-if analysis, and execute business transactions.
Digital transformation has been the cornerstone of business innovation in the last decade, and these innovations have dramatically changed the definition and boundaries of enterprise business applications. Several developments will drive data growth in business applications:

- the introduction of new products/services and version management of existing ones;
- management of customer/partner connections;
- management of multi-channel service delivery (web, social media, etc.);
- mergers/acquisitions of new businesses;
- adoption of new innovations/technologies.

These datasets exist in different shared-nothing business applications at different locations and in various forms. To make sense of this information and derive insight, it is essential to break down data silos, streamline data retrieval, and simplify information access across the entire organization. The information access framework must meet three requirements:

- support just-in-time processing to bring in data from multiple sources;
- be fast and powerful enough to transform and process huge amounts of data quickly;
- be agile enough to accommodate new data sources as users need them.

This paper discusses SAP HANA Smart Data Access, a data-virtualization technology that enables unified access to heterogeneous data across the organization. Combined with the SAP HANA in-memory platform, it supports real-time analysis of huge volumes of data.
Operational analytics is about answering business questions while doing business and supporting business users across the organization, from shop floor users to management and executives. Therefore, business transactions and analytics must coexist on a single platform. Business users can then derive insights, make decisions, and complete business processes in a single application, using a single source of facts, without toggling between multiple applications. Traditionally, transactional systems and analytics were maintained separately to improve the throughput of the transactional system, which introduced latency into decision making. However, with innovation in the SAP HANA platform, SAP S/4HANA embedded analytics enables business users, business analysts, and management to perform real-time analytics on live transactional data. This paper reviews the technical architecture and key components of SAP S/4HANA embedded analytics.
In-memory computing is an alternative method to effectively accelerate the massive data-computing tasks of artificial intelligence (AI) and break the memory wall. In this work, we propose a 2T1C DRAM structure for in-memory computing. It integrates a monolayer graphene transistor, a monolayer MoS_(2) transistor, and a capacitor in a two-transistor-one-capacitor (2T1C) configuration. In this structure, the storage node sits in a position similar to that of one-transistor-one-capacitor (1T1C) dynamic random-access memory (DRAM), while an additional graphene transistor accomplishes nondestructive readout of the stored information. Furthermore, the ultralow leakage current of the MoS_(2) transistor enables the storage of multilevel voltages on the capacitor with a long retention time. Owing to the graphene transistor's excellent linearity, the stored charges can effectively tune its channel conductance, so that linear analog multiplication can be realized. Because DRAM has almost unlimited cycling endurance, our 2T1C DRAM has great potential for in situ training and recognition, which can significantly improve the recognition accuracy of neural networks.
The Rowhammer bug is a novel micro-architectural security threat, enabling powerful privilege-escalation attacks on various mainstream platforms. It works by actively flipping bits in Dynamic Random Access Memory (DRAM) cells with unprivileged instructions. To set up Rowhammer against binaries in the Linux page cache, the Waylaying algorithm was previously proposed. The Waylaying method stealthily relocates binaries onto exploitable physical addresses without exhausting system memory. However, the proof-of-concept Waylaying algorithm can be easily detected during page cache eviction because of its high disk I/O overhead and long running time. This paper proposes the more advanced Memway algorithm, which improves on Waylaying in terms of both I/O overhead and speed. Running time and disk I/O overhead are reduced by 90% by utilizing Linux tmpfs and in-memory swapping to manage eviction files. Furthermore, by combining Memway with the unprivileged posix_fadvise API, the binary relocation step is made 100 times faster. Equipped with our Memway+fadvise relocation scheme, we demonstrate practical Rowhammer attacks that take only 15–200 minutes to covertly relocate a victim binary and less than 3 seconds to flip the target instruction bit.
With the rapid growth of computer science and big data, the traditional von Neumann architecture suffers from escalating data communication costs because processing units and memories are separated. The memristive in-memory computing paradigm is considered a prominent candidate to address these issues, and plentiful applications have been demonstrated and verified. These applications can be broadly categorized into two major types:

- soft computing, which can tolerate uncertain and imprecise results;
- hard computing, which emphasizes explicit and precise numerical results for each task.

The two types impose different requirements on computational accuracy and the corresponding hardware solutions. In this review, we conduct a thorough survey of recent advances in memristive in-memory computing applications. On the soft computing side, we focus on artificial neural networks and other machine learning algorithms; on the hard computing side, we cover scientific computing and digital image processing. At the end of the review, we discuss the remaining challenges and future opportunities of memristive in-memory computing in the upcoming Artificial Intelligence of Things era.
Driven by the increasing requirements of high-performance computing applications, supercomputers tend to contain more and more computing nodes. Applications running on such large-scale systems are likely to spawn millions of parallel processes, which usually generate bursts of I/O requests. These bursts pose a great challenge to the metadata management of the underlying parallel file systems. The traditional way to overcome this challenge is to adopt multiple metadata servers in a scale-out manner, which inevitably runs into serious network and consistency problems. This work instead seeks to enhance metadata performance in a scale-up manner. Specifically, we propose to improve the performance of each individual metadata server by employing a GPU to handle metadata requests in parallel. We design a novel metadata server architecture that uses the CPU to interact with file system clients while offloading metadata computing tasks to the GPU. To take full advantage of the parallelism in the GPU, we redesign the in-memory data structure for the file system namespace. The new data structure fits the GPU memory architecture well and thus helps exploit the GPU's large number of parallel threads to serve bursty metadata requests concurrently. We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal. The experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50% under typical metadata operations. The advantage grows further in highly concurrent scenarios, e.g., high-performance computing systems supporting millions of parallel threads.
Funding: This work was supported by the National Research Foundation, Singapore under Award No. NRF-CRP24-2020-0002.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62034006, 91964105, 61874068), the China Key Research and Development Program (No. 2016YFA0201802), the Natural Science Foundation of Shandong Province (No. ZR2020JQ28), and the Program of Qilu Young Scholars of Shandong University.
Funding: National Natural Science Foundation of China (Nos. T2222025, 62174053 and 61804055), National Key Research and Development Program of China (No. 2021YFA1200700), Shanghai Science and Technology Innovation Action Plan (Nos. 21JC1402000 and 21520714100), and the Fundamental Research Funds for the Central Universities.
Funding: Supported by the National Natural Science Foundation of China (No. 62025203) and the Changchun Key Scientific and Technological Research and Development Project, China (No. 21ZGN30).
Abstract: In-memory systems with erasure coding (EC) enabled are widely used to achieve high performance and data availability. However, as clusters grow, server-level fail-slow problems become increasingly frequent and can create long tail latency. The impact of long tail latency is further amplified in EC-based systems because multiple EC sub-operations proceed synchronously. In this paper, we propose ShortTail, an EC-enabled in-memory storage system that achieves consistent performance and low latency for both reads and writes. First, ShortTail uses a lightweight request monitor to track the performance of each memory node and identify fail-slow nodes. Second, ShortTail selectively performs degraded reads and redirected writes to avoid accessing fail-slow nodes. Finally, ShortTail adopts an adaptive write strategy to reduce the write amplification of small writes. We implement ShortTail on top of Memcached and compare it with two baseline systems. The experimental results show that ShortTail reduces the P99 tail latency by up to 63.77% and also brings significant improvements in median and average latency.
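The monitor-then-avoid idea can be sketched as follows, under assumptions not taken from the paper: each node's latency is smoothed with an exponentially weighted moving average, a node counts as fail-slow when that average exceeds a fixed multiple of the cluster median, and a read of a (k data + m parity) stripe simply takes the first k healthy chunks. The class, thresholds, and function names are hypothetical, and the redirected-write and adaptive-write paths are not modeled.

```python
import statistics

class RequestMonitor:
    """Per-node latency tracker that flags nodes far slower than the cluster median."""

    def __init__(self, num_nodes, alpha=0.2, slow_factor=3.0):
        self.ewma = [None] * num_nodes   # smoothed latency per memory node
        self.alpha = alpha               # weight of the newest sample (assumed value)
        self.slow_factor = slow_factor   # fail-slow threshold relative to the median (assumed)

    def record(self, node, latency_us):
        old = self.ewma[node]
        if old is None:
            self.ewma[node] = float(latency_us)
        else:
            self.ewma[node] = self.alpha * latency_us + (1 - self.alpha) * old

    def fail_slow_nodes(self):
        observed = [v for v in self.ewma if v is not None]
        if not observed:
            return set()
        median = statistics.median(observed)
        return {n for n, v in enumerate(self.ewma)
                if v is not None and v > self.slow_factor * median}

def choose_read_nodes(stripe_nodes, k, slow):
    """Pick k chunks of a (k data + m parity) stripe while skipping slow nodes.

    stripe_nodes lists data-chunk nodes first, then parity-chunk nodes, so a
    normal read uses the data chunks; if a data node is slow, a parity chunk
    replaces it and the missing data must be decoded (a degraded read).
    """
    healthy = [n for n in stripe_nodes if n not in slow]
    if len(healthy) < k:
        raise RuntimeError("too few healthy nodes to reconstruct the stripe")
    return healthy[:k]
```

With six nodes in an RS(4, 2) stripe where node 2 reports ten times the usual latency, the monitor flags node 2 and the read is served by nodes 0, 1, 3, and 4, paying a decoding cost instead of waiting on the slow server.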
Abstract: In this paper, a Distributed In-Memory Database (DIMDB) system is proposed to improve processing efficiency in mass-data applications. The system uses an enhanced language similar to Structured Query Language (SQL) with a key-value storage schema. The design goals of the DIMDB system are described and its system architecture is discussed. The operation flow and the enhanced SQL-like language are also discussed, and experimental results are used to validate the system.
Funding: the National Key Research and Development Program of China (2018YFB2202602); the State Key Program of the National Natural Science Foundation of China (No. 61934005); the National Natural Science Foundation of China (No. 62074001); Joint Funds of the National Natural Science Foundation of China under Grant U19A2074.
Abstract: Artificial intelligence (AI) processes data-centric applications with minimal effort. However, it poses new challenges to system design in terms of computational speed and energy efficiency. The traditional von Neumann architecture cannot meet the requirements of heavily data-centric applications because computation and storage are separated. The emergence of computing in-memory (CIM) is significant for circumventing the von Neumann bottleneck. Static random-access memory (SRAM), a commercialized memory architecture, is fast and robust, consumes less power, and is compatible with state-of-the-art technology. This study investigates research progress on SRAM-based CIM technology at three levels: circuit, function, and application. It also outlines the problems, challenges, and prospects of SRAM-based CIM macros.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61925402 and 61851402); Science and Technology Commission of Shanghai Municipality, China (Grant No. 19JC1416600); the National Key Research and Development Program of China (Grant No. 2017YFB0405600); Shanghai Education Development Foundation and Shanghai Municipal Education Commission Shuguang Program, China (Grant No. 18SG01).
Abstract: Facing the computing demands of the Internet of Things (IoT) and artificial intelligence (AI), the cost of moving data between the central processing unit (CPU) and memory is the key problem, and chips featuring flexible structural units, ultra-low power consumption, and massive parallelism will be needed. In-memory computing, a non-von Neumann architecture that fuses memory units and computing units, can eliminate data-transfer time and energy consumption while performing massively parallel computations. Prototype in-memory computing schemes adapted from different memory technologies have shown orders-of-magnitude improvements in computing efficiency, leading in-memory computing to be regarded as the ultimate computing paradigm. Here we review state-of-the-art memory device technologies with potential for in-memory computing, summarize their versatile applications in neural networks, stochastic generation, and hybrid-precision digital computing, with promising solutions for unprecedented computing tasks, and discuss the challenges of stability and integration for general in-memory computing.
Funding: supported by the National Natural Science Foundation of China (No. 62104073); the China Postdoctoral Science Foundation (No. 2021M691088); the Pearl River Talent Recruitment Program (No. 2019ZT08X639). Z.C.W. acknowledges the European Research Executive Agency (Project 101079184-FUNLAYERS).
Abstract: Combining the logic functions and memory characteristics of transistors is an ideal strategy for enhancing the computational efficiency of transistor devices. Here, we rationally design a tri-gate two-dimensional (2D) ferroelectric van der Waals heterostructure device based on copper indium thiophosphate (CuInP_(2)S_(6)) and few-layer tungsten disulfide (WS_(2)), and demonstrate its multifunctional applications in multi-valued data states, nonvolatile storage, and logic operations. By co-regulating the input signals across the tri-gate, we show that the device can switch functions flexibly at a low supply voltage of 6 V, with an ultra-high current switching ratio of 10^(7) and a low subthreshold swing of 53.9 mV/dec. These findings offer perspectives for designing smart 2D devices with excellent functions based on ferroelectric van der Waals heterostructures.
Funding: Project supported by the National Key Research and Development Plan of the Ministry of Science and Technology of China (Grant Nos. 2019YFB2205100 and 2019YFB2205102); the National Natural Science Foundation of China (Grant Nos. 61974164, 62074166, 61804181, 62004219, and 62004220); the Science Support Program of the National University of Defense Technology (Grant No. ZK20-06).
Abstract: Memristive stateful logic is one of the most promising candidates for implementing an in-memory computing system that computes within the storage unit, eliminating the cost of data movement in the traditional von Neumann system. However, memristor instability is inevitable given the limitations of current fabrication technology, which poses a great challenge to the reliability of memristive stateful logic. In this paper, the impact of device instability on the reliability of logic events is simulated, and the mathematical relationship between logic reliability and redundancy is derived. By combining this relationship with vector-matrix multiplication in a memristive crossbar array, a high-throughput logic error-correction scheme is proposed. Moreover, a universal design paradigm is put forward for complex logic, and the circuit schematic and flow of the scheme are presented. Finally, a 1-bit full adder (FA) based on NOR and NOT logic is simulated and mathematically evaluated. The results demonstrate that the scheme significantly improves logic reliability. Compared with four other error-correction methods, the scheme, which is suitable for all kinds of R–R and V–R logic, has the best universality and throughput. Compared with two other approaches that also require additional complementary metal–oxide–semiconductor (CMOS) circuits, it needs fewer transistors and cycles for error correction.
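The link between redundancy and reliability can be illustrated with a simple majority-vote model, alongside a full adder composed only of NOR and NOT operations. This is a sketch under assumptions: each redundant execution is taken to fail independently with a fixed probability and the vote itself is assumed error-free, which may differ from the relationship derived in the paper; the gate decomposition is one standard NOR-only construction, not necessarily the authors' circuit.

```python
from math import comb

def NOR(a, b):
    return 1 - (a | b)

def NOT(a):
    return NOR(a, a)              # a NOR with both inputs tied together is an inverter

def OR(a, b):
    return NOT(NOR(a, b))

def AND(a, b):
    return NOR(NOT(a), NOT(b))    # De Morgan: a AND b = NOT(NOT a OR NOT b)

def XOR(a, b):
    n1 = NOR(a, b)
    xnor = NOR(NOR(a, n1), NOR(b, n1))  # four NOR gates form XNOR
    return NOT(xnor)

def full_adder(a, b, cin):
    """1-bit full adder built only from NOR and NOT operations."""
    p = XOR(a, b)
    total = XOR(p, cin)
    carry = OR(AND(a, b), AND(p, cin))
    return total, carry

def majority_reliability(p_err, n_copies):
    """Probability that a majority vote over n_copies independent executions of a
    logic step is correct when each execution fails with probability p_err."""
    if n_copies % 2 == 0:
        raise ValueError("use an odd number of copies so a majority always exists")
    need = n_copies // 2 + 1
    return sum(comb(n_copies, k) * (1 - p_err) ** k * p_err ** (n_copies - k)
               for k in range(need, n_copies + 1))
```

Under this model, a 10% per-step error rate gives a reliability of 0.9 without redundancy and 0.972 with triple redundancy, which shows why the scheme trades extra cells and cycles for reliability.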
Funding: This work was supported by the National Key R&D Program of China (Grant No. 2019YFB2205100) and in part by the Hubei Key Laboratory of Advanced Memories.
Abstract: Sparse coding is a prevalent method for image inpainting and feature extraction; it can repair corrupted images or improve data-processing efficiency, and it has numerous applications in computer vision and signal processing. Recently, several memristor-based in-memory computing systems have been proposed to remarkably enhance the efficiency of sparse coding. However, device variations and low precision deteriorate the dictionary, inevitably degrading the accuracy and reliability of the application. In this work, a digital-analog hybrid memristive sparse coding system is proposed using a multilevel Pt/Al_(2)O_(3)/AlO_(x)/W memristor and the forward stagewise regression algorithm: approximate cosine distance calculations are conducted in the analog part to speed up computation, followed by high-precision coefficient updates in the digital part. We determine that four states of this memristor are sufficient for processing natural images. Furthermore, by dynamically adjusting the mapping ratio, the precision requirement for the digital-to-analog converters can be reduced to 4 bits. Compared with the previous system, our system achieves higher image reconstruction quality, with a peak signal-to-noise ratio of 38 dB. Moreover, in image inpainting, images with 50% missing pixels can be restored with a root-mean-square reconstruction error of 0.0424.
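The hybrid split can be sketched with incremental forward stagewise regression: a possibly low-precision copy of the dictionary is used only to choose the best-matching atom by approximate cosine similarity (the role of the analog part), while coefficients and the residual are updated with the full-precision dictionary (the role of the digital part). This is a NumPy illustration under assumptions; the uniform 4-level quantizer, step size `eps`, and iteration count are hypothetical stand-ins, not the paper's mapping or parameters.

```python
import numpy as np

def quantize(D, levels=4):
    """Snap every dictionary entry to one of `levels` evenly spaced values,
    a stand-in for programming weights into a few memristor conductance states."""
    lo, hi = float(D.min()), float(D.max())
    step = (hi - lo) / (levels - 1)
    return lo + np.round((D - lo) / step) * step

def forward_stagewise(D, y, eps=0.01, n_steps=2000, D_analog=None):
    """Incremental forward stagewise regression for y ≈ D @ x.

    D_analog (e.g. a quantized copy of D) is used only to choose the atom;
    the coefficient and residual updates use the full-precision D.
    """
    D_analog = D if D_analog is None else D_analog
    atom_norms = np.linalg.norm(D_analog, axis=0)
    x = np.zeros(D.shape[1])
    r = np.asarray(y, dtype=float).copy()
    for _ in range(n_steps):
        # approximate cosine similarity between every atom and the residual
        scores = (D_analog.T @ r) / (atom_norms * np.linalg.norm(r) + 1e-12)
        j = int(np.argmax(np.abs(scores)))
        delta = eps * np.sign(scores[j])
        x[j] += delta          # small step on the best-matching coefficient
        r -= delta * D[:, j]   # keep the invariant y = D @ x + r
    return x, r
```

Usage: `x, r = forward_stagewise(D, y, D_analog=quantize(D, 4))` selects atoms with a 4-level dictionary while keeping the coefficient arithmetic exact; because each step moves one coefficient by a fixed small amount, errors in atom selection are corrected gradually rather than committed to at once.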
Funding: supported by the National Natural Science Foundation of China under Grant 6110002.
Abstract: Efficient cache management plays a vital role in in-memory data-parallel systems such as Spark, Tez, Storm, and HANA. Recent research, notably on the Least Reference Count (LRC) and Most Reference Distance (MRD) policies, has shown that dependency-aware cache management practices that consider the application's directed acyclic graph (DAG) perform well in Spark. However, these practices ignore further relationships between RDDs and cache some redundant RDDs that share the same child RDDs, which degrades memory performance. Hence, in memory-constrained situations, systems may encounter a performance bottleneck due to frequent data block replacement. In addition, the prefetch mechanisms in some cache management policies, such as MRD, are hard to trigger. In this paper, we propose a new cache management method called RDE (Redundant Data Eviction) that fully utilizes the application's DAG information to optimize cache management. By considering both RDD dependencies and the reference sequence, RDE effectively evicts RDDs with redundant features and makes room in memory for incoming data blocks. Experiments show that RDE improves performance by an average of 55% compared with LRU, and by up to 48% and 20% compared with LRC and MRD, respectively. RDE is also less sensitive to memory bottlenecks, which means better availability in memory-constrained environments.
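One way to make "redundant cached RDDs" concrete is to count, for each cached RDD, the future requests that would actually read it given the current cache: a request for an uncached RDD must recompute it and so falls through to its parents, while a request that hits a cached RDD stops there. This is a hypothetical reading of the idea, not RDE's published algorithm; the DAG representation (a dict from RDD name to parent names), the request list, and the function names are assumptions.

```python
def effective_refs(parents, cached, future_requests):
    """Count, for every cached RDD, the future requests that would really read it.

    A request for an uncached RDD must recompute it, so it falls through to the
    RDD's parents; a request that hits a cached RDD stops there, so cached
    ancestors behind a cached child receive no reference from that request.
    """
    counts = {rdd: 0 for rdd in cached}

    def touch(rdd, seen):
        if rdd in seen:            # count each RDD once per request (diamond-shaped DAGs)
            return
        seen.add(rdd)
        if rdd in cached:
            counts[rdd] += 1
            return
        for parent in parents.get(rdd, []):
            touch(parent, seen)

    for request in future_requests:
        touch(request, set())
    return counts

def pick_victim(parents, cached, future_requests):
    """Evict the cached RDD with the fewest effective references; a count of zero
    means it is redundant because everything that would read it is cached."""
    counts = effective_refs(parents, cached, future_requests)
    return min(sorted(cached), key=lambda rdd: counts[rdd])
```

For example, with `parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["B"]}`, cached RDDs A, B, C, and upcoming requests for D and E, RDD A has two children and a plain child-count policy might keep it, yet it receives no effective references because both of its children are cached, so it is evicted first.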
Funding: National Key Research and Development Plan of MOST of China (Grant/Award Numbers: 2019YFB2205100, 2021ZD0201201); National Natural Science Foundation of China (Grant/Award Number: 92064012); Hubei Engineering Research Center on Microelectronics; Chua Memristor Institute.
Abstract: Similarity search, that is, finding similar items in massive data, is a fundamental computing problem in many fields such as data mining and information retrieval. However, for large-scale, high-dimensional data, it suffers from high computational complexity and requires tremendous computational resources. Here, based on low-power self-selective memristors, we propose for the first time an in-memory search (IMS) system with two innovative designs. First, by exploiting the natural distribution of the devices' resistance, a hardware locality-sensitive hashing encoder is designed to transform real-valued vectors into more efficient binary codes. Second, a compact memristive ternary content-addressable memory is developed to calculate the Hamming distances between the binary codes in parallel. Our IMS system demonstrates a 168× energy-efficiency improvement over all-transistor counterparts in clustering and classification tasks while achieving software-comparable accuracy, thus providing a low-complexity, low-power solution for in-memory data-mining applications.
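The two stages can be mimicked in software with random-hyperplane LSH followed by a Hamming-distance scan. In this NumPy sketch, Gaussian random numbers stand in for the device-resistance distribution that the hardware encoder exploits, and a vectorized comparison stands in for the parallel TCAM search; the 64-bit code length and function names are illustrative assumptions.

```python
import numpy as np

def lsh_encode(X, planes):
    """Random-hyperplane LSH: bit j is 1 if the vector lies on the positive side of
    hyperplane j, so vectors at a small angle to each other agree on most bits."""
    return (X @ planes > 0).astype(np.uint8)

def hamming_nearest(query_code, codes):
    """Hamming distance from the query code to every stored code and the closest index
    (a TCAM would compare all stored rows in parallel)."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return int(np.argmin(dists)), dists
```

Usage: encode the database once with `codes = lsh_encode(data, planes)`, then encode each query with the same `planes` and call `hamming_nearest`; comparing short binary codes replaces floating-point distance computations over the original high-dimensional vectors.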
Funding: supported by China National Science Foundation (Grant 61223003) and ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Relational database management systems are usually deployed on single-node machines and have strict limitations in terms of data structure. This means they do not work well with big data, and NoSQL has been proposed as a solution. To make data querying more efficient, indexes and memory-cache techniques are used in NoSQL databases. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data storage system, called HMIBase, which has hierarchical indexes for non-primary keys in tables and makes data querying more efficient. HMIBase uses HBase as the underlying data storage and creates a memory cache for more efficient data transmission. HMIBase supports coprocessors to process update requests. It also provides a client with query and update APIs, and a server that handles RPCs from the client and completes jobs. To improve the cache hit ratio, we propose a memory-cache replacement strategy for HMIBase, called the Hot Score algorithm. The experimental results show that the Hot Score algorithm outperforms other cache-replacement strategies.
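The basic benefit of an index on non-primary columns plus a result cache can be sketched in a few lines. This is a simplified, hypothetical model: a single-level index keyed by (column, value), a dict standing in for HBase, and a plain LRU cache; HMIBase's hierarchical index layout and its Hot Score replacement policy are not described in enough detail here to reproduce.

```python
from collections import OrderedDict

class IndexedTable:
    """Primary-key row store plus a secondary index on non-primary columns."""

    def __init__(self, cache_capacity=128):
        self.rows = {}              # row key -> {column: value}; stands in for HBase
        self.index = {}             # (column, value) -> set of row keys
        self.cache = OrderedDict()  # (column, value) -> tuple of row keys, in LRU order
        self.cache_capacity = cache_capacity

    def put(self, row_key, row):
        for col, val in self.rows.get(row_key, {}).items():
            self.index[(col, val)].discard(row_key)    # drop stale index entries
        self.rows[row_key] = dict(row)
        for col, val in row.items():
            self.index.setdefault((col, val), set()).add(row_key)
        self.cache.clear()          # simplest invalidation: any update empties the cache

    def query(self, column, value):
        key = (column, value)
        if key in self.cache:
            self.cache.move_to_end(key)                 # mark as most recently used
            return self.cache[key]
        result = tuple(sorted(self.index.get(key, ())))  # index lookup instead of a full scan
        self.cache[key] = result
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)              # evict the least recently used entry
        return result
```

Without the index, a query on a non-primary column would scan every row; the cache then serves repeated queries without touching the store, and a smarter replacement policy such as Hot Score would slot in where the LRU eviction is.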
文摘Informed decision-making, better communication and faster response to business situation are the key differences between leaders and followers in this competitive global marketplace. A data-driven organization can analyze patterns & anomalies to make sense of the current situation and be ready for future opportunities. Organizations no longer have the problem of “lack of data”, but the problem of “actionable data” at the right time to act, direct and influence their business decisions. The data exists in different transactional systems and/or data warehouse systems, which takes significant time to retrieve/ process relevant information and negatively impacts the time window to out-maneuver the competition. To solve the problem of “actionable data”, enterprises can take advantage of the SAP HANA [1] in-memory platform that enables rapid processing and analysis of huge volumes of data in real-time. This paper discusses how SAP HANA virtual data models can be used for on-the-fly analysis of live transactional data to derive insight, perform what-if analysis and execute business transactions in real-time without using persisted aggregates.
文摘Digital transformation has been corner stone of business innovation in the last decade, and these innovations have dramatically changed the definition and boundaries of enterprise business applications. Introduction of new products/ services, version management of existing products/ services, management of customer/partner connections, management of multi-channel service delivery (web, social media, web etc.), merger/acquisitions of new businesses and adoption of new innovations/technologies will drive data growth in business applications. These datasets exist in different sharing nothing business applications at different locations and in various forms. So, to make sense of this information and derive insight, it is essential to break the data silos, streamline data retrieval and simplify information access across the entire organization. The information access framework must support just-in-time processing capabilities to bring data from multiple sources, be fast and powerful enough to transform and process huge amounts of data quickly, and be agile enough to accommodate new data sources per user needs. This paper discusses the SAP HANA Smart Data Access data-virtualization technology to enable unified access to heterogenous data across the organization and analysis of huge volume of data in real-time using SAP HANA in-memory platform.
文摘Operational analytics is all about answering business questions while doing business and supporting business users across the organization, from shop floor users to management and executives. Therefore, business transactions and analytics must co-exist together in a single platform to empower business users to drive insights, make decisions, and complete business processes in a single application and using a single source of facts without toggling between multiple applications. Traditionally transactional systems and analytics were maintained separately to improve throughput of the transactional system and that certainly introduced latency in decision making. However, with innovation in the SAP HANA platform, SAP S/4HANA embedded analytics enables business users, business analysts, and management to perform real-time analytics on live transactional data. This paper reviews technical architecture and key components of SAP S/4HANA embedded analytics. This paper reviews technical architecture and key components of SAP S/4HANA embedded analytics.
Funding: This work was supported by the National Key Research and Development Program (2021YFA1200500), in part by the Innovation Program of Shanghai Municipal Education Commission (2021-01-07-00-07-E00077), and by the Shanghai Municipal Science and Technology Commission (21DZ1100900).
Abstract: In-memory computing is an alternative approach that can effectively accelerate the massive data-computing tasks of artificial intelligence (AI) and break the memory wall. In this work, we propose a 2T1C DRAM structure for in-memory computing. It integrates a monolayer graphene transistor, a monolayer MoS_(2) transistor, and a capacitor in a two-transistor-one-capacitor (2T1C) configuration. In this structure, the storage node occupies a position similar to that in one-transistor-one-capacitor (1T1C) dynamic random-access memory (DRAM), while an additional graphene transistor performs nondestructive readout of the stored information. Furthermore, the ultralow leakage current of the MoS_(2) transistor enables multi-level voltages to be stored on the capacitor with a long retention time. Because of the excellent linearity of the graphene transistor, the stored charge effectively tunes its channel conductance, so linear analog multiplication can be realized. Owing to the almost unlimited cycling endurance of DRAM, our 2T1C DRAM has great potential for in situ training and recognition, which can significantly improve the recognition accuracy of neural networks.
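The multiplication principle can be sketched as a behavioral model: the stored capacitor voltage (the weight) sets the graphene channel conductance, a read voltage (the input) produces a current equal to conductance times voltage, and cells sharing a read line sum their currents. All device numbers below (`g_min`, slope `k`, Dirac-point voltage, retention time constant) are hypothetical placeholders, and the conductance is assumed linear on one branch of graphene's V-shaped transfer curve.

```python
import math

def graphene_conductance(v_store, g_min=1e-6, k=2e-5, v_dirac=0.0):
    """Channel conductance (S) of the read transistor, assumed linear in the stored
    gate voltage on one branch of graphene's V-shaped transfer curve."""
    if v_store < v_dirac:
        raise ValueError("keep stored voltages on one (linear) branch of the curve")
    return g_min + k * (v_store - v_dirac)

def stored_voltage(v0, t, tau):
    """Capacitor voltage after time t, decaying through the off-state access
    transistor with an assumed RC time constant tau (same time units as t)."""
    return v0 * math.exp(-t / tau)

def analog_mac(weights_v, inputs_v, g_min=1e-6, k=2e-5):
    """Cells sharing one read line sum their currents I_i = G_i * V_read,i;
    removing the g_min offset leaves a result proportional to sum(w_i * x_i)."""
    current = sum(graphene_conductance(w, g_min, k) * x
                  for w, x in zip(weights_v, inputs_v))
    offset = g_min * sum(inputs_v)
    return (current - offset) / k
```

In this model, `analog_mac([0.5, 1.0, 0.2], [0.1, 0.3, 0.2])` recovers the dot product 0.39, and a longer `tau` (lower leakage) keeps the stored weights, and therefore the products, accurate for longer between refreshes.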
Funding: supported by the National Natural Science Foundation of China (Nos. U1836112, U1536204, and 61876134); the Fundamental Research Funds for the Central Universities (No. 2042018kf10281); the Foundation of Key Lab of Information Assurance and Technology (No. KJ-17-101); China Scholarship Council.
Abstract: The Rowhammer bug is a novel micro-architectural security threat that enables powerful privilege-escalation attacks on various mainstream platforms. It works by actively flipping bits in Dynamic Random Access Memory (DRAM) cells with unprivileged instructions. To mount Rowhammer against binaries in the Linux page cache, the Waylaying algorithm was previously proposed. Waylaying stealthily relocates binaries onto exploitable physical addresses without exhausting system memory. However, the proof-of-concept Waylaying algorithm is easily detected during page-cache eviction because of its high disk I/O overhead and long running time. This paper proposes the more advanced Memway algorithm, which improves on Waylaying in terms of both I/O overhead and speed. Running time and disk I/O overhead are reduced by 90% by using Linux tmpfs and in-memory swapping to manage eviction files. Furthermore, by combining Memway with the unprivileged posix_fadvise API, the binary relocation step is made 100 times faster. Equipped with our Memway+fadvise relocation scheme, we demonstrate practical Rowhammer attacks that take only 15–200 minutes to covertly relocate a victim binary and less than 3 seconds to flip the target instruction bit.
Funding: This work was financially supported by the National Key R&D Program of China (Nos. 2019YFB2205100 and 2021ZD0201201) and the National Natural Science Foundation of China (Grant Nos. 92064012 and 61874164).
Abstract: With the rapid growth of computer science and big data, the traditional von Neumann architecture suffers from increasingly severe data-communication costs owing to the separation of processing units and memories. The memristive in-memory computing paradigm is considered a prominent candidate for addressing these issues, and plentiful applications have been demonstrated and verified. These applications can be broadly categorized into two major types: soft computing, which can tolerate uncertain and imprecise results, and hard computing, which emphasizes explicit and precise numerical results for each task; the two types place different requirements on computational accuracy and call for different hardware solutions. In this review, we conduct a thorough survey of recent advances in memristive in-memory computing applications, covering both soft computing, which focuses on artificial neural networks and other machine learning algorithms, and hard computing, which includes scientific computing and digital image processing. At the end of the review, we discuss the remaining challenges and future opportunities of memristive in-memory computing in the coming Artificial Intelligence of Things era.
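The accuracy gap between soft and hard computing can be seen in a minimal crossbar model: an ideal array computes column currents I_j = Σ_i G_ij V_i in one step (Ohm's law per device, Kirchhoff's current law per column), and device variation perturbs every conductance. The lognormal variation model, its spread, and the conductance and voltage ranges below are assumptions chosen for illustration, not values from the review.

```python
import numpy as np

def crossbar_vmm(G, v):
    """Ideal crossbar read: column current I_j = sum_i G[i, j] * v[i]
    (Ohm's law per device, Kirchhoff's current law per column)."""
    return v @ G

def perturb(G, sigma, rng):
    """Multiplicative lognormal device-to-device variation (an assumed model)."""
    return G * rng.lognormal(mean=0.0, sigma=sigma, size=G.shape)

def relative_error(G, v, sigma, seed=0):
    """Relative L2 error of the column currents under variation of spread sigma."""
    rng = np.random.default_rng(seed)
    ideal = crossbar_vmm(G, v)
    noisy = crossbar_vmm(perturb(G, sigma, rng), v)
    return np.linalg.norm(noisy - ideal) / np.linalg.norm(ideal)
```

A relative error of around one percent is usually harmless when the output only feeds an argmax in a classifier (soft computing), but in hard computing, such as solving linear systems, errors of that size typically have to be removed by iterative refinement or higher-precision digital correction.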
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0203904; the National Natural Science Foundation of China under Grant Nos. 61872392, U1811461 and 61832020; the Pearl River Science and Technology Nova Program of Guangzhou under Grant No. 201906010008; Guangdong Natural Science Foundation under Grant No. 2018B030312002; the Major Program of Guangdong Basic and Applied Research under Grant No. 2019B030302002; the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2016ZT06D211; the Key-Area Research and Development Program of Guangdong Province of China under Grant No. 2019B010107001.
Abstract: Driven by the increasing demands of high-performance computing applications, supercomputers tend to contain more and more computing nodes. Applications running on such large-scale systems are likely to spawn millions of parallel processes, which usually generate bursts of I/O requests, posing a great challenge to metadata management in the underlying parallel file systems. The traditional way to overcome this challenge is to adopt multiple metadata servers in a scale-out manner, which inevitably runs into serious network and consistency problems. This work instead seeks to enhance metadata performance in a scale-up manner. Specifically, we propose to improve the performance of each individual metadata server by using a GPU to handle metadata requests in parallel. We design a novel metadata server architecture that uses the CPU to interact with file system clients while offloading metadata computing tasks to the GPU. To take full advantage of the parallelism available in the GPU, we redesign the in-memory data structure for the file system namespace. The new data structure fits the GPU memory architecture well and thus helps exploit the large number of parallel threads within the GPU to serve bursty metadata requests concurrently. We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal; the results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50% under typical metadata operations. The advantage grows further in highly concurrent scenarios, e.g., high-performance computing systems supporting millions of parallel threads.
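A flat, table-based namespace is one way to make path lookups data-parallel: directory entries are stored in a table keyed by (parent inode, name), and a batch of requests is resolved one path component per round, so that within a round every request performs an independent probe. This plain-Python sketch is a hypothetical illustration of that pattern, not the paper's data structure; the inner loop marks the work a GPU would spread across threads, one per request.

```python
def build_namespace(paths):
    """Flatten a directory tree into a dentry table keyed by (parent_inode, name)."""
    dentries = {}
    next_ino = 1                  # inode 0 is the root directory
    for path in paths:
        parent = 0
        for name in path.strip("/").split("/"):
            key = (parent, name)
            if key not in dentries:
                dentries[key] = next_ino
                next_ino += 1
            parent = dentries[key]
    return dentries

def batch_lookup(dentries, paths):
    """Resolve many paths level by level; each level is one data-parallel round.
    Returns the inode of each path, or None if a component does not exist."""
    components = [p.strip("/").split("/") for p in paths]
    current = [0] * len(paths)
    depth = max((len(c) for c in components), default=0)
    for level in range(depth):
        for i, comps in enumerate(components):   # one GPU thread per request
            if current[i] is None or level >= len(comps):
                continue
            current[i] = dentries.get((current[i], comps[level]))
    return current
```

For example, after `ns = build_namespace(["/a/b/c", "/a/d"])`, the call `batch_lookup(ns, ["/a/b/c", "/a/d", "/a/x", "/a"])` resolves all four requests in three rounds, returning `None` for the missing `/a/x`; a pointer-based tree would instead force each request to chase its own chain of pointers.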