Abstract: Modern shared-memory multi-core processors typically have shared Level 2 (L2) or Level 3 (L3) caches. Cache bottlenecks and replacement strategies are the main problems of such architectures, where multiple cores try to access the shared cache simultaneously, and they are the main obstacle to improving memory performance. This paper documents the implementation of a Dual-Port Content Addressable Memory (DPCAM) and a modified Near-Far Access Replacement Algorithm (NFRA), which was previously proposed as a shared L2 cache layer in a multi-core processor. Standard Performance Evaluation Corporation (SPEC) Central Processing Unit (CPU) 2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer. Results show improved performance of the multi-core processor with DPCAM and NFRA, corresponding to a higher number of concurrent accesses to shared memory. The new architecture significantly increases system throughput and records performance improvements of up to 8.7% on various types of SPEC 2006 benchmarks. The miss rate is also improved by about 13%, with some exceptions in the sphinx3 and bzip2 benchmarks. These results could open a new window for solving the long-standing problems with shared caches in multi-core processors.
Funding: Supported by PetroChina New-generation Reservoir Simulation Software (2011A-1010); the Program of Research on Continental Sedimentary Oil Reservoir Simulation (z121100004912001), funded by Beijing Municipal Science & Technology Commission and PetroChina Joint Research Funding 12HT1050002654; partially supported by NSFC Grant 11201398; Hunan Provincial Natural Science Foundation of China Grant 14JJ2063; Specialized Research Fund for the Doctoral Program of Higher Education of China Grant 20124301110003; partially supported by the Dean's Startup Fund, Academy of Mathematics and System Sciences, and the State High Tech Development Plan of China (863 Program 2012AA01A309); partially supported by NSFC Grant 91130002; Program for Changjiang Scholars and Innovative Research Team in University of China Grant IRT1179; the Scientific Research Fund of the Hunan Provincial Education Department of China Grant 12A138.
Abstract: As a result of the interplay between advances in computer hardware, software, and algorithms, we are now in a new era of large-scale reservoir simulation, which focuses on accurate flow description, fine reservoir characterization, efficient nonlinear/linear solvers, and parallel implementation. In this paper, we discuss a multilevel preconditioner in a new-generation simulator and its implementation on multicore computers. This preconditioner relies on the method of subspace corrections to solve large-scale linear systems arising from fully implicit methods in reservoir simulations. We investigate the parallel efficiency and robustness of the proposed method by applying it to million-cell benchmark problems.
Funding: Supported in part by the National Natural Science Foundation of China (62225309, 62073222, U21A20480, 62361166632).
Abstract: Autonomous navigation for intelligent mobile robots has gained significant attention, with a focus on enabling robots to generate reliable policies based on maintenance of spatial memory. In this paper, we propose a learning-based visual navigation pipeline that uses topological maps as memory configurations. We introduce a unique online topology construction approach that fuses odometry pose estimation and perceptual similarity estimation. This tackles the issues of topological node redundancy and incorrect edge connections, which stem from the distribution gap between the spatial and perceptual domains. Furthermore, we propose a differentiable graph extraction structure, the topology multi-factor transformer (TMFT). This structure utilizes graph neural networks to integrate global memory and incorporates a multi-factor attention mechanism to underscore elements closely related to relevant target cues for policy generation. Results from photorealistic simulations on image-goal navigation tasks highlight the superior navigation performance of our proposed pipeline compared to existing memory structures. Comprehensive validation through behavior visualization, interpretability tests, and real-world deployment further underscores the adaptability and efficacy of our method.
Abstract: The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines. The emphasis is on methods for designing efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerations: algorithm parallelism and hardware parallelism; the granularity of the parallel algorithm; and algorithm optimization according to the underlying parallel machine. In this paper, these principles are applied to solve a model PDE problem. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results agree with the predicted performance.
Abstract: The nonlinear multisplitting method is a known parallel iterative method for solving a large-scale system of nonlinear equations F(x) = 0. We extend the idea of nonlinear multisplitting and consider a new model in which the iteration is executed asynchronously: each processor calculates the solution of an individual nonlinear system belonging to its nonlinear multisplitting and can update the global approximation residing in the shared memory at any time. A local convergence analysis of this model is presented. Finally, we give a numerical example which shows a 'strange' property: speedup Sp > p and efficiency Ep > 1.
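The abstract does not define Sp and Ep. Assuming the usual definitions, with T_1 the run time on one processor and T_p the run time on p processors (both symbols are introduced here for illustration), the two conditions in the 'strange' property are equivalent statements of superlinear speedup:

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
S_p > p \iff E_p > 1 .
```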
Abstract: Clustering is the task of assigning a set of instances into groups in such a way that the dissimilarity of instances within each group is minimized. Clustering is widely used in several areas such as data mining, pattern recognition, machine learning, image processing, and computer vision. K-means is a popular clustering algorithm which partitions instances into a fixed number of clusters in an iterative fashion. Although k-means is considered to be a poor clustering algorithm in terms of result quality, due to its simplicity, speed on practical applications, and iterative nature it was selected as one of the top 10 algorithms in data mining [1]. Parallelization of k-means has also been studied during the last two decades. Most of this work concentrates on shared-nothing architectures. With recent advances in GPU technology, implementation of the k-means algorithm on shared memory architectures has started to attract some attention. However, to the best of our knowledge, no in-depth analysis of the performance of k-means on shared memory multiprocessors has been done in the literature. In this work, our aim is to fill this gap by providing a theoretical analysis of the performance of the k-means algorithm and presenting extensive tests on a shared memory architecture.
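For readers unfamiliar with the iteration the abstract refers to, the following is a minimal serial sketch of standard (Lloyd-style) k-means. It is not the authors' implementation; the function names are hypothetical, and the initial centroids are supplied by the caller rather than chosen by any particular seeding rule. The assignment step treats each point independently, which is the part that parallel versions typically distribute across threads.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point.

    Each point is handled independently of the others.
    """
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Update step: move each centroid to the mean of its assigned points.

    A centroid with no assigned points keeps its previous position.
    """
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate update and assignment until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

In a shared memory setting, one plausible design is to split `points` into chunks for the assignment step and combine per-thread partial sums in the update step; whether the paper does this is not stated in the abstract.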
Abstract: In shared-memory bus-based multiprocessors, when the number of processors grows, the processors spend an increasing amount of time waiting for access to the bus (and shared memory). This contention reduces the performance of processors and limits the number of processors that can be used efficiently in bus-based systems. Since the multiprocessor's performance depends upon many parameters which affect the performance in different ways, timed Petri nets are used to model shared-memory bus-based multiprocessors at the instruction execution level, and the developed models are used to study how the performance of processors changes with the number of processors in the system. The results illustrate very well the restriction on the number of processors imposed by the shared bus. All performance characteristics presented in this paper are obtained by discrete-event simulation of Petri net models.
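The saturation effect described above can be illustrated with a much simpler model than the paper's timed Petri nets. The sketch below is a plain deterministic event loop, not a Petri net and not the authors' model: every processor is assumed to compute for a fixed number of cycles and then hold the single shared bus for a fixed number of cycles, with requests served in arrival order. The function name and all parameters are hypothetical.

```python
def completed_bus_transactions(n_procs: int, compute: int, bus: int, horizon: int) -> int:
    """Count bus transactions that finish within `horizon` cycles.

    Each processor repeats: compute for `compute` cycles, then use the
    shared bus for `bus` cycles. Only one processor can hold the bus at a
    time; waiting processors are served in order of request time.
    """
    ready = [compute] * n_procs  # time at which each processor next requests the bus
    bus_free = 0                 # time at which the bus next becomes idle
    done = 0
    while True:
        # Serve the earliest outstanding request (ties go to the lowest index).
        i = min(range(n_procs), key=lambda j: ready[j])
        start = max(ready[i], bus_free)  # wait if the bus is still busy
        end = start + bus
        if end > horizon:
            break
        bus_free = end
        ready[i] = end + compute  # the processor computes again after its transfer
        done += 1
    return done


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 10):
        total = completed_bus_transactions(n, compute=8, bus=2, horizon=1000)
        print(n, total, total / n)  # total work and work per processor
```

With these illustrative parameters each processor needs the bus for 2 of every 10 cycles, so the bus is fully occupied once five processors share it; adding more processors beyond that point only increases waiting time in this toy model.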
Abstract: The rapid development of urbanization leads to the transformation of urban industrial structure, resulting in prominent issues such as factory relocation and abandoned railways. As part of a city's memory, abandoned railways carry rich material and spiritual connotations, and transforming them into urban landscape has become a trend. By illustrating the concepts of urban memory and sharing, the paper classifies the types of shared railway landscape from the perspective of urban memory, and analyzes relevant cases at home and abroad. On this basis, the design strategy of shared railway landscape is summarized to better awaken the memory of the city, to stimulate the emotion of citizens, and to continue the history and culture of the city.
Abstract: A multicast replication algorithm is proposed for shared memory switches. It uses a dedicated FIFO to multicast by replicating cells at the receiver, and the FIFO operates in parallel with the shared memory. Speedup is used to improve loss and delay performance. A new queueing analytical model is developed based on a sub-timeslot approach. The system performance in terms of cell loss and delay is analyzed and verified by simulation.
Funding: Supported by the National Natural Science Foundation of China (No. 69896242).
Abstract: Shared Memory (SM) switches are widely used for their high throughput, low delay and efficient use of memory. This paper compares the performance of two prominent switching schemes of SM packet switches: Cell-Based Switching (CBS) and Packet-Based Switching (PBS). Theoretical analysis is carried out to draw qualitative conclusions on the memory requirement, throughput and packet delay of the two schemes. Furthermore, simulations are carried out to get quantitative results of the performance comparison under various system loads, traffic patterns, and memory sizes. Simulation results show that PBS has the advantage of shorter time delay, while CBS has a lower memory requirement and outperforms PBS in throughput when the memory size is limited. The comparison can be used for the tradeoff between performance and complexity in switch design.
Abstract: Aspect extraction is a critical task in aspect-based sentiment analysis, including explicit and implicit aspect identification. While extensive research has identified explicit aspects, little effort has been put into implicit aspect extraction due to the complexity of the problem. Moreover, existing research on implicit aspect identification is widely carried out on product reviews targeting specific aspects while neglecting sentences' dependency problems. Therefore, in this paper, a multi-level knowledge engineering approach for identifying implicit movie aspects is proposed. The proposed method first identifies explicit aspects using a variant of BiLSTM and CRF (Bidirectional Long Short-Term Memory-Conditional Random Field), which serve as a memory to process dependent sentences to infer implicit aspects. It can identify implicit aspects from four types of sentences, including independent and three types of dependent sentences. The study is evaluated on a large movie reviews dataset with 50k examples. The experimental results showed that the explicit aspect identification method achieved 89% F1-score and the implicit aspect extraction methods achieved 76% F1-score. In addition, the proposed approach also performs better than the state-of-the-art techniques (NMFIAD and ML-KB+) on the product review dataset, where it achieved 93% precision, 92% recall, and 93% F1-score.
Abstract: Currently, the mainstream vector network analyzer employs an embedded computer module with a digital intermediate frequency (IF) board to form a high-performance Windows platform. Under this structure, the vector network analyzer needs a powerful encoding system to arbitrate bus acquisition, which is usually realized by a field-programmable gate array (FPGA) chip. The paper explores the shared bus design method of the digital signal processing (DSP) board in a network analyzer. First, it puts an emphasis on the system structure; then the shared bus communication method is described in detail; finally, the advantages of the shared bus communication mechanism are summarized.
Abstract: This paper proposes an associative memory model based on a coupled system of Gaussian maps. A one-dimensional Gaussian map describes a discrete-time dynamical system, and the coupled system of Gaussian maps can generate various phenomena including asymmetric fixed and periodic points. The Gaussian associative memory can effectively recall one of the stored patterns, triggered by an input pattern, by associating the asymmetric two-periodic points observed in the coupled system with the binary values of output patterns. To investigate the Gaussian associative memory model, we formed its reduced model and analyzed the bifurcation structure. Pseudo-patterns were observed for the proposed model, as for other conventional associative memory models, and the obtained patterns were related to the high-order or quasi-periodic points and the chaotic trajectories. In this paper, the structure of the Gaussian associative memory and its reduced models are introduced, and the results of the bifurcation analysis are presented. Furthermore, the output sequences obtained from simulation of the recalling process are presented. We discuss the mechanism and the characteristics of the Gaussian associative memory based on the results of the analysis and the simulations conducted.
Funding: Supported by the National Natural Science Foundation of China (No. 61305042, 61202098); Projects of Center for Remote Sensing Mission Study of China National Space Administration (No. 2012A03A0939); Science and Technological Research of Key Projects of Education Department of Henan Province of China (No. 13A520071).
Abstract: Cross-modal semantic mapping and cross-media retrieval are key problems of the multimedia search engine. This study analyzes the hierarchy, the functionality, and the structure of the visual and auditory sensations of the cognitive system, and establishes a brain-like cross-modal semantic mapping framework based on cognitive computing of visual and auditory sensations. The mechanisms of visual-auditory multisensory integration, thalamo-cortical selective attention, emotional control in the limbic system and memory enhancement in the hippocampus were considered in the framework. Then, the algorithms of cross-modal semantic mapping were given. Experimental results show that the framework can be effectively applied to cross-modal semantic mapping, and that it is also significant for brain-like computing with non-von Neumann structures.
Abstract: Virtualization is the backbone of cloud computing, which is a developing and widely used paradigm. By finding and merging identical memory pages, memory deduplication improves memory efficiency in virtualized systems. Kernel Same Page Merging (KSM) is a Linux service for sharing memory pages in virtualized environments. Memory deduplication is vulnerable to a memory disclosure attack, which uses covert channel establishment to reveal the contents of other colocated virtual machines. To avoid a memory disclosure attack, sharing of identical pages within a single user's virtual machine is permitted, but sharing of contents between different users is forbidden. In our proposed approach, virtual machines with similar operating systems of active domains in a node are recognised and organised into a homogeneous batch, with memory deduplication performed inside that batch, to improve the efficiency of memory page sharing. When compared to memory deduplication applied to the entire host, implementation details demonstrate a significant increase in the number of pages shared when memory deduplication is applied batch-wise; CPU (central processing unit) consumption also increased.
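The batching idea can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' system and not how KSM is implemented: pages are modelled as byte strings, identical pages are detected by a SHA-256 digest, and virtual machines are grouped by an operating-system label supplied by the caller. All function names and the data layout are hypothetical, and the sketch only counts mergeable pages; it does not attempt to reproduce the sharing or CPU results reported in the abstract.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def mergeable_pages(vm_page_lists: Iterable[List[bytes]]) -> int:
    """Count pages that could be merged within one group of VMs.

    For every distinct page content, all copies beyond the first are
    counted as mergeable.
    """
    copies: Dict[bytes, int] = defaultdict(int)
    for pages in vm_page_lists:
        for page in pages:
            copies[hashlib.sha256(page).digest()] += 1
    return sum(count - 1 for count in copies.values())


def batch_by_os(vm_os: Dict[str, str]) -> Dict[str, List[str]]:
    """Group VM names into batches keyed by their operating-system label."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for vm, os_name in vm_os.items():
        batches[os_name].append(vm)
    return dict(batches)


def batchwise_mergeable_pages(vm_pages: Dict[str, List[bytes]],
                              vm_os: Dict[str, str]) -> Dict[str, int]:
    """Run the duplicate count separately inside each OS batch."""
    return {
        os_name: mergeable_pages(vm_pages[vm] for vm in vms)
        for os_name, vms in batch_by_os(vm_os).items()
    }
```

Restricting the scan to one batch means a page in one batch is never compared with, or merged into, a page from a different batch, which is the isolation property the abstract's approach relies on.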