Funding: Sponsored by the National Defence SciTech Key Lab Foundation (51457040204BQ0102)
Abstract: To improve the real-time performance of real-time HLA (High Level Architecture) in applications with massive data communication volume, multi-threaded processing was adopted: a thread pool structure was introduced into the system, and different threads were assigned to the corresponding message queues to respond to different message requests. Furthermore, a semi-preemptive priority allocation strategy was adopted, which reduces thread-switching cost and processing burden in the system while still ensuring that high-priority message requests are responded to in time, thus improving the system's overall performance. The design and experimental results indicate that the proposed method greatly improves the real-time performance of HLA in distributed system applications.
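A minimal sketch of the general idea of a thread pool draining prioritized message requests is given below. The Message and MessagePool names, the worker count, and the single PriorityQueue standing in for per-priority message queues are illustrative assumptions, not the paper's actual design.

```python
import itertools
import queue
import threading

# Sketch: a pool of worker threads serving message requests in priority order.
class Message:
    def __init__(self, priority, payload):
        self.priority = priority          # lower number = higher priority
        self.payload = payload

class MessagePool:
    def __init__(self, workers=4):
        self.q = queue.PriorityQueue()    # high-priority requests are dequeued first
        self._seq = itertools.count()     # tie-breaker for equal priorities (FIFO)
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, msg):
        # (priority, sequence number) orders by priority, then arrival order
        self.q.put((msg.priority, next(self._seq), msg))

    def _worker(self):
        while True:
            _, _, msg = self.q.get()
            try:
                self._handle(msg)
            finally:
                self.q.task_done()

    def _handle(self, msg):
        print(f"handled priority-{msg.priority} message: {msg.payload!r}")

pool = MessagePool(workers=4)
pool.submit(Message(priority=1, payload="state update"))
pool.submit(Message(priority=0, payload="time-critical event"))
pool.q.join()                             # wait until all submitted messages are handled
```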
Funding: Project (IRT0725) supported by the Changjiang Innovative Group of the Ministry of Education, China
Abstract: Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up explodes, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since CPU-intensive work has been vastly parallelized and sped up by multi-core and many-core processors, I/O latency is likely to become the bottleneck in data deduplication. To alleviate the challenge of I/O latency in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array was designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves 3-5 times performance improvement when combined with the locality-based ChunkStash and local-similarity-based SiLo methods. Moreover, Multi-Dedup dramatically decreases the synchronization overhead and achieves 1.5-2 times performance improvement compared with traditional lock-based synchronization methods.
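The prefix-based concurrent index can be pictured as a fingerprint table sharded by the leading byte of each chunk fingerprint, with one lock per shard so that deduplication threads working on different prefixes rarely contend. The sketch below is an assumed simplification of that idea, not the Multi-Dedup implementation.

```python
import hashlib
import threading

NUM_SHARDS = 256   # one shard per possible value of the first fingerprint byte

class PrefixIndex:
    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        self.locks = [threading.Lock() for _ in range(NUM_SHARDS)]

    def _shard_id(self, fingerprint: bytes) -> int:
        return fingerprint[0]             # prefix = first byte of the fingerprint

    def lookup_or_insert(self, chunk: bytes) -> bool:
        """Return True if the chunk is a duplicate, else record it and return False."""
        fp = hashlib.sha1(chunk).digest()
        sid = self._shard_id(fp)
        with self.locks[sid]:             # only this shard is locked, not the whole index
            if fp in self.shards[sid]:
                return True
            self.shards[sid][fp] = len(chunk)   # store chunk metadata (here: its size)
            return False

index = PrefixIndex()
print(index.lookup_or_insert(b"hello world"))   # False: first occurrence
print(index.lookup_or_insert(b"hello world"))   # True: duplicate detected
```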
Funding: Supported by NSC under Grant Nos. NSC 100-2218-E-009-009MY3 and NSC 100-2218-E-009-010-MY3
Abstract: mc2llvm is a process-level ARM-to-x86 binary translator developed in our lab over the past several years. Currently, it is able to emulate single-threaded programs. We extend mc2llvm to emulate multi-threaded programs. Our main task is to reconstruct its architecture for multi-threaded programs: register mapping, code cache management, and address mapping in mc2llvm have all been modified. In addition, to further speed up the emulation, we collect hot paths and aggressively optimize and generate code for them at run time, with additional threads used to hide this overhead. Thus, when the same hot path is traversed again, the corresponding optimized native code is executed instead. In our experiments, our system is on average 8.8x faster than QEMU (Quick Emulator) when emulating the specified benchmarks with 8 guest threads.
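The hot-path handling can be pictured as a per-path execution counter, a hotness threshold, and a background thread that optimizes hot paths so that later executions hit an optimized-code cache. The sketch below illustrates only this control flow; the threshold, queue, and "optimize" step are assumptions for illustration, and mc2llvm's actual LLVM-based code generation is far more involved.

```python
import threading
import queue

HOT_THRESHOLD = 50               # assumed hotness threshold

exec_counts = {}                 # path id -> execution count
optimized_cache = {}             # path id -> optimized code (placeholder string)
hot_queue = queue.Queue()        # paths waiting for background optimization

def optimizer_worker():
    while True:
        path_id = hot_queue.get()
        # Stand-in for aggressive optimization + native code generation.
        optimized_cache[path_id] = f"optimized({path_id})"
        hot_queue.task_done()

threading.Thread(target=optimizer_worker, daemon=True).start()

def execute_path(path_id):
    if path_id in optimized_cache:
        return optimized_cache[path_id]          # fast path: reuse optimized code
    count = exec_counts.get(path_id, 0) + 1
    exec_counts[path_id] = count
    if count == HOT_THRESHOLD:
        hot_queue.put(path_id)                   # optimize asynchronously
    return f"interpreted({path_id})"             # slow path in the meantime

for _ in range(60):
    execute_path("block_0x4000")
hot_queue.join()
print(execute_path("block_0x4000"))              # now served from the code cache
```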
Funding: This research is funded by the Open Foundation for the University Innovation Platform in Hunan Province (grant number 16K013); the Hunan Provincial Natural Science Foundation of China (grant number 2017JJ2016); the 2016 Science Research Project of the Hunan Provincial Department of Education (grant number 16C0269); the National Students Innovation and Entrepreneurship Training Program project "Accurate crawler design and implementation with a data cleaning function" (grant number 201811532010); and open projects (grant numbers 20181901CRP03, 20181901CRP04, 20181901CRP05). This research work was carried out at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province.
Abstract: Web crawlers are an important part of modern search engines. With the development of the times, data has exploded and humans have entered a "big data era". For example, Wikipedia carries knowledge from all over the world, records the real-time news that occurs every day, and provides users with a rich database, but the sheer volume of data puts great pressure on users searching it. At present, single-threaded crawling can no longer meet the requirements of text crawling. To improve the performance and versatility of single-threaded crawlers, a high-speed multi-threaded web crawler is designed to crawl a hyper-large-scale online text database. Multi-threaded crawling uses multiple threads to process web pages in parallel, combining breadth-first and depth-first strategies to control the crawl, as sketched below. The practice project implements this multi-threaded crawling method for a hyper-large-scale text database (Wikipedia books) in Python; the project was inspired by an article on Wikipedia in the Big Data Digest WeChat public account.
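The sketch below shows a minimal multi-threaded breadth-first crawler using only the Python standard library. The seed URL, depth limit, per-page link cap, and worker count are placeholders rather than the project's actual configuration, and a real crawler would add robots.txt handling, politeness delays, and retries.

```python
import threading
import queue
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

SEED = "https://en.wikipedia.org/wiki/Web_crawler"   # placeholder seed page
MAX_DEPTH = 1                                        # small depth to keep the demo short
NUM_WORKERS = 8

frontier = queue.Queue()          # BFS frontier of (url, depth) pairs
seen = set()
seen_lock = threading.Lock()

LINK_RE = re.compile(r'href="(/wiki/[^":#]+)"')      # internal Wikipedia links only

def worker():
    while True:
        url, depth = frontier.get()
        try:
            req = Request(url, headers={"User-Agent": "demo-crawler/0.1"})
            html = urlopen(req, timeout=10).read().decode("utf-8", "ignore")
            print(f"[depth {depth}] fetched {url} ({len(html)} bytes)")
            if depth < MAX_DEPTH:
                # Cap the links per page so the example finishes quickly.
                for path in LINK_RE.findall(html)[:20]:
                    link = urljoin(url, path)
                    with seen_lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    frontier.put((link, depth + 1))
        except Exception as exc:
            print(f"failed {url}: {exc}")
        finally:
            frontier.task_done()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

seen.add(SEED)
frontier.put((SEED, 0))
frontier.join()                   # wait until the whole frontier has been processed
```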
Abstract: In this paper, we study Java multi-threaded programming and its future development trends. Multithreading mechanisms can run several tasks at the same time and raise program execution efficiency, overcoming a limitation of traditional programming language design; thread synchronization is the key to realizing such designs correctly. Multithreading is a mechanism that allows the concurrent execution of multiple instruction streams within a program; each instruction stream is called a thread, and the threads are independent of one another. A thread is also known as a lightweight process: it has independent execution and its own flow of control. Our research starts from an analysis of the underlying mechanisms and how they can be used to enhance performance, which is innovative and meaningful.
Abstract: This article describes three algorithms for distance field generation on a triangulated model: a brute-force algorithm, a single-threaded algorithm based on spatial partitioning, and a multi-threaded algorithm based on spatial partitioning. The spatial partitioning algorithm uses a uniform grid to divide the bounding box into equal-sized cubes and calculates the maximum and minimum distances between the sample point and each cube. The minimum of the maximum distances is taken as an upper bound d1 on the distance from the sample point to the model. For each cube, d1 is compared with the cube's minimum distance d2: if d1 < d2, the distances from the sample point to all triangles inside this cube are greater than d1, so the cube is skipped; otherwise, the distances from the point to all triangles intersecting the cube are computed and d1 is replaced with the minimum value. This is repeated over all cubes that intersect the model (see the sketch below). Comparing the results shows that the multi-threaded distance field algorithm is much faster than the other two, especially for complex models.
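The pruning loop can be summarized as follows. The helpers for point-to-cube distances are standard axis-aligned-box formulas, while point_triangle_distance is left as a stub (an assumed placeholder), since the exact point-triangle distance computation is beside the point of the pruning logic.

```python
import math

def point_aabb_min_dist(p, lo, hi):
    # Closest distance from point p to the axis-aligned cube [lo, hi].
    d = [max(lo[i] - p[i], 0.0, p[i] - hi[i]) for i in range(3)]
    return math.sqrt(sum(x * x for x in d))

def point_aabb_max_dist(p, lo, hi):
    # Farthest distance from point p to any corner of the cube [lo, hi].
    d = [max(abs(p[i] - lo[i]), abs(p[i] - hi[i])) for i in range(3)]
    return math.sqrt(sum(x * x for x in d))

def point_triangle_distance(p, tri):
    # Placeholder: distance to the triangle's centroid stands in for the
    # exact point-triangle distance used by the real algorithm.
    c = [sum(v[i] for v in tri) / 3.0 for i in range(3)]
    return math.dist(p, c)

def distance_to_model(p, cubes):
    """cubes: list of (lo, hi, triangles) for cubes intersecting the model."""
    # Upper bound d1: the smallest of the maximum distances to any cube.
    d1 = min(point_aabb_max_dist(p, lo, hi) for lo, hi, _ in cubes)
    for lo, hi, triangles in cubes:
        d2 = point_aabb_min_dist(p, lo, hi)
        if d2 > d1:
            continue                      # every triangle in this cube is farther than d1
        for tri in triangles:
            d1 = min(d1, point_triangle_distance(p, tri))
    return d1

cube = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0),
        [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))])
print(distance_to_model((2.0, 2.0, 2.0), [cube]))
```

The multi-threaded variant then simply distributes sample points across worker threads (for example, with concurrent.futures), since each sample point is processed independently.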
Abstract: To address the lack of security monitoring of intelligent traffic management equipment itself, as well as the high latency, low image quality, and poor stability of traditional video surveillance, a multi-threaded encoded video streaming scheme based on FFmpeg is proposed. FFmpeg invokes the h264_nvenc encoder to achieve macroblock-row-level GPU multi-threaded acceleration and reduce encoding latency. Audio/video processing software based on FFmpeg is developed with Visual Studio 2019 and QT 15.5 to mux and push multiple video streams, and an Nginx streaming media server is set up for distribution. Experiments show that the overall transmission latency of the system is below 1 s, that it has good rate-distortion characteristics, and that the surveillance picture is clear and highly stable, achieving real-time, stable security monitoring of traffic management equipment.
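As a hedged illustration of one leg of such a pipeline, the snippet below pushes a video source to an Nginx-RTMP endpoint with the h264_nvenc hardware encoder via the ffmpeg command line (wrapped in Python for consistency with the other sketches). The input file, RTMP URL, and the live/stream application path are placeholders that depend on the actual camera source and Nginx configuration; this is not the software described in the abstract.

```python
import subprocess

cmd = [
    "ffmpeg",
    "-re",                      # read the input at its native frame rate
    "-i", "camera_feed.mp4",    # placeholder input; a real deployment reads the device stream
    "-c:v", "h264_nvenc",       # NVIDIA GPU H.264 encoder for low-latency encoding
    "-c:a", "aac",              # audio track re-encoded to AAC
    "-f", "flv",                # FLV container expected by RTMP
    "rtmp://localhost/live/stream",   # placeholder Nginx-RTMP publish URL
]
subprocess.run(cmd, check=True)
```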
Abstract: The use of multi-core processors will become a trend in safety-critical systems. For the safe execution of multi-threaded code, automatic code generation from formal specifications is a desirable method. Signal, a synchronous language dedicated to the functional description of safety-critical systems, provides sound semantics for deterministic concurrency. Although sequential code generation for Signal has been implemented in the Polychrony compiler, deterministic multi-threaded code generation strategies are still far from mature. Moreover, existing code generation methods rely on particular multi-threading libraries, which limits cross-platform execution. OpenMP is an application program interface (API) standard for parallel programming, supported by several mainstream compilers on different platforms. This paper presents a methodology for translating Signal programs to OpenMP-based multi-threaded C code. First, an intermediate representation of the core syntax of Signal using synchronous guarded actions is defined. Then, according to the compositional semantics of Signal equations, the Signal program is synthesized into a dependency graph (DG). After parallel tasks are extracted from the dependency graph, the Signal program can finally be translated into OpenMP-based C code that can be executed on multiple platforms.
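As a toy illustration (not from the paper) of the step that extracts parallel tasks from the dependency graph, the sketch below computes topological levels of a task DAG: tasks in the same level have no mutual dependencies, so a code generator could emit each level as one parallel region (for example, OpenMP sections). It is written in Python for consistency with the other sketches, whereas the paper generates C code.

```python
from collections import defaultdict, deque

def parallel_levels(deps):
    """deps: dict mapping each task to the set of tasks it depends on."""
    indegree = {t: len(d) for t, d in deps.items()}
    dependents = defaultdict(list)
    for t, d in deps.items():
        for u in d:
            dependents[u].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    levels = []
    while ready:
        level = list(ready)               # all tasks whose dependencies are satisfied
        ready.clear()
        levels.append(level)
        for t in level:
            for v in dependents[t]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return levels

# Hypothetical Signal equations: eq3 reads the outputs of eq1 and eq2.
deps = {"eq1": set(), "eq2": set(), "eq3": {"eq1", "eq2"}}
print(parallel_levels(deps))   # [['eq1', 'eq2'], ['eq3']] -> eq1 and eq2 can run in parallel
```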