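Abstract: In cloud environments, traditional physical servers are gradually being replaced by virtual machines of various kinds, and the storage space occupied by virtual machine images hosted in cloud data centers is growing rapidly. How to manage these image files efficiently has become one of the active research topics in cloud computing. Virtual machine images contain a large number of blank, duplicate data blocks, which contributes to a high redundancy rate within each image. In addition, different virtual machine images may run the same operating system and applications, so a considerable amount of duplicate data also exists across images. For massive collections of virtual machine images, traditional deduplication strategies incur a large time overhead and consume large amounts of memory and CPU resources, which degrades data center performance. This paper proposes a multi-level deduplication method for massive virtual machine images based on an improved Simhash algorithm. A complete image file is split into an operating system image segment and an application data image segment, and feature values are extracted from each part. The DBSCAN (density-based spatial clustering of applications with noise) clustering algorithm is then used to group the image segments, so that highly similar segments fall into the same cluster. Global deduplication is thereby decomposed into deduplication within groups that are smaller and have higher duplication rates. This allows the fingerprint index to be held entirely in memory during deduplication, greatly reduces the number of disk I/O operations, and shortens deduplication time.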
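The fingerprint-and-group idea in this abstract can be illustrated with a minimal sketch. It uses the standard Simhash construction, not the improved variant the paper proposes, and the feature tokens (for example, block digests of an image segment), the 64-bit width, and the helper names `simhash`, `hamming`, and `eps_neighbors` are all assumptions made for illustration. The neighbourhood query shown is only the building block that a density-based clusterer such as DBSCAN relies on, not a full clustering step.

```python
import hashlib

BITS = 64  # fingerprint width (assumption; the abstract does not state one)


def simhash(tokens):
    """Standard Simhash: each token hash votes +1/-1 per bit; keep the sign."""
    votes = [0] * BITS
    for tok in tokens:
        # Use the first 8 bytes of MD5 as a 64-bit token hash (illustrative choice).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when the votes for that bit are positive.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def eps_neighbors(fingerprints, idx, eps):
    """Indices of fingerprints within Hamming distance eps of fingerprints[idx].

    This is the epsilon-neighbourhood query that density-based clustering
    repeats for every point; the point itself is included.
    """
    return [j for j, fp in enumerate(fingerprints)
            if hamming(fingerprints[idx], fp) <= eps]
```

Segments whose fingerprints fall in the same neighbourhood would then be deduplicated together, which keeps each group's fingerprint index small enough to stay in memory.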
Funding: This work is fully supported by the Ministry of Science and Technology, Taiwan, Republic of China, under Grant Nos. MOST 110-2622-E-390-001 and MOST 109-2622-E-390-002-CC3.
Abstract: This paper introduces a novel transformation method that produces new programs through a code transformation model, the second generation of the Generative Pre-trained Transformer (GPT-2), and significantly improves program execution performance. In addition, a statistical estimate gives the minimum number of generated programs required, which guarantees that the best one is found among them. The proposed approach can help voice assistant machines resolve the problem of inefficient execution of application code. Besides GPT-2, this study develops a variational Simhash algorithm to check the code similarity between the sample program and each newly generated program, and devises a piecewise longest common subsequence algorithm to check that the execution output of the two programs conforms. The code similarity check removes redundant generated programs, and the output conformity check finds the best-performing generated program. In addition to text, the proposed approach can also be applied to other media, including images, sounds, and movies. As a result, the newly generated program significantly outperforms the sample program: the number of code lines is reduced by 27.21%, and the program execution time is shortened by 24.62%.
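The output conformity check can be sketched as follows. The abstract does not define the piecewise longest common subsequence algorithm, so this is one possible reading, stated as an assumption: both programs' output lines are cut into fixed-size pieces at the same offsets, a classic LCS is computed per piece, and the matched lines are summed. The function names and the `piece` parameter are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def piecewise_conformity(lines_a, lines_b, piece=4):
    """Fraction of output lines matched when LCS is applied piece by piece.

    Assumption: pieces are aligned by position, so the cost per piece is
    bounded by piece * piece instead of growing with the full output length.
    """
    total = max(len(lines_a), len(lines_b))
    if total == 0:
        return 1.0  # two empty outputs conform trivially
    matched = 0
    for start in range(0, total, piece):
        matched += lcs_length(lines_a[start:start + piece],
                              lines_b[start:start + piece])
    return matched / total
```

Under this reading, a generated program would be accepted only when its conformity with the sample program's output reaches a chosen threshold, for example 1.0 for exact agreement.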