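Abstract: Memory allocation is one of the main bottlenecks that degrade MapReduce performance on shared-memory systems, and the problem is especially severe for applications that must process a large number of keys. To address it, this paper proposes MALK (high-efficient MapReduce for applications having large amount of keys), a parallel MapReduce framework with low memory overhead that handles large numbers of keys efficiently. MALK manages scattered keys in contiguous storage, which avoids allocating many small memory blocks. It processes Map-phase tasks at a finer granularity and pipelines Reduce-phase tasks to reduce the amount of data that is live at the same time, keeping the application's memory demand within a small range. It also introduces a hash table reuse mechanism that reuses the hash table's storage, so that hash table memory is not allocated repeatedly during pipelining. In addition, MALK weighs the effect of task granularity and task count on task-management overhead and overall performance, and sets the number of Reduce-phase tasks to the value that gives the best system performance. Experimental results show that, relative to Phoenix++, MALK improves performance by up to 3.8 times (2.8 times on average); in the Map and Reduce phases it saves up to 95.2% and 87.8% of storage space, respectively; and in the Reduce phase it also achieves better load balance and lowers the L2 and LLC cache miss rates.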
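The two memory ideas named in the abstract, contiguous key storage and hash table reuse, can be illustrated with a minimal Python sketch. This is not MALK's implementation, which the abstract does not describe in detail; the names `KeyArena`, `ReusableCounter`, and `count_in_batches`, the fixed capacities, and the generation-stamp reset are all assumptions made for illustration. Keys are copied back to back into one preallocated buffer instead of being allocated one by one, and the same table is logically emptied between batches rather than rebuilt.

```python
class KeyArena:
    """Hypothetical arena: keys are stored back to back in one buffer."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)  # allocated once, never resized
        self.used = 0
        self.spans = []                 # (start, length) of each stored key

    def add(self, key):
        start, end = self.used, self.used + len(key)
        if end > len(self.buf):
            raise MemoryError("arena full")
        self.buf[start:end] = key       # same-size slice write, no resize
        self.used = end
        self.spans.append((start, len(key)))
        return len(self.spans) - 1      # index of the stored key

    def get(self, idx):
        start, length = self.spans[idx]
        return bytes(self.buf[start:start + length])

    def reset(self):
        self.used = 0                   # old bytes are simply overwritten
        self.spans.clear()


class ReusableCounter:
    """Hypothetical open-addressing count table whose slots are reused."""

    def __init__(self, slots, arena):
        self.slots = slots
        self.arena = arena
        self.key_id = [0] * slots
        self.count = [0] * slots
        self.stamp = [0] * slots        # generation in which a slot was filled
        self.generation = 1
        self.size = 0

    def reset(self):
        # Bumping the generation marks every slot as empty in O(1),
        # without allocating new slot arrays.
        self.generation += 1
        self.size = 0
        self.arena.reset()

    def add(self, key):
        i = hash(key) % self.slots
        while self.stamp[i] == self.generation:   # linear probing
            if self.arena.get(self.key_id[i]) == key:
                self.count[i] += 1
                return
            i = (i + 1) % self.slots
        if self.size + 1 >= self.slots:           # always keep one empty slot
            raise MemoryError("table full")
        self.key_id[i] = self.arena.add(key)
        self.count[i] = 1
        self.stamp[i] = self.generation
        self.size += 1

    def items(self):
        return {self.arena.get(self.key_id[i]): self.count[i]
                for i in range(self.slots)
                if self.stamp[i] == self.generation}


def count_in_batches(batches, slots=64, arena_bytes=1024):
    """Count keys batch by batch, reusing one arena and one table."""
    arena = KeyArena(arena_bytes)
    table = ReusableCounter(slots, arena)
    results = []
    for batch in batches:
        table.reset()
        for key in batch:
            table.add(key)
        results.append(table.items())
    return results
```

Calling `count_in_batches([[b"a", b"b", b"a"], [b"b", b"c"]])` processes the two batches one after the other with the same buffer and the same slot arrays. The fixed capacities keep the sketch short; a real framework would need a growth or spill policy.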
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11001247, 11071276)
Abstract: Kernel canonical correlation analysis (CCA) is a nonlinear extension of CCA, which aims at extracting the information shared by two random variables. It has wide applications in many fields, such as information retrieval. This paper gives a convergence rate analysis of kernel CCA under some approximation conditions, along with suggestions on how to choose the regularization parameter. The result shows that the convergence rate depends on only two parameters: the rate of the regularization parameter and the decay rate of the eigenvalues of the compact operator V_{YX}, which gives a better understanding of kernel CCA.
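For orientation, the objects named in the abstract can be written out in the formulation commonly used in the kernel CCA literature. The abstract itself does not define them, so the notation below is an assumption: H_X and H_Y denote the reproducing kernel Hilbert spaces for the two variables, \Sigma_{XX}, \Sigma_{YY}, \Sigma_{YX} the covariance and cross-covariance operators, hats their empirical estimates from n samples, and \epsilon_n the regularization parameter.

```latex
% Population kernel CCA: the first canonical correlation
\rho = \max_{f \in H_X,\; g \in H_Y} \langle g, \Sigma_{YX} f \rangle_{H_Y}
\quad \text{subject to} \quad
\langle f, \Sigma_{XX} f \rangle_{H_X} = 1, \;
\langle g, \Sigma_{YY} g \rangle_{H_Y} = 1 .

% The operator V_{YX} factors the cross-covariance operator
\Sigma_{YX} = \Sigma_{YY}^{1/2} \, V_{YX} \, \Sigma_{XX}^{1/2} .

% Regularized empirical estimator built from n samples
\hat{V}_{YX}^{(n)} =
\bigl(\hat{\Sigma}_{YY}^{(n)} + \epsilon_n I\bigr)^{-1/2}
\hat{\Sigma}_{YX}^{(n)}
\bigl(\hat{\Sigma}_{XX}^{(n)} + \epsilon_n I\bigr)^{-1/2} .
```

Under this reading, "the rate of the regularization parameter" refers to how fast \epsilon_n tends to zero as n grows, and the second parameter is the decay rate of the eigenvalues of V_{YX}.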