Abstract
Cross-matching is the key technique for fusing multi-band astronomical data, yet distributed algorithms for it have received little study. Astronomical data volumes are growing so rapidly that the problem now has to be solved with distributed parallel computing. This paper proposes a new cross-match method based on the MapReduce distributed computing model. Following MapReduce's design principles, the method minimizes inter-task communication and balances efficiency against storage by choosing an appropriate partition granularity. Experimental results show that, compared with previous methods, it markedly improves the efficiency of cross-matching massive datasets and achieves near-linear speedup on large-scale clusters. The method offers a fast and effective solution to the astronomical cross-match problem.
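The abstract does not say how the sky is partitioned, so the following is only a minimal sketch under stated assumptions of how a MapReduce cross-match can keep communication low and expose a granularity knob. It assumes a declination-zone partitioning (a common zone-based cross-match idea, swapped in here, not necessarily the authors' scheme), two catalogs labeled hypothetically "A" and "B", and a simulated in-process map/shuffle/reduce rather than any real Hadoop API. All function names (`map_phase`, `reduce_phase`, `cross_match`, `zone_of`) are hypothetical.

```python
"""Hypothetical sketch: zone-partitioned cross-match in a simulated MapReduce."""
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# (catalog_label, source_id, ra_deg, dec_deg); labels "A"/"B" are illustrative.
Source = Tuple[str, str, float, float]


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine; handles RA wrap-around)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def zone_of(dec: float, zone_height: float) -> int:
    """Index of the declination stripe (zone) that contains dec."""
    return math.floor((dec + 90.0) / zone_height)


def map_phase(src: Source, radius: float, zone_height: float) -> Iterator[Tuple[int, Source]]:
    catalog, _, _, dec = src
    if catalog == "A":
        # Each catalog-A source goes to exactly one zone, so each pair is tested once.
        yield zone_of(dec, zone_height), src
    else:
        # A catalog-B source is copied to every zone overlapping [dec - r, dec + r];
        # only sources near a zone boundary are replicated (small shuffle volume).
        lo = zone_of(max(-90.0, dec - radius), zone_height)
        hi = zone_of(min(90.0, dec + radius), zone_height)
        for zone in range(lo, hi + 1):
            yield zone, src


def reduce_phase(values: List[Source], radius: float) -> Iterator[Tuple[str, str]]:
    a_side = [v for v in values if v[0] == "A"]
    b_side = [v for v in values if v[0] == "B"]
    for _, a_id, a_ra, a_dec in a_side:
        for _, b_id, b_ra, b_dec in b_side:
            # |delta dec| is a lower bound on the separation: cheap rejection first.
            if abs(a_dec - b_dec) <= radius and angular_sep_deg(a_ra, a_dec, b_ra, b_dec) <= radius:
                yield a_id, b_id


def cross_match(sources: List[Source], radius: float, zone_height: float):
    """Return (sorted match pairs, number of records shuffled)."""
    groups: Dict[int, List[Source]] = defaultdict(list)  # stands in for the shuffle
    shuffled = 0
    for src in sources:
        for zone, value in map_phase(src, radius, zone_height):
            groups[zone].append(value)
            shuffled += 1
    matches = [pair for zone in sorted(groups) for pair in reduce_phase(groups[zone], radius)]
    return sorted(matches), shuffled


if __name__ == "__main__":
    catalog = [
        ("A", "a1", 10.0, 0.0), ("A", "a2", 200.0, 45.0), ("A", "a3", 359.99995, -30.0),
        ("B", "b1", 10.0, -0.0001),   # just across a zone boundary from a1
        ("B", "b2", 200.0, 45.01),    # 36 arcsec from a2: no match
        ("B", "b3", 0.00005, -30.0),  # across the RA = 0 wrap from a3
    ]
    print(cross_match(catalog, radius=1 / 3600, zone_height=0.5))
    # prints ([('a1', 'b1'), ('a3', 'b3')], 8)
```

In this sketch `zone_height` plays the role of partition granularity: thinner zones mean each reducer compares fewer pairs, but more catalog-B records fall within the match radius of a boundary and are replicated, raising shuffle and storage cost; thicker zones reverse the trade-off. Because each stripe spans the full RA range, a practical version would probably also subdivide zones in RA.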
Source
Application Research of Computers
Indexed in: CSCD; Peking University Core Journals
2010, No. 9, pp. 3322-3325 (4 pages)
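Funding
National Natural Science Foundation of China (No. 10978016)
Natural Science Foundation of Tianjin (No. 08JCZDJC19700)
Key Project of the Tianjin Science and Technology Support Program (No. 09ZCKFGX00400)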