Abstract
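A mashup is a popular Web 2.0 application that developers build by aggregating data from multiple web data sources on the Internet. Most mashup tools support developing mashups through visual data flow design, but data flows designed by end users who lack programming experience may execute very inefficiently, and mashup response time rises sharply when larger volumes of data are processed. This paper studies how to optimize mashup data flows through techniques such as merging and splitting, reordering, and parallelizing data processing operations, in order to improve mashup performance and scalability. It proposes a new mashup performance optimization method that annotates diverse mashup components with operation semantic feature attributes and cost models, defines flow transformation rules suited to mashups, generates all flows semantically equivalent to the user-designed mashup data flow, and presents an algorithm that builds a cost partial-order graph over these flows so that the flow with the lowest execution cost can be selected quickly. A mashup tool was implemented, and experiments show that the method effectively improves the execution efficiency of mashups designed by end users.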
A mashup is a new kind of Web 2.0 application created by aggregating and manipulating data from several web data sources. Mashup tools usually support visually designing data flows to create mashups. Because mashup developers have varying degrees of technical expertise, inefficiently designed data flows may be costly to execute, which increases the response time and impairs the QoS of the mashup. In this paper, we aim to enhance the performance of mashups based on data flow transformation techniques such as operator merging, operator swapping, and operator parallelism. A new optimization method is presented for mashups, which models a mashup as a data flow graph, annotates mashup components with operation semantic features and cost models, generates semantically equivalent data flows by applying transformation rules, and constructs a partially ordered diagram based on the cost of these data flows so that the optimal flow can be selected quickly. Key implementation techniques are provided, and the efficiency improvement of mashups is demonstrated by experiments.
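The idea of generating semantically equivalent data flows and selecting the cheapest one can be illustrated with a minimal sketch. It is not the paper's algorithm: it covers only operator swapping on a linear flow, and it picks the cheapest candidate by exhaustive comparison rather than through a cost partial-order graph. The `Op` class, the read/write field annotations, the linear cost model, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical annotated mashup operator (all attributes are assumed)."""
    name: str
    cost_per_item: float          # assumed cost of processing one input item
    selectivity: float            # assumed ratio of output items to input items
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def can_swap(a: Op, b: Op) -> bool:
    """Assumed swap rule: neither operator writes a field the other touches."""
    return (not (a.writes & (b.reads | b.writes))
            and not (b.writes & (a.reads | a.writes)))


def flow_cost(flow: Iterable[Op], n_items: int) -> float:
    """Sum per-operator costs while tracking the shrinking intermediate size."""
    total, size = 0.0, float(n_items)
    for op in flow:
        total += op.cost_per_item * size
        size *= op.selectivity
    return total


def equivalent_flows(flow: Iterable[Op]) -> Set[Tuple[Op, ...]]:
    """Enumerate every ordering reachable by legal adjacent swaps."""
    start = tuple(flow)
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for i in range(len(cur) - 1):
            if can_swap(cur[i], cur[i + 1]):
                nxt = cur[:i] + (cur[i + 1], cur[i]) + cur[i + 2:]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


def cheapest_flow(flow: Iterable[Op], n_items: int) -> Tuple[Op, ...]:
    """Pick the minimum-cost flow among all equivalent orderings."""
    return min(equivalent_flows(flow), key=lambda f: flow_cost(f, n_items))


# A user-designed flow that filters last, so costly steps run on every item.
translate = Op("translate", 5.0, 1.0,
               reads=frozenset({"title"}), writes=frozenset({"title_en"}))
fmt = Op("format", 2.0, 1.0,
         reads=frozenset({"title_en"}), writes=frozenset({"summary"}))
keep_cheap = Op("filter", 1.0, 0.25, reads=frozenset({"price"}))

user_flow = [translate, fmt, keep_cheap]
best = cheapest_flow(user_flow, n_items=1000)
print([op.name for op in best], flow_cost(best, 1000))
```

Under these assumed annotations the filter can move ahead of both other operators, because it touches only `price`, while `format` must stay after `translate`, because it reads the field that `translate` writes. Merging and parallelization rules would need additional annotations that this sketch does not model.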
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2011, Issue 9, pp. 1716-1722 (7 pages)
Journal of Chinese Computer Systems
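Funding
Supported by the National Key Basic Research Development Program of China ("973" Program) (2009CB320704)
Supported by the National High Technology Research and Development Program of China ("863" Program) (2007AA010301)
Supported by the National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software ("He-Gao-Ji") (2009ZX01043-003-002)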