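Outlier detection is an important problem in data management, with wide application in credit card fraud detection, industrial process management, and bank data processing. The arrival of the big data era has intensified the demand for diversified outlier detection over large-scale streaming data, where different users may, according to their own preferences, treat different types of data as outliers. To address multiple outlier detection over data streams, a query processing framework, MQOD (Multiple Query of Outlier Detection), is proposed. It exploits the containment relationships among multiple query tasks to support multiple outlier detection tasks and thereby improve query efficiency. Within the MQOD framework, an HT-Grid index is built to manage the streaming data; the temporal properties of the sliding window are used to partition the window, and the partitioning result determines the range over which queries are executed, reducing unnecessary object accesses. The MQOD algorithm is validated on real and synthetic data sets, and the results demonstrate its efficiency.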
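To make the idea of reusing results through query containment concrete, here is a minimal sketch. It assumes a distance-based outlier definition that the abstract itself does not specify: an object is an outlier for a query (r, k) if fewer than k other objects in the current window lie within distance r. Under that assumption, a query with a smaller radius and a larger neighbour threshold reports a superset of the outliers of a looser query, so the looser query only needs to re-check that superset. The names `Query`, `contains`, `is_outlier`, and `detect_multi` and the one-dimensional data are hypothetical; the HT-Grid index and the window partitioning of MQOD are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    r: float  # neighbourhood radius (assumed distance-based definition)
    k: int    # objects with fewer than k neighbours are reported as outliers


def contains(a, b):
    """True if every outlier of query b must also be an outlier of query a."""
    return a.r <= b.r and a.k >= b.k


def is_outlier(i, window, q):
    """Fewer than q.k other objects lie within distance q.r of window[i]."""
    x = window[i]
    neighbours = sum(
        1 for j, y in enumerate(window) if j != i and abs(x - y) <= q.r
    )
    return neighbours < q.k


def detect_multi(window, queries):
    """Answer all queries over one window, reusing results via containment."""
    results = {}
    # Stricter queries (small r, large k) come first, so that any query
    # containing q has already been answered when q is processed.
    for q in sorted(set(queries), key=lambda q: (q.r, -q.k)):
        candidates = range(len(window))
        for done, found in results.items():
            # Restrict the search to the smallest known superset of answers.
            if contains(done, q) and len(found) < len(candidates):
                candidates = found
        results[q] = [i for i in candidates if is_outlier(i, window, q)]
    return results


# Hypothetical window contents (the most recent objects of a stream).
window = [0.0, 1.0, 2.0, 3.0, 10.0, 12.0, 30.0]
strict, loose = Query(r=1.5, k=2), Query(r=2.5, k=1)
answers = detect_multi(window, [loose, strict])
```

In this example the looser query inspects only the five indices already reported by the stricter one instead of all seven objects in the window.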
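To address task scheduling under multiple QoS constraints in a data grid environment, a data grid task scheduling algorithm based on earliest completion time and QoS similarity is proposed (data grid task scheduling algorithm based on Min-min and QoS similarity, MS-GTSA). The algorithm combines earliest completion time with the S-GTSA algorithm: during scheduling, priority goes to the task whose QoS constraints best match a resource's QoS and whose completion time is earliest. While the best QoS match for each task is satisfied, the makespan is also considerably improved. Simulation results show that the algorithm effectively reduces the makespan of task scheduling and improves on S-GTSA in overall performance.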
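The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the MS-GTSA implementation: QoS requirements and offerings are modelled as numeric vectors, similarity is taken to be cosine similarity, and in each round the (task, resource) pair with the highest similarity is scheduled, with ties broken by the earliest completion time. The function names and the input layout are hypothetical.

```python
import math


def qos_similarity(required, offered):
    """Cosine similarity between a task's QoS vector and a resource's QoS vector."""
    dot = sum(a * b for a, b in zip(required, offered))
    norm = math.sqrt(sum(a * a for a in required)) * math.sqrt(
        sum(b * b for b in offered)
    )
    return dot / norm if norm else 0.0


def schedule(tasks, resources):
    """Greedy scheduling by best QoS match, then earliest completion time.

    tasks:     {task: (length, qos_vector)}
    resources: {resource: (speed, qos_vector)}
    Returns (plan, makespan), where plan lists (task, resource, finish_time).
    """
    ready = {r: 0.0 for r in resources}  # time at which each resource is free
    pending = dict(tasks)
    plan = []
    while pending:
        best = None
        for t, (length, need) in pending.items():
            for r, (speed, offer) in resources.items():
                finish = ready[r] + length / speed
                # Higher similarity first; rounding keeps ties stable.
                key = (-round(qos_similarity(need, offer), 6), finish)
                if best is None or key < best[0]:
                    best = (key, t, r, finish)
        _, t, r, finish = best
        ready[r] = finish
        plan.append((t, r, finish))
        del pending[t]
    return plan, max(ready.values())


# Hypothetical workload: two QoS dimensions, two resources.
tasks = {
    "t1": (10.0, (1.0, 0.0)),
    "t2": (4.0, (0.0, 1.0)),
    "t3": (6.0, (1.0, 0.0)),
}
resources = {"r1": (2.0, (1.0, 0.0)), "r2": (1.0, (0.0, 1.0))}
plan, makespan = schedule(tasks, resources)
```

Ordering the key as (similarity, completion time) makes QoS matching the primary criterion, as the abstract suggests; a weighted combination of the two would be another reasonable reading.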
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which consists of large-scale data-intensive applications that produce and consume huge amounts of data distributed across a large number of machines. Data grid computing involves sets of independent tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks and appropriate storage resources selected to serve the files required by the tasks. The problem can therefore be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler that is broken into three parts that can run in parallel and that uses both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms that target minimizing makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.
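The two sub-problems and the makespan objective can be illustrated with a small cost model. This is only a sketch of how a candidate schedule might be scored, not the paper's scheduler: it assumes each file is fetched from the replica with the highest bandwidth to the chosen computing resource, and that transfers and computation on one resource run one after another. All names and numbers are hypothetical; the tabu search and genetic algorithm that would explore assignments are not shown.

```python
def best_replica(file, node, replicas, bandwidth):
    """Storage selection: the replica of `file` with the highest bandwidth to `node`."""
    return max(replicas[file], key=lambda s: bandwidth[(s, node)])


def makespan(assignment, tasks, files, replicas, bandwidth, speed):
    """Completion time of the busiest computing resource.

    assignment: {task: computing resource}
    tasks:      {task: (work, [required files])}
    files:      {file: size}
    replicas:   {file: [storage resources holding a copy]}
    bandwidth:  {(storage resource, computing resource): rate}
    speed:      {computing resource: processing rate}
    """
    load = {node: 0.0 for node in speed}
    for task, node in assignment.items():
        work, needed = tasks[task]
        transfer = sum(
            files[f] / bandwidth[(best_replica(f, node, replicas, bandwidth), node)]
            for f in needed
        )
        load[node] += transfer + work / speed[node]
    return max(load.values())


# Hypothetical grid: two storage resources, two computing resources.
files = {"a": 100.0, "b": 50.0}
replicas = {"a": ["s1", "s2"], "b": ["s2"]}
bandwidth = {
    ("s1", "c1"): 10.0, ("s2", "c1"): 5.0,
    ("s1", "c2"): 2.0, ("s2", "c2"): 25.0,
}
speed = {"c1": 1.0, "c2": 2.0}
tasks = {"t1": (20.0, ["a"]), "t2": (10.0, ["a", "b"])}

spread = makespan({"t1": "c1", "t2": "c2"}, tasks, files, replicas, bandwidth, speed)
packed = makespan({"t1": "c2", "t2": "c2"}, tasks, files, replicas, bandwidth, speed)
```

With these numbers, placing both tasks on the faster, better-connected resource gives a shorter makespan than spreading them, which shows why task assignment and storage selection have to be considered together.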
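As the direction in which distributed computing is developing within scientific computing, grid computing can provide powerful computing capacity for processing Earth observation data. Based on an analysis of two key problems in Remote Sensing Information Service Grid Nodes (RSSN), namely network data transmission and load balancing, an effective data compression method based on run-length encoding and Huffman coding is proposed, together with a task allocation strategy based on "computing endmembers". The method compresses data according to the characteristics of remote sensing imagery, achieves a good compression ratio of 17%, and can balance the task load. On the RSSN computing platform, the inversion of aerosol optical depth (AOD) at 1 km resolution over China is used as a case study to verify and analyze the effectiveness of these methods in terms of compression rate and computation time.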
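The combination of run-length encoding and Huffman coding can be sketched as below. This is a generic illustration, assuming that runs of identical pixel values are first collapsed into (value, run length) pairs and that those pairs are then Huffman-coded as symbols; the abstract does not say how the two stages are combined in RSSN, and the sample data are hypothetical. The size of the code table is ignored, so the bit count here is not comparable to the compression ratio reported above.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(data):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def compress(data):
    """Run-length encode, then Huffman-code the runs. Returns (bits, code table)."""
    runs = run_length_encode(data)
    code = huffman_code(runs)
    return "".join(code[r] for r in runs), code


def decompress(bits, code):
    """Invert compress() by matching prefix-free codewords and expanding runs."""
    decode = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:
            value, length = decode[current]
            out.extend([value] * length)
            current = ""
    return bytes(out)


# Hypothetical scan line with long uniform runs, as in masked or cloud-free areas.
data = bytes([0] * 50 + [255] * 30 + [0] * 50 + [7] * 3 + [255] * 30)
bits, code = compress(data)
```

Run-length encoding pays off when the imagery contains long uniform stretches, and Huffman coding then shortens the most frequent runs; on noisy data with few repeats the first stage can expand the input instead.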