Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor...Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor network (APN), communication contention has a strong influence on the execution time of a parallel application. This paper investigates the incorporation of contention awareness into task scheduling. The innovation is the idea of dynamically scheduling edges to links, for which we use the earliest finish communication time search algorithm based on shortest-path search method. The other novel idea proposed in this paper is scheduling priority based on recursive rank computation on heterogeneous arbitrary processor network. In the end, to reduce time complexity of algorithm, a parallel algorithm is proposed and speedup O(PPE) is achieved. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm significantly surpasses classic and static communication contention awareness algorithm, especially for high data transmission rate parallel application.展开更多
Heterogeneous computing is one effective method of high performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous environments. A number of...Heterogeneous computing is one effective method of high performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous environments. A number of task scheduling algorithms for homogeneous environments have been proposed, whereas, a few for heterogeneous environments can be found in the literature. A novel task scheduling algorithm for heterogeneous environments, called the heterogeneous critical task (HCT) scheduling algorithm is presented. By means of the directed acyclic graph and the gantt graph, the HCT algorithm defines the critical task and the idle time slot. After determining the critical tasks of a given task, the HCT algorithm tentatively duplicates the critical tasks onto the processor that has the given task in the idle time slot, to reduce the start time of the given task. To compare the performance of the HCT algorithm with several recently proposed algorithms, a large set of randomly generated applications and the Gaussian elimination application are randomly generated. The experimental result has shown that the HCT algorithm outperforms the other algorithm.展开更多
Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are bas...Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are based on the so-called list scheduling technique. The basic idea behind list scheduling is to prepare a sequence of nodes in the form of a list for scheduling by assigning them some priority measurements, and then repeatedly removing the node with the highest priority from the list and allocating it to the processor providing the earliest start time (EST). Therefore, it can be inferred that the makespans obtained are dominated by two major factors: (1) which order of tasks should be selected (sequence subproblem); (2) how the selected order should be assigned to the processors (assignment subproblem). A number of good approaches for overcoming the task sequence dilemma have been proposed in the literature, while the task assignment problem has not been studied much. The results of this study prove that assigning tasks to the processors using the traditional EST method is not optimum; in addition, a novel approach based on the ant colony optimization algorithm is introduced, which can find far better solutions.展开更多
Row Parallel Coarse-Grained Reconfigurable Architecture(RPCGRA)has the advantages of maximum parallelism and programmable flexibility.Designing an efficient algorithm to map the diverse applications onto RPCGRA is dif...Row Parallel Coarse-Grained Reconfigurable Architecture(RPCGRA)has the advantages of maximum parallelism and programmable flexibility.Designing an efficient algorithm to map the diverse applications onto RPCGRA is difficult due to a number of RPCGRA hardware constraints.To solve this problem,the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA.In this paper,we present a Depth-First Greedy Mapping(DFGM)algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array(RCA).Compared with level breadth mapping,the performance of DFGM is better.The percentage of maximum improvement in the use times of RCA is 33%and the percentage of maximum improvement in non-original input and output times is 64.4%(Given Discrete Cosine Transfor 8(DCT8),and the area of reconfigurable processing unit is 56).Compared with level-based depth mapping,DFGM also obtains the lowest averages of use times of RCA,non-original input and output times,and the reconfigurable time.展开更多
基金Supported by the National Natural Science Foundation of China (Grant Nos. 90715029 and 60603053)the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Edacation of Chinathe Key Project of Science & Technology of Hunan Province(Grant No. 2006GK2006)
文摘Task scheduling is an essential aspect of parallel process system. This NP-hard problem assumes fully connected homogeneous processors and ignores contention on the communication links. However, as arbitrary processor network (APN), communication contention has a strong influence on the execution time of a parallel application. This paper investigates the incorporation of contention awareness into task scheduling. The innovation is the idea of dynamically scheduling edges to links, for which we use the earliest finish communication time search algorithm based on shortest-path search method. The other novel idea proposed in this paper is scheduling priority based on recursive rank computation on heterogeneous arbitrary processor network. In the end, to reduce time complexity of algorithm, a parallel algorithm is proposed and speedup O(PPE) is achieved. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm significantly surpasses classic and static communication contention awareness algorithm, especially for high data transmission rate parallel application.
文摘Heterogeneous computing is one effective method of high performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous environments. A number of task scheduling algorithms for homogeneous environments have been proposed, whereas, a few for heterogeneous environments can be found in the literature. A novel task scheduling algorithm for heterogeneous environments, called the heterogeneous critical task (HCT) scheduling algorithm is presented. By means of the directed acyclic graph and the gantt graph, the HCT algorithm defines the critical task and the idle time slot. After determining the critical tasks of a given task, the HCT algorithm tentatively duplicates the critical tasks onto the processor that has the given task in the idle time slot, to reduce the start time of the given task. To compare the performance of the HCT algorithm with several recently proposed algorithms, a large set of randomly generated applications and the Gaussian elimination application are randomly generated. The experimental result has shown that the HCT algorithm outperforms the other algorithm.
基金Project supported by Sama Technical and Vocational Training College,Islamic Azad University,Shoushtar Branch,Shoushtar,Iran
文摘Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are based on the so-called list scheduling technique. The basic idea behind list scheduling is to prepare a sequence of nodes in the form of a list for scheduling by assigning them some priority measurements, and then repeatedly removing the node with the highest priority from the list and allocating it to the processor providing the earliest start time (EST). Therefore, it can be inferred that the makespans obtained are dominated by two major factors: (1) which order of tasks should be selected (sequence subproblem); (2) how the selected order should be assigned to the processors (assignment subproblem). A number of good approaches for overcoming the task sequence dilemma have been proposed in the literature, while the task assignment problem has not been studied much. The results of this study prove that assigning tasks to the processors using the traditional EST method is not optimum; in addition, a novel approach based on the ant colony optimization algorithm is introduced, which can find far better solutions.
基金supported by the Natural Science Foundation of Anhui Province(No.1808085MF203)the National Natural Science Foundation of China(No.61432017)。
文摘Row Parallel Coarse-Grained Reconfigurable Architecture(RPCGRA)has the advantages of maximum parallelism and programmable flexibility.Designing an efficient algorithm to map the diverse applications onto RPCGRA is difficult due to a number of RPCGRA hardware constraints.To solve this problem,the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA.In this paper,we present a Depth-First Greedy Mapping(DFGM)algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array(RCA).Compared with level breadth mapping,the performance of DFGM is better.The percentage of maximum improvement in the use times of RCA is 33%and the percentage of maximum improvement in non-original input and output times is 64.4%(Given Discrete Cosine Transfor 8(DCT8),and the area of reconfigurable processing unit is 56).Compared with level-based depth mapping,DFGM also obtains the lowest averages of use times of RCA,non-original input and output times,and the reconfigurable time.