What is Speculative Execution and how does it enhance job execution?
1. Speculative execution is a job optimization. Its goal is to reduce job
execution time by launching a duplicate copy of a slow-running task.
2. In a Hadoop environment, the scheduler tracks the progress of all tasks of the same
type (map and reduce) in a job, and only launches speculative duplicates for the small
proportion that are running significantly slower than the average.
3. When a task completes successfully, any duplicate tasks that are running are killed since they are no longer needed. So, if the original task completes before the speculative task, the speculative task
is killed; on the other hand, if the speculative task finishes first, the original is killed.
4. Speculative execution is turned on by default. It applies to both the Map and Reduce phases.
5. You can set speculative execution on a per-job basis, or set it globally so that it applies to all jobs running in the Hadoop cluster. The properties below enable or disable speculative execution:
set mapreduce.map.speculative true        Enable speculative execution for map tasks
set mapreduce.reduce.speculative true     Enable speculative execution for reduce tasks
set mapreduce.map.speculative false       Disable speculative execution for map tasks
set mapreduce.reduce.speculative false    Disable speculative execution for reduce tasks
Disadvantages of Speculative Execution:-
1. On a busy cluster, speculative execution can reduce overall throughput, since redundant
tasks are being executed in an attempt to bring down the execution time for a single job.
2. There is a good case for turning off speculative execution for reduce tasks, since any duplicate reduce tasks have to fetch the same map outputs as the original task, and this can significantly increase network traffic on the cluster.
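To make the scheduling rule described earlier more concrete, here is a small Java sketch of the two decisions involved: picking the few tasks that are far behind the average, and deciding which attempt gets killed once one of them finishes. This is only an illustration and not Hadoop's actual code. The class name SpeculationSketch, the method names, and the SLOW_FRACTION threshold are all made up for this example, and the real scheduler uses a more elaborate estimate than a simple comparison of progress values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpeculationSketch {

    // Hypothetical threshold (not a real Hadoop setting): a task counts as a
    // straggler when its progress is below this fraction of the average.
    static final double SLOW_FRACTION = 0.5;

    /** Returns the IDs of tasks whose progress is well below the average. */
    static List<String> pickStragglers(Map<String, Double> progressByTask) {
        double sum = 0.0;
        for (double p : progressByTask.values()) {
            sum += p;
        }
        double average = progressByTask.isEmpty() ? 0.0 : sum / progressByTask.size();

        List<String> stragglers = new ArrayList<>();
        for (Map.Entry<String, Double> e : progressByTask.entrySet()) {
            if (e.getValue() < SLOW_FRACTION * average) {
                stragglers.add(e.getKey());
            }
        }
        return stragglers;
    }

    /** Whichever attempt finishes first wins; the other attempt is killed. */
    static String attemptToKill(long originalFinishMs, long speculativeFinishMs) {
        return originalFinishMs <= speculativeFinishMs ? "speculative" : "original";
    }

    public static void main(String[] args) {
        // Progress (0.0 to 1.0) of four map tasks in the same job.
        Map<String, Double> progress = new LinkedHashMap<>();
        progress.put("map_0", 0.90);
        progress.put("map_1", 0.85);
        progress.put("map_2", 0.20);
        progress.put("map_3", 0.95);

        // Only map_2 is far enough behind the average to get a duplicate.
        System.out.println(pickStragglers(progress));

        // The speculative attempt finishes first, so the original is killed.
        System.out.println(attemptToKill(1200, 900));
    }
}
```

Every task returned by pickStragglers costs an extra slot on the cluster. That is why the duplicates are limited to a small proportion of tasks, and why the throughput concern listed above matters on a busy cluster.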