SchedMatch
Currently, the scheduler's work-sending algorithm is:
- if the host is reliable, scan the entire job array looking for retries
- if using homogeneous redundancy (HR), scan the entire job array looking for jobs committed to this host's HR class
- if work is still needed, scan the entire job array with no restrictions
This is bad for several reasons:
- inefficient: the job array is scanned repeatedly
- inflexible policy: why should reliability take priority over HR?
- hard to add new policies, such as sending "hard" jobs to fast hosts
To solve these problems, we're going to change this part of the scheduler. The basic idea: given a job J and a host H, a function V(J, H) represents the value of sending J to H. V(J, H) might reflect various factors:
- the computational "hardness" of J
- CPU/deadline ratio
- RAM or disk requirements
- H already has files required by J (or jobs already planned for sending to H have such files)
- J is a retry and H is a fast/reliable host
- J has already been committed to H's HR class
BOINC includes a default value function. Projects can tweak its weights, or define their own value function.
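As an illustration of a weighted value function like the default described above, the following Python sketch combines the listed factors into a single score. The `Job`/`Host` fields, the weight names, and the formula for each term are assumptions made for this example, not BOINC's actual value function or data structures.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Job:
    # Hypothetical job attributes, chosen for illustration only.
    flops: float              # estimated computation needed
    deadline_secs: float      # time allowed between dispatch and deadline
    ram_bytes: float
    input_files: frozenset
    is_retry: bool = False
    hr_class: Optional[int] = None   # HR class the WU is committed to, if any


@dataclass(frozen=True)
class Host:
    # Hypothetical host attributes.
    flops_per_sec: float
    ram_bytes: float
    files: frozenset          # files already on the host (or planned for it)
    reliable: bool
    hr_class: int


# Hypothetical default weights; a project would tune these.
DEFAULT_WEIGHTS = {
    "slack": 1.0,
    "ram": 1.0,
    "file_reuse": 2.0,
    "retry_on_reliable": 5.0,
    "hr_committed": 3.0,
}


def value(job: Job, host: Host, w: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """V(J, H) as a weighted sum of match factors; higher is better."""
    v = 0.0
    # CPU/deadline ratio on this host: reward the fraction of the deadline
    # left over. This is higher on faster hosts, most noticeably for long jobs.
    runtime = job.flops / host.flops_per_sec
    v += w["slack"] * max(0.0, 1.0 - runtime / job.deadline_secs)
    # RAM: penalize jobs that use a large share of the host's memory.
    v -= w["ram"] * job.ram_bytes / host.ram_bytes
    # Files: reward jobs whose input files the host already has.
    if job.input_files:
        shared = len(job.input_files & host.files)
        v += w["file_reuse"] * shared / len(job.input_files)
    # Retries go preferentially to reliable hosts.
    if job.is_retry and host.reliable:
        v += w["retry_on_reliable"]
    # The WU is already committed to this host's HR class.
    if job.hr_class is not None and job.hr_class == host.hr_class:
        v += w["hr_committed"]
    return v
```

A project that wants a different policy could pass its own weights, e.g. `value(job, host, {**DEFAULT_WEIGHTS, "file_reuse": 0.0})`, or replace `value` entirely.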
Functions:
job_set_feasible(): checks whether a set of jobs is feasible (no DB access)
- disk usage
- deadline check (EDF simulation or a crude estimate)
- at most one result per WU for this user/host within the set (no DB access)
job_feasible_fast(): feasibility checks that can be done with no DB access
- WU committed to a different platform (no DB check)
- app filtering
- memory usage
job_feasible_slow(): feasibility checks that need DB access
- one result per user or host per WU (look in DB)
- WU committed to a different platform (look in DB)
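A minimal Python sketch of the two no-DB checks, assuming hypothetical `Job`/`Host` records that carry a per-host runtime estimate. The deadline check is the crude variant: it simulates earliest-deadline-first (EDF) on a single processor, starting after the work already queued on the host, and ignores multi-core hosts. The DB-backed `job_feasible_slow()` checks are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical fields, for illustration only.
    wu_id: int              # workunit this result belongs to
    app: str
    disk_bytes: float
    ram_bytes: float
    runtime_secs: float     # estimated runtime on this host
    deadline_secs: float    # seconds from now until the deadline


@dataclass(frozen=True)
class Host:
    free_disk_bytes: float
    ram_bytes: float
    apps: frozenset           # apps this host can run
    queued_secs: float = 0.0  # work already queued on the host


def job_feasible_fast(job: Job, host: Host) -> bool:
    """Per-job checks needing no DB access: app filtering and memory."""
    return job.app in host.apps and job.ram_bytes <= host.ram_bytes


def job_set_feasible(jobs, host: Host) -> bool:
    """Checks a candidate set of jobs together (no DB access)."""
    # Disk usage: the whole set must fit in the host's free disk space.
    if sum(j.disk_bytes for j in jobs) > host.free_disk_bytes:
        return False
    # At most one result per workunit within the set.
    wu_ids = [j.wu_id for j in jobs]
    if len(wu_ids) != len(set(wu_ids)):
        return False
    # Crude deadline check: run the jobs back to back in EDF order on one
    # processor, after the already-queued work, and check every deadline.
    t = host.queued_secs
    for j in sorted(jobs, key=lambda j: j.deadline_secs):
        t += j.runtime_secs
        if t > j.deadline_secs:
            return False
    return True
```

For this single-processor model, EDF order minimizes the maximum lateness, so if EDF misses a deadline no other order would meet them all.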
Parameters:
- N: scan at least this many slots (once N slots have been scanned and there are enough jobs to send, stop)
- M: scan at most this many slots (even if there aren't enough jobs to send yet)
- L: if more than this many locked slots are found, print a warning (the shared-memory array should be made larger)
Logic:
    acquire semaphore
    i = random index in shmem
    x = ordered list of jobs to send (empty)
    slots_scanned = 0
    slots_locked = 0
    loop
        i = (i+1) % array_size
        slots_scanned++
        if slots_scanned > M
            break
        if shmem[i] is empty
            continue
        if shmem[i] is locked
            slots_locked++
            continue
        j = shmem[i]
        if !job_feasible_fast(j)
            continue
        v = V(j, h)
        if x satisfies work request and v <= lowest value in x
            continue
        S = jobs in x with value >= v
        if !job_set_feasible(S+j)
            continue
        lock j
        release semaphore
        if !job_feasible_slow(j)
            acquire semaphore
            unlock j
            continue
        acquire semaphore
        add j to x
        while (x minus lowest-value element) satisfies work request
            remove lowest-value element of x and unlock its slot
        while !job_set_feasible(x)
            remove lowest-value element of x and unlock its slot
        if x satisfies work request and slots_scanned >= N
            break
    for each job j in x
        mark slot j as empty
    release semaphore
    if slots_locked > L
        print "need bigger array" message
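The selection loop above can be sketched in Python as follows. This is a single-process illustration under stated assumptions: a `threading.Lock` stands in for the shared-memory semaphore, `shmem` is a list of hypothetical `Slot` objects, the work request is measured in seconds of work, and the value and feasibility functions are passed in as callables. The list `x` is kept sorted by descending value, so its lowest-value entry is always last; any job dropped from `x` has its slot unlocked so other scheduler instances can send it.

```python
import random
import threading
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Slot:
    """One shared-memory job slot (hypothetical layout)."""
    job: Optional[Any] = None   # None means the slot is empty
    locked: bool = False        # True while a scheduler instance holds the job


def choose_jobs(shmem: List[Slot], sem: threading.Lock, host: Any,
                request_secs: float, value: Callable, job_secs: Callable,
                fast_ok: Callable, set_ok: Callable, slow_ok: Callable,
                N: int, M: int, L: int, rng=random) -> list:
    x = []  # (value, slot index, job), sorted by descending value

    def enough(items) -> bool:
        # Do these (value, index, job) entries satisfy the work request?
        return sum(job_secs(j) for _, _, j in items) >= request_secs

    def drop_lowest() -> None:
        # Remove the lowest-value entry and unlock its slot.
        _, k, _ = x.pop()
        shmem[k].locked = False

    sem.acquire()
    i = rng.randrange(len(shmem))
    slots_scanned = slots_locked = 0
    while True:
        i = (i + 1) % len(shmem)
        slots_scanned += 1
        if slots_scanned > M:
            break
        slot = shmem[i]
        if slot.job is None:
            continue
        if slot.locked:
            slots_locked += 1
            continue
        j = slot.job
        if not fast_ok(j, host):
            continue
        v = value(j, host)
        # Once the request is met, only jobs better than the worst one help.
        if x and enough(x) and v <= x[-1][0]:
            continue
        better = [jj for vv, _, jj in x if vv >= v]
        if not set_ok(better + [j], host):
            continue
        slot.locked = True
        sem.release()
        ok = slow_ok(j, host)  # may query the DB, so run without the semaphore
        sem.acquire()
        if not ok:
            slot.locked = False
            continue
        x.append((v, i, j))
        x.sort(key=lambda e: e[0], reverse=True)
        # Drop low-value jobs that aren't needed to satisfy the request.
        while x and enough(x[:-1]):
            drop_lowest()
        # Drop low-value jobs until the whole set is feasible.
        while x and not set_ok([jj for _, _, jj in x], host):
            drop_lowest()
        if enough(x) and slots_scanned >= N:
            break
    for _, k, _ in x:
        shmem[k].job = None     # mark the slot empty: this job will be sent
        shmem[k].locked = False
    sem.release()
    if slots_locked > L:
        print("need bigger array")
    return [j for _, _, j in x]
```

Because `slow_ok` runs with the semaphore released, a slow DB query in one scheduler instance does not block the others; the per-slot lock is what keeps two instances from picking the same job in the meantime.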