Issue 51 limit increase #57

Merged
merged 4 commits into from May 10, 2017

Conversation

Contributor

@mariux mariux commented May 7, 2017

first bulk assign implementation for testing locking issue when assigning single jobs.. relates to #51

@mariux
Contributor Author

mariux commented May 7, 2017

this needs further optimization for fast running jobs...

mariux added 2 commits May 7, 2017 19:03
no logic was changed in this commit

renamed mxq_assign_job_from_group_to_daemon() to mxq_assign_jobs_from_group_to_daemon()
and added parameter limit

relates to issue #51
@donald
Contributor

donald commented May 8, 2017

Thanks. I gave it a ride on the test cluster. It performed a little bit better than fix-issue-51:

master:                    1500 jobs in 49 min ,  1750 jobs in 57 min
fix-issue-51:              1500 jobs in 11 min ,  8270 jobs in 59 min
issue-51-limit-increase:   1500 jobs in  9 min , 11166 jobs in 60 min

However, the initial loading was not as fast as I expected. This is because mxq_load_job_from_group_for_daemon still loads at most one job before returning to the server main loop with the sleep. Was this intended?

I'm currently running a second test with some jitter in the runtime of the jobs (perl -MTime::HiRes -e 'Time::HiRes::usleep((20+rand(10)-5)*1000000)' instead of sleep 20) which might be more realistic.
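
For readers who do not use Perl, the jittered runtime in the one-liner above can be sketched in Python. The function names `jittered_runtime` and `run_test_job` and the `scale` parameter are hypothetical and only serve the illustration; the sketch mirrors `20 + rand(10) - 5`, i.e. a runtime between 15 and 25 seconds.

```python
import random
import time


def jittered_runtime(base=20.0, spread=10.0, rng=random):
    """Return a runtime in seconds between base - spread/2 and base + spread/2.

    Mirrors the Perl expression 20 + rand(10) - 5 from the comment above.
    """
    return base + rng.random() * spread - spread / 2


def run_test_job(scale=1.0):
    """Sleep for one jittered runtime and return the slept duration.

    `scale` is a hypothetical knob (not in the original command) that
    shortens the sleep for dry runs, e.g. scale=0.0 sleeps not at all.
    """
    duration = jittered_runtime() * scale
    time.sleep(duration)
    return duration
```

Calling `run_test_job()` as the job payload would then behave roughly like the Perl command, while `sleep 20` corresponds to `spread=0`.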

@donald
Contributor

donald commented May 8, 2017

No, the main loop doesn't sleep... There's another problem:

2017-05-08 12:52:43 +0200 mxqd[2509]: hostname=sigill.molgen.mpg.de daemon_name=main daemon_id=9 :: MXQ server started.
2017-05-08 12:52:43 +0200 mxqd[2509]:   host_id=57b28696-f203-41c8-9ed7-b0387533efe8-2882b65-9cd
2017-05-08 12:52:43 +0200 mxqd[2509]: slots=7 memory_total=4096 memory_avg_per_slot=585 memory_limit_slot_soft=585 memory_limit_slot_hard=4096 :: server initialized.
2017-05-08 12:52:43 +0200 mxqd[2509]: cpu set available: [1-7]
2017-05-08 12:52:43 +0200 mxqd[2509]:   group=buczek(125):1 jobs_max=7 slots_max=7 memory_max=1400 slots_per_job=1 memory_per_job_thread=200.000000 :: group initialized.
2017-05-08 12:52:43 +0200 mxqd[2509]: recover: 1 running groups loaded.
2017-05-08 12:52:43 +0200 mxqd[2509]: ====================== SERVER DUMP START ======================
2017-05-08 12:52:43 +0200 mxqd[2509]:     user=buczek(125) slots_running=0 global_slots_running=0 global_threads_running=0
2017-05-08 12:52:43 +0200 mxqd[2509]:         group=buczek(125):1 test02 jobs_max=7 slots_per_job=1 jobs_in_q=200000
2017-05-08 12:52:43 +0200 mxqd[2509]: memory_used=0 memory_total=4096
2017-05-08 12:52:43 +0200 mxqd[2509]: slots_running=0 slots=7 threads_running=0 jobs_running=0
2017-05-08 12:52:43 +0200 mxqd[2509]: global_slots_running=0 global_threads_running=0
2017-05-08 12:52:43 +0200 mxqd[2509]: cpu set running: []
2017-05-08 12:52:43 +0200 mxqd[2509]: ====================== SERVER DUMP END ======================
2017-05-08 12:52:43 +0200 mxqd[2509]:   group=buczek(125):1 slots_to_start=7 slots_per_job=1 :: trying to start job for group.
2017-05-08 12:53:00 +0200 mxqd[2509]: WARNING: MySQL mysql_stmt_execute(): ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(OLD.stats_run_sec + (unix_timestamp(NEW.group_mtime) - unix_timestamp(OLD.group_mtime)))'
2017-05-08 12:53:00 +0200 mxqd[2509]: EMERGENCY: MySQL mysql_stmt_execute(): ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(OLD.stats_run_sec + (unix_timestamp(NEW.group_mtime) - unix_timestamp(OLD.group_mtime)))'
2017-05-08 12:53:00 +0200 mxqd[2509]: EMERGENCY: ERROR: mysql_stmt_execute() returned undefined error number: 1690
2017-05-08 12:53:00 +0200 mxqd[2509]: ERROR: mx_mysql_statement_execute(): Invalid exchange
2017-05-08 12:53:00 +0200 mxqd[2509]: ERROR: mx_mysql_do_statement(): Invalid exchange
2017-05-08 12:53:00 +0200 mxqd[2509]: ERROR:   group_id=1 :: mxq_assign_jobs_from_group_to_daemon(): Invalid exchange
2017-05-08 12:53:00 +0200 mxqd[2509]: Tried Hard and nobody is doing anything. Sleeping for a long while (15 seconds).
2017-05-08 12:53:15 +0200 mxqd[2509]:   group=buczek(125):1 slots_to_start=7 slots_per_job=1 :: trying to start job for group.
2017-05-08 12:53:28 +0200 mxqd[2509]:    job=buczek(125):1:99 :: new job loaded.
2017-05-08 12:53:28 +0200 mxqd[2509]: job assigned cpus: [7]
2017-05-08 12:53:28 +0200 mxqd[2525]:    job=buczek(125):1:99 host_pid=2525 pgrp=2509 :: new child process forked.
2017-05-08 12:53:28 +0200 mxqd[2525]: starting reaper process.
2017-05-08 12:53:28 +0200 mxqd[2526]: starting user process.
2017-05-08 12:53:30 +0200 mxqd[2509]:    job=buczek(125):1:99 :: added running job to watch queue.
2017-05-08 12:53:30 +0200 mxqd[2509]: slots_started=1 :: Main Loop started 1 slots.

@donald
Contributor

donald commented May 8, 2017

BIGINT UNSIGNED value is out of range in '(OLD.stats_run_sec + (unix_timestamp(NEW.group_mtime) - unix_timestamp(OLD.group_mtime)))' is seen with master, too. Not a problem of these patches.
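
The log only shows that the expression is evaluated as BIGINT UNSIGNED and that its result fell outside that type's range. A minimal Python model of such a range check is sketched below. It assumes (this is not confirmed anywhere in the thread) that the failure happens when `group_mtime` moves backwards by more than the accumulated `stats_run_sec`; the function and exception names are hypothetical, and this is not the actual trigger code.

```python
BIGINT_UNSIGNED_MAX = 2**64 - 1  # largest value a BIGINT UNSIGNED can hold


class OutOfRange(ValueError):
    """Stands in for MySQL ERROR 1690 in this sketch."""


def updated_stats_run_sec(old_stats_run_sec, old_mtime, new_mtime):
    """Model OLD.stats_run_sec + (new_mtime - old_mtime) with an unsigned result.

    All arguments are plain integers (seconds). Python integers do not
    overflow, so the range check is done explicitly.
    """
    result = old_stats_run_sec + (new_mtime - old_mtime)
    if not 0 <= result <= BIGINT_UNSIGNED_MAX:
        raise OutOfRange(f"BIGINT UNSIGNED value is out of range: {result}")
    return result
```

Under this assumption, `updated_stats_run_sec(5, 1000, 983)` raises, because the timestamp went back 17 seconds while only 5 seconds had been accumulated.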

@donald
Contributor

donald commented May 8, 2017

Same results with jitter:

master:                    1500 jobs in 51 min ,  1762 jobs in 60 min
fix-issue-51:              1500 jobs in 11 min ,  8100 jobs in 60 min
issue-51-limit-increase:   1500 jobs in  8 min , 12000 jobs in 60 min
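
To make the comparison easier to read, the 60-minute figures from the jitter run can be turned into throughput and a speedup over master. This is only arithmetic on the numbers in the table above; the helper names are hypothetical.

```python
# Jobs completed in the 60-minute window, copied from the jitter run above.
completed_in_60_min = {
    "master": 1762,
    "fix-issue-51": 8100,
    "issue-51-limit-increase": 12000,
}


def jobs_per_minute(jobs, minutes=60):
    """Average throughput over the measurement window."""
    return jobs / minutes


def speedup_over(baseline, results):
    """Ratio of each branch's job count to the baseline branch's job count."""
    base = results[baseline]
    return {name: jobs / base for name, jobs in results.items()}


rates = {name: jobs_per_minute(jobs) for name, jobs in completed_in_60_min.items()}
speedups = speedup_over("master", completed_in_60_min)

for name in completed_in_60_min:
    print(f"{name:25s} {rates[name]:7.1f} jobs/min  {speedups[name]:5.2f}x master")
```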

@donald
Contributor

donald commented May 9, 2017

Can you remind me, why we needed the status LOADED between ASSIGNED and RUNNING?

@donald
Contributor

donald commented May 10, 2017

To answer my own question: LOADED means the server decided to start the job but has not yet recorded a pid in the database. So a job recorded as 'LOADED' in the database might or might not have been started.

@donald donald merged commit b2298db into master May 10, 2017
@donald donald deleted the issue-51-limit-increase branch May 10, 2017 08:39
@mariux
Contributor Author

mariux commented May 10, 2017

btw.. the commits are not complete... ;) this was mainly for benchmarking reasons..

this needs further patches to handle new issues that come with mass assign:
e.g. assigning more jobs than can be executed in multi user mode and blocking those for other servers.

assigning jobs_max is also just a quick fix and can be optimized even more like already stated in my other comments.. ;)

for the questions:

  • yes, LOADED is the state between assigned and started. if the server dies in the execution phase, the state of the job is unknown until it is marked running.
  • starting only one was intended because this meant the least change in code and starting one should not be blocking as it is only reading..

side hint - as i know you thought about doing centralized scheduling:
you can also add an external master to help scheduling jobs with more global knowledge e.g. from the outside by additionally assigning jobs for the cluster... mxqd will only self-assign new jobs if it does not find any assigned jobs in the database and has free resources to start some.

@mariux
Contributor Author

mariux commented May 10, 2017

in addition LOADED should only be released by the daemon itself... ASSIGNED can be reset from the outside... (needs to be verified again but that was the idea: it should be possible to "steal" an already assigned job and reassign it to another host...)
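
The status discussion in this thread can be summarised as a small state machine. The sketch below is a Python illustration, not mxq code: the enum and function names are hypothetical, only the three statuses named in the thread are modelled, and the rule for RUNNING jobs is an assumption because the thread does not discuss it.

```python
from enum import Enum


class JobStatus(Enum):
    ASSIGNED = "assigned"  # reserved for a daemon, not yet picked up
    LOADED = "loaded"      # daemon decided to start it, no pid recorded yet
    RUNNING = "running"    # pid recorded in the database


class Actor(Enum):
    DAEMON = "owning daemon"
    EXTERNAL = "external"  # e.g. another host or an external scheduler


# Forward transitions as described in the thread.
_NEXT = {
    JobStatus.ASSIGNED: JobStatus.LOADED,
    JobStatus.LOADED: JobStatus.RUNNING,
}


def next_status(status):
    """Return the status that follows `status`, or None after RUNNING."""
    return _NEXT.get(status)


def may_release(status, actor):
    """Whether `actor` may take the job away from its daemon again."""
    if status is JobStatus.ASSIGNED:
        return True  # "ASSIGNED can be reset from the outside"
    if status is JobStatus.LOADED:
        return actor is Actor.DAEMON  # only the daemon knows if it started
    return False  # assumption: RUNNING jobs are never released this way
```

With these rules, "stealing" a job is only possible while `may_release(status, Actor.EXTERNAL)` is true, that is, while the job is still ASSIGNED.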

@donald
Contributor

donald commented May 10, 2017

> assigning more jobs than can be executed in multi user mode

yea, I guess I've merged too early.

@donald
Contributor

donald commented May 10, 2017

Undoing the excessive assigns when we later decide that we don't want to actually start all jobs (e.g. due to "fair share" and other users waiting, or --max-jobs-per-node being used) would probably waste more performance than was won.

So we need to calculate the exact number of jobs to start in advance.

@mariux
Contributor Author

mariux commented May 10, 2017

as said in earlier comments... this strategy should be used for fast running jobs... for long running jobs it's not needed because the pressure on the database is minimal.. so jobs with run-time > 15min (current default) should not be preassigned (limit 1 as usual)... also jobs with runtime < 15min can even be preassigned with higher limits e.g. jobs_max * 15min/(actual runtime in minutes) ... or instead of jobs_max use the current number of jobs a user can execute in this daemon... (see later comment)
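
The limit rule proposed in this comment can be written down as a short function. This is a sketch in Python under stated assumptions: the function name `preassign_limit` is hypothetical, rounding down and the floor of 1 are choices made for the illustration, and the 15 minute window is the "current default" mentioned above.

```python
import math

PREASSIGN_WINDOW_MIN = 15  # "current default" from the comment above


def preassign_limit(jobs_max, runtime_min, window_min=PREASSIGN_WINDOW_MIN):
    """Number of jobs to assign in one statement for a group.

    jobs_max    -- jobs of this group the daemon can run at once
    runtime_min -- expected runtime of one job in minutes (must be > 0)
    """
    if runtime_min <= 0:
        raise ValueError("runtime_min must be positive")
    if runtime_min > window_min:
        return 1  # long running jobs: no preassignment, limit 1 as usual
    # Short jobs: scale jobs_max by how many runs fit into the window.
    # Rounding down is an assumption; the comment does not specify it.
    return max(1, math.floor(jobs_max * window_min / runtime_min))
```

For example, with `jobs_max=7` a 5 minute job gets a limit of 21, while a 30 minute job stays at 1. Replacing `jobs_max` by the number of jobs the user can currently execute on the daemon, as suggested at the end of the comment, would only change the first argument.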

@mariux
Contributor Author

mariux commented May 10, 2017

as far as i remember assigned jobs account for jobs_inq counter... and there is a place in code when jobs_inq > 0 and assigning new jobs fails... here we could try to steal an assigned job... (or a greater number of jobs)

@mariux
Contributor Author

mariux commented May 10, 2017

here we can actually calculate number of jobs in this group that might get started (correct only if no other group of this user with equal priority has jobs_inq > 0)... but this number will be <= jobs_max and honors at least multi user environments a lot better than using jobs_max..

@mariux
Contributor Author

mariux commented May 10, 2017

> So we need to calculate the exact number of jobs to start in advance.

edge case might be easily detected: would be something like jobs_inq <= jobs_max for this group..
if detected switch back to LIMIT 1.. and free assignments in main-loop (if any - track in glist(?)).. this would render stealing (which can also end in endless stealing) as not being needed anymore...
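
The edge-case check described here could look roughly like the following Python sketch. The function name `assign_limit` is hypothetical, and using `jobs_max` as the bulk limit reflects the "quick fix" mentioned earlier in the thread rather than a final design; freeing leftover assignments in the main loop is not modelled.

```python
def assign_limit(jobs_inq, jobs_max):
    """Pick the LIMIT for the next assign statement of one group.

    jobs_inq -- jobs of this group still waiting in the queue
    jobs_max -- jobs of this group the daemon can run at once
    """
    if jobs_inq <= jobs_max:
        # Edge case: few jobs left. Fall back to assigning one at a time
        # so this daemon does not block jobs that other servers could run.
        return 1
    return jobs_max  # plenty of work queued: bulk assign
```

With the group from the log above (`jobs_max=7`, `jobs_in_q=200000`) this returns 7, and it drops back to 1 once seven or fewer jobs remain queued.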

@donald
Contributor

donald commented May 19, 2017

LOADED would perhaps be more obvious if it were called STARTING. I think this would be closer to the standard idioms for state machines.

@donald
Contributor

donald commented Jun 2, 2017

Master has been reset to a state before this merge. Needs to be redone.


2 participants