-
Notifications
You must be signed in to change notification settings - Fork 3
mxqsub fail / lousy performance finding jobs to activate #51
Comments
For the records: mxqprj@macheteinfach:~> mysql -e 'show processlist' +---------+--------+--------------------------------------------------------+------+---------+------+-------------+------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +---------+--------+--------------------------------------------------------+------+---------+------+-------------+------------------------------------------------------------------------------------------------------+ | 2799943 | mxqprj | gynaekophobie.molgen.mpg.de:60836 | mxq | Sleep | 5 | | NULL | | 3005008 | mxqprj | altmetall.molgen.mpg.de:48334 | mxq | Sleep | 5 | | NULL | | 3011795 | mxqprj | uselessbox.molgen.mpg.de:55338 | mxq | Sleep | 5 | | NULL | | 3036007 | mxqprj | simsalabimbambasaladusaladim.molgen.mpg.de:43074 | mxq | Sleep | 4 | | NULL | | 3037048 | mxqprj | internetguide.molgen.mpg.de:35414 | mxq | Execute | 331 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037150 | mxqprj | hal102.molgen.mpg.de:49024 | mxq | Execute | 188 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037196 | mxqprj | avaritia.molgen.mpg.de:58760 | mxq | Execute | 145 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037216 | mxqprj | shutupandtakemymoney.molgen.mpg.de:38190 | mxq | Execute | 112 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037231 | mxqprj | gula.molgen.mpg.de:48256 | mxq | Execute | 83 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037244 | mxqprj | ira.molgen.mpg.de:47218 | mxq | Execute | 77 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037262 | mxqprj | esodophobie.molgen.mpg.de:41534 | mxq | Execute | 69 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037266 | mxqprj | mylittlepony.molgen.mpg.de:58740 | mxq | Execute | 67 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037288 | mxqprj | disastergirl.molgen.mpg.de:53260 | mxq | Execute | 57 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037291 | mxqprj | luxuria.molgen.mpg.de:37948 | mxq | Execute | 55 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037301 | mxqprj | hal101.molgen.mpg.de:43668 | mxq | Execute | 47 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037308 | mxqprj | evilplottingraccoon.molgen.mpg.de:56266 | mxq | Execute | 42 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037310 | mxqprj | nomnomnom.molgen.mpg.de:59646 | mxq | Execute | 38 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037313 | mxqprj | confessionbear.molgen.mpg.de:47434 | mxq | Execute | 36 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037331 | mxqprj | invidia.molgen.mpg.de:36144 | mxq | Execute | 28 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037345 | mxqprj | rechnerzumrechnenvonrechenaufgaben.molgen.mpg.de:42706 | mxq | Execute | 20 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037348 | mxqprj | fordprefect.molgen.mpg.de:54652 | mxq | Execute | 18 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037351 | mxqprj | peanutbutterjellytime.molgen.mpg.de:51024 | mxq | Execute | 16 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037354 | mxqprj | superbia.molgen.mpg.de:59626 | mxq | Execute | 14 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037358 | mxqprj | foreveralone.molgen.mpg.de:46498 | mxq | Execute | 12 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037374 | mxqprj | grumpycat.molgen.mpg.de:50834 | mxq | Execute | 8 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037381 | mxqprj | acedia.molgen.mpg.de:50870 | mxq | Execute | 6 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037386 | mxqprj | dontpanic.molgen.mpg.de:39942 | mxq | Execute | 4 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037389 | mxqprj | philosoraptor.molgen.mpg.de:39564 | mxq | Execute | 1 | System lock | UPDATE mxq_job SET daemon_id = ?, host_hostname = ?, host_slots = ?, server_id = ?, job | | 3037392 | mxqprj | localhost | NULL | Query | 0 | init | show processlist | +---------+--------+--------------------------------------------------------+------+---------+------+-------------+------------------------------------------------------------------------------------------------------+ |
The UPDATE is from https://github.molgen.mpg.de/mariux64/mxq/blob/master/mxq_job.c#L294 From a test instance with a copy of the database: Ich glaube, das Problem ist die Kombination "WHERE job_status=0 [...]" mit "order by job_priority". Ob job_priority index ist oder nicht ist egal, da die Sortierung ja über die Ergebnismenge geht. mysql> select job_id from mxq_job where job_status=0 order by job_id limit 1; +---------+ | job_id | +---------+ | 8286229 | +---------+ 1 row in set (0.00 sec) mysql> select job_id from mxq_job order by job_priority,job_id limit 1; +--------+ | job_id | +--------+ | 54012 | +--------+ 1 row in set (0.00 sec) mysql> select job_id from mxq_job where job_status=0 order by job_priority,job_id limit 1; +---------+ | job_id | +---------+ | 8286229 | +---------+ 1 row in set (24.07 sec) mysql> select count(job_id) from mxq_job where job_status=0; +---------------+ | count(job_id) | +---------------+ | 120647 | +---------------+ 1 row in set (0.02 sec) Es müssten eben 120000 ergebnis-rows erzeugt, gespeichert und sortiert werden. Die Table verkleinern, indem obsolete jobs aufgeräumt werden, würde schon was bringen: mysql> create table test LIKE mxq_job; Query OK, 0 rows affected (0.13 sec) mysql> insert into test select * from mxq_job where job_id>8000000; Query OK, 408122 rows affected (11.74 sec) Records: 408122 Duplicates: 0 Warnings: 0 mysql> select job_id from test where job_status=0 order by job_priority,job_id limit 1; +---------+ | job_id | +---------+ | 8286229 | +---------+ 1 row in set (0.93 sec) |
The "SQL database as queue antipattern"-fraction might have a point, although some are arguing against. https://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you All problems predicted... :-) |
Average runtime of jobs in the queue is about 2 minutes, but about 200 jobs running, so about every half second a jobs finishes. The mxqds are fully serialized grabbing jobs in queue and need much longer. We have this sort problem, which is not be speed up by an index, and we have an update on several columns which are indexed, so lots of index operations. Total cluster performance is not limited by free capacity but by this job-grabbing operation. Pending locks are not served in order. Sometimes the lock wait time exceeds the timeout (50s). |
As a quick experimental work-around, I suggest to temporarily disable the sorting by priority, a feature which users don't use very often. mysql> select job_priority,count(*) from mxq_job group by job_priority; +--------------+----------+ | job_priority | count(*) | +--------------+----------+ | 127 | 8364087 | | 128 | 475 | | 130 | 18 | +--------------+----------+ 3 rows in set (8.84 sec) Update: Would not resolve the problem. See update on comment two above. |
On live database; mysql> select job_id from mxq_job WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id LIMIT 1; +---------+ | job_id | +---------+ | 8310461 | +---------+ 1 row in set (0.60 sec) mysql> update mxq_job set daemon_id=daemon_id,host_hostname=host_hostname,host_slots=host_slots,server_id=server_id,job_status=job_status WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_id LIMIT 1; Query OK, 0 rows affected (15.28 sec) Rows matched: 1 Changed: 0 Warnings: 0 So its (no longer?) a query problem, but just the queuing for the update. |
The buffer pool is big enough, and there is still room.
|
Update only to column not indexed: still bad performance: mysql> update mxq_job set stats_nvcsw=stats_nvcsw WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id LIMIT 1; Query OK, 0 rows affected (0.63 sec) Rows matched: 1 Changed: 0 Warnings: 0 select on same contraints: better performance: mysql> select 1 from mxq_job WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id LIMIT 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.18 sec) Select for update: bad performance again: mysql> select 1 from mxq_job WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id LIMIT 1 for update; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.63 sec) So the problem might be the locks. mysql> select 1 from mxq_job WHERE group_id = 9597 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id LIMIT 1 for update; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) So maybe all records are locked, even if the update only touches one. In fact it takes the same time to lock without the limit: select 1 from mxq_job WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' AND host_pid = 0 ORDER BY job_priority, job_id for update; +---+ 120644 rows in set (0.69 sec) Maybe first can select without lock first, then select the candidate row again by job_id plus original (unassigned) contraints mysql> select job_id from mxq_job WHERE group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' +---------+ | job_id | +---------+ | 8286229 | +---------+ 1 row in set (0.18 sec) mysql> select 1 from mxq_job WHERE job_id=8286229 AND group_id = 22930 AND job_status = 0 AND daemon_id = 0 AND host_hostname = '' AND server_id = '' for update; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) mysql> update mxq_job set job_status=job_status where job_id=8286229; Query OK, 0 rows affected (0.00 sec) Rows matched: 1 Changed: 0 Warnings: 0 mysql> commit; Query OK, 0 rows affected (0.00 sec) |
This gives a little bit insight: mysql> select lock_mode as LM,lock_type,lock_table,lock_index,trx_state,trx_started,trx_wait_started,left(trx_query,40),trx_operation_state,trx_rows_locked from INNODB_LOCKS,INNODB_TRX where lock_trx_id=trx_id; +----+-----------+-----------------+------------+-----------+---------------------+---------------------+------------------------------------------+---------------------+-----------------+ | LM | lock_type | lock_table | lock_index | trx_state | trx_started | trx_wait_started | left(trx_query,40) | trx_operation_state | trx_rows_locked | +----+-----------+-----------------+------------+-----------+---------------------+---------------------+------------------------------------------+---------------------+-----------------+ | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:46 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 873, | starting index read | 1 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:44 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 867, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:42 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 882, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | RUNNING | 2017-04-28 19:16:41 | NULL | UPDATE mxq_job SET daemon_id = 868, | starting index read | 331719 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:37 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 865, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:34 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 859, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:33 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 872, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:29 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 881, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:28 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 860, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:24 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 861, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:23 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 877, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:18 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 869, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:16 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 864, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:11 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 875, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:16:06 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 874, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:55 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 871, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:52 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 878, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:50 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 876, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:45 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 862, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:44 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 858, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:27 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 880, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:25 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 879, | starting index read | 2 | | X | RECORD | `mxq`.`mxq_job` | job_status | LOCK WAIT | 2017-04-28 19:15:21 | 2017-04-28 19:16:46 | UPDATE mxq_job SET daemon_id = 870, | starting index read | 2 | +----+-----------+-----------------+------------+-----------+---------------------+---------------------+------------------------------------------+---------------------+-----------------+ 23 rows in set (0.00 sec) |
tested with mysql-5.7.18: same timing, not faster |
When the server chose a job to assign to itself by UPDATE mxq_job SET ... WHERE (unassigned) ORDER BY LIMIT 1 we had a performance problem when a group contained very many pending jobs, because the UPDATE temporarily locked all records considered (issue #51). Now we SELECT without any lock and then UPDATE the single job by its primary index. In case there is a collision with another server, the second UPDATE will have no effect, because it still contains the original contraints. The second server will returns to its main loop where it will sleep for some seconds.
When the server chose a job to assign to itself by UPDATE mxq_job SET ... WHERE (unassigned) ORDER BY LIMIT 1 we had a performance problem when a group contained very many pending jobs, because the UPDATE temporarily locked all records considered (issue #51). Now we SELECT without any lock and then UPDATE the single job by its primary index. In case there is a collision with another server, the second UPDATE will have no effect, because it still contains the original contraints. The second server will returns to its main loop where it will sleep for some seconds.
When the server chose a job to assign to itself by UPDATE mxq_job SET ... WHERE (unassigned) ORDER BY LIMIT 1 we had a performance problem when a group contained very many pending jobs, because the UPDATE temporarily locked all records considered (issue #51). Now we SELECT without any lock and then UPDATE the single job by its primary index. In case there is a collision with another server, the second UPDATE will have no effect, because it still contains the original contraints. The second server will returns to its main loop where it will sleep for some seconds.
Test-Cluster (16 nodes, 150 cores) with patch e6bfc5d and submit of 200000 times "sleep 20" everything looked good (150 running jobs) for hours until about 145759 jobs were submitted. From then on only 60-70 jobs were running ( 116155 pending, 69 running, 2953 finished).
timings: 0.31 sec 0.36 sec, 0.35 sec, 0.44 sec 0.37 sec , average 0.366 sec So we can start maximum 1/0.366s = 2.73 jobs per second just because of the select. The patch improved a lot, mysql is running on 300% (instead 100%) but still the cluster is limited by mysql. |
When |
Maybe this was thought/talked about before. What would happen if the jobs within a group will also get a sequence number? Say 'group_id,seq_nr', this pair would give an uniqe index. It's more a 'feeling', but I think such a thing might be faster. (or even weirder, one field with 'group_id*100_000+seq_nr') Just checked the idea on a larger range, forget it! --------------- TWIMC --------------- get the big ones ...
find out a 'range'.
1 row in set (0.18 sec) count via primary key
1 row in set (6.65 sec) the other way
1 row in set (0.16 sec) |
Why would |
We've already made a BIG progress. I compared the version from the master with the version from fix-issue-51 branch. They started with exactly the same database (preloaded 200000 pending jobs, "sleep 20"), the same mysql server (theinternet) and the same test-cluster. I let each one run for a little less then about one hour. master: 1500 jobs in 49 min , 1750 jobs in 57min fix-issue-51: 1500 jobs in 11 min , 8270 jobs in 59min So we are 4 to 5 times faster already. Plus we no longer have any any index or gap locks. So mxqsub should never need to wait when inserting new jobs and the problem reported by the user (Deadlock found when trying to get lock; try restarting transaction) should no longer be possible. So I think we can take the commits in any case. I just want to review the code to make sure, that it can't make a difference when the additional constraints are left out. |
the problem is that the algorithm was not designed for such short jobs... it just not scalable for very short jobs... job times should be at least 15min... but it won't scale with 15 minutes anyway if number of cores increases dramatically.... so instead of optimizing the queries (which will not fix the problem but the symptoms) you should optimize the selection algorithm... next iteration would be to bulk assign/start jobs... because the hosts knows how much jobs it can do in X minutes... so if it has 80 slots...and 2 minute jobs just plan for 15 minutes and assign Y jobs... so it would be 15minutes for 2minute jobs that would be 7 jobs per slot for 80 slots just assign 560 jobs for this server... so instead of doing transactional selects for update... just increase the LIMIT 1 to a higher value for fast running jobs... also since you know the total number of cores in the cluster, the runtimes and available slots just set it to a sane high number... start with number of jobs equal to number of jobs the server can start for this group... can you benchmark the LIMIT N approach? (where N is max number of jobs the server can run from this group) as far as i remember already assigned jobs are used first instead of assigning new jobs.. |
see #57 for sample implementation... for benchmarking purposes.. |
no logic was changed in this commit renamed mxq_assign_job_from_group_to_daemon() to mxq_assign_jobs_from_group_to_daemon() and added parameter limit relates to issue #51
User report
Several mxqd log similar errors (but continue). Frequency of jobs getting from INQ to RUNNING severely limited.
The text was updated successfully, but these errors were encountered: