Skip to content

Lock wait timeout exceeded; try restarting transaction #157

Open
donald opened this issue Apr 26, 2024 · 2 comments
Open

Lock wait timeout exceeded; try restarting transaction #157

donald opened this issue Apr 26, 2024 · 2 comments

Comments

@donald
Copy link
Contributor

donald commented Apr 26, 2024

Several dead jobs were left in other states in the database (now fixed to UNKNOWN by daemon restart).

http://afk.molgen.mpg.de/mxq/mxq/group/553629
http://afk.molgen.mpg.de/mxq/mxq/group/552770

example:

2024-03-15 16:31:27 +0100 mxqd[21270]:   group=zidane_k(6603):552770 slots_to_start=32 slots_per_job=32 :: trying to start job for group.
2024-03-15 16:32:18 +0100 mxqd[21270]: WARNING: mx_mysql_do_statement() failed: Invalid exchange
2024-03-15 16:32:18 +0100 mxqd[21270]: ERROR: mx_mysql_do_statement(): Lock wait timeout exceeded; try restarting transaction
2024-03-15 16:32:18 +0100 mxqd[21270]: ERROR: job=zidane_k(6603):552770:50211444 mxq_job_update_status_running(): Connection timed out
2024-03-15 16:32:18 +0100 mxqd[21270]:    job=zidane_k(6603):552770:50211444 :: started. pid=123916
2024-03-15 16:32:18 +0100 mxqd[21270]: Main loop started 32 slots.

example:

2024-04-02 19:33:59 +0200 mxqd[118828]: job finished (via fspool) : job 50596128 pid 37376 status 512
2024-04-02 19:34:50 +0200 mxqd[118828]: WARNING: mx_mysql_do_statement() failed: Invalid exchange
2024-04-02 19:34:50 +0200 mxqd[118828]: ERROR: mx_mysql_do_statement(): Lock wait timeout exceeded; try restarting transaction
2024-04-02 19:34:50 +0200 mxqd[118828]: Main loop freed 196 slots.
@donald donald changed the title Failed to update jobs from cancelled groups in database Lock wait timeout exceeded; try restarting transaction Apr 26, 2024
@donald
Copy link
Contributor Author

donald commented Apr 26, 2024

Maybe its really just a timeout in unusual situations like cancelling huge groups and not a logic bug.

mysql> SELECT @@GLOBAL.innodb_lock_wait_timeout;
+-----------------------------------+
| @@GLOBAL.innodb_lock_wait_timeout |
+-----------------------------------+
|                                50 |
+-----------------------------------+
1 row in set (0.00 sec)

I've increased it to 15 minutes and restarted all mxqd daemons to pick up the new default value in new sessions.

mxqprj@macheteinfach:~> mkdir -p /project/mxq/mysql-5.6.14/etc
mxqprj@macheteinfach:~> echo -e "[server]\ninnodb-lock-wait-timeout=660" >> /project/mxq/mysql-5.6.14/etc/my.cnf

mysql> SET @@GLOBAL.innodb_lock_wait_timeout = 550;

root@afk:~# pssh -H "$(hostconfig --list mxqd)" mxqdctl-hostconfig restart

@donald
Copy link
Contributor Author

donald commented Apr 26, 2024

Restart failed on galah (only). Never seen this before. Anyway, I've just started the daemon and now its running.

2024-04-26 14:25:23 +0200 mxqd[3404]: received sigusr2
2024-04-26 14:25:23 +0200 mxqd[3404]: global_sigint_cnt=0 global_sigterm_cnt=0 global_sigquit_cnt=0 global_sigrestart_cnt=1: Exiting.
2024-04-26 14:25:23 +0200 mxqd[3404]: -------------------------------------------------------------
2024-04-26 14:25:23 +0200 mxqd[3404]:  Reexecuting mxqd
2024-04-26 14:25:23 +0200 mxqd[3404]: -------------------------------------------------------------
2024-04-26 14:25:23 +0200 mxqd[3404]: WARNING: Welcome back. Server is restarting. Ignoring some options by default now.
2024-04-26 14:25:23 +0200 mxqd[3404]: mxqd - MXQ v0.31.2 (beta)
2024-04-26 14:25:23 +0200 mxqd[3404]:   by Marius Tolzmann <marius.tolzmann@molgen.mpg.de> 2013-2022
2024-04-26 14:25:23 +0200 mxqd[3404]:      and Donald Buczek <buczek@molgen.mpg.de> 2015-2022
2024-04-26 14:25:23 +0200 mxqd[3404]:   Max Planck Institute for Molecular Genetics - Berlin Dahlem
2024-04-26 14:25:23 +0200 mxqd[3404]: hostname=galah.molgen.mpg.de daemon_name=main daemon_id=12365 :: MXQ server started.
2024-04-26 14:25:23 +0200 mxqd[3404]:   host_id=afd712f3-111d-41ab-af06-44fb499a8ee5-e864-d4c
2024-04-26 14:25:23 +0200 mxqd[3404]: slots=256 memory_total=1048576 memory_avg_per_slot=4096 memory_limit_slot_soft=1048576 memory_limit_slot_hard=1048576 :: server initialized.
2024-04-26 14:25:23 +0200 mxqd[3404]: cpu set available: [0-255]
2024-04-26 14:25:23 +0200 mxqd[3404]: WARNING: mx_mysql_do_statement() failed: Invalid exchange
2024-04-26 14:25:23 +0200 mxqd[3404]: ERROR: mx_mysql_do_statement(): Deadlock found when trying to get lock; try restarting transaction
2024-04-26 14:25:23 +0200 mxqd[3404]: mxq_daemon_mark_crashed() failed: Connection timed out
2024-04-26 14:25:23 +0200 mxqd[3404]: WARNING: recover_from_previous_crash() failed. Aborting execution.
2024-04-26 14:25:23 +0200 mxqd[3404]: entering main loop
2024-04-26 14:25:23 +0200 mxqd[3404]: global_sigint_cnt=0 global_sigterm_cnt=0 global_sigquit_cnt=0 global_sigrestart_cnt=0: Exiting.
2024-04-26 14:25:23 +0200 mxqd[3404]: cu, mx.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant