Better handle terminating jobs when /var/spool is full #136
Maybe out of free disk space?
Exactly
mxqd is not prepared for that. Suggestion: Remove the empty stat files and restart the daemon.
I removed the files.
Let’s see if mxqd detects the jobs are gone without a restart.
ERROR: /var/spool/mxqd/main/40881611.stat : parse error (res=-1)
Should the reaper wait and retry on write failure?
Sorry, but I do not understand what you are talking about ...
…On December 30, 2023 1:39:02 PM GMT+01:00, Donald Buczek wrote:
"Should the reaper wait and retry on write failure?"
Every running mxq job has a privileged process ("mxq reaper") as its top process. It just reaps the user processes until no more are left, then writes the exit status of the main process and the resource usages into a spool file.

The issue reported here was triggered when the root filesystem was full: the reaper consequently produced an empty spool file, the mxq daemon complained about the unexpected format and wasn't able to finish the jobs. My suggestion was that the reaper process, if it finds itself unable to write the spool file, just waits and retries forever. In the reported case, that would have helped.
Despite a user cancelling/killing the job, job 40881611 is still shown as running. No process from that user runs on superbia anymore.