
Chaotic Chameleon.1 #123

Merged
merged 30 commits into from
Feb 21, 2022

Conversation

donald
Contributor

@donald donald commented Feb 17, 2022

No description provided.

@pmenzel
Contributor

pmenzel commented Feb 17, 2022

s/rases/races/ in commit message. Rest looks good.

@donald
Contributor Author

donald commented Feb 17, 2022

Already installed on theredqueen and deinemuddah for testing.

@donald donald force-pushed the fixes branch 7 times, most recently from 1870a2f to 84d0a1b Compare February 20, 2022 17:44
Currently the script assumes that it is not called multiple times in
parallel. This is not true, because for `gpu-setup job-init` it is
called by the forked user process.

Use flock to avoid GPU allocation races.
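
The locking pattern described above can be sketched in C with flock(2). This is only an illustration of the technique, not the code from the commit, which changes the script itself. The names `with_exclusive_lock` and `increment_counter` and the lock file path are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run critical_section(arg) while holding an exclusive lock on lock_path.
 * Returns the callback's result, or -1 if the lock could not be taken.
 * Hypothetical helper for illustration only. */
int with_exclusive_lock(const char *lock_path,
                        int (*critical_section)(void *), void *arg)
{
    int fd = open(lock_path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;

    /* Blocks until every other holder has released the lock. */
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }

    int result = critical_section(arg);

    flock(fd, LOCK_UN);
    close(fd);
    return result;
}

/* Example critical section: stands in for "pick a free GPU and record it". */
int increment_counter(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

Two processes that call `with_exclusive_lock()` on the same path run their critical sections one after the other, so they cannot both claim the same resource.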
@donald donald force-pushed the fixes branch 3 times, most recently from 84a9752 to 7952f85 Compare February 21, 2022 09:50
Because `gpu-setup job-init` is asynchronous to mxqd (see previous
commit), `gpu-setup job-release` races with it, too.

Tolerate empty pid files as well as pid files being removed from under
us.
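
A tolerant pid-file reader along these lines might look as follows in C. This is a sketch under assumptions: the function name `read_pid_file`, the one-number-per-file format and the return convention are invented for the example and are not taken from gpu-setup.

```c
#include <errno.h>
#include <stdio.h>

/* Read a pid from path.
 * Returns the pid (> 0) on success,
 *         0 if the file is missing, empty or does not hold a positive number,
 *        -1 on any other error (errno is set by fopen).
 * Hypothetical helper for illustration only. */
long read_pid_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return errno == ENOENT ? 0 : -1;  /* removed from under us: not an error */

    long pid = 0;
    int n = fscanf(f, "%ld", &pid);       /* an empty file yields EOF, not 1 */
    fclose(f);

    if (n != 1 || pid <= 0)
        return 0;                         /* empty or not yet written */
    return pid;
}
```

The caller can treat a return value of 0 as "nothing to release yet" instead of failing.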
This message is rather annoying during normal operation, so set it to
debug level.
If the -u option is not given, passwd will not be set in the args
evaluation loop, yet we use `if (!passwd) {` later in the code.

Initialize pointer.
If the -u option is not given, passwd will not be set in the args
evaluation loop, yet we use `if (!passwd) {` later in the code.

Initialize pointer.
If pid 1 is not found in the loop, `pid1` will be left uninitialized,
yet it is evaluated in `if (!pid1)` after the loop.

Add missing initialization.
Remove "inline" function attributes and leave the decision to the
compiler.

Note: We keep "inline" in the cleanup functions in mx_util.h to avoid
conflicts between instances of the function bodies in multiple
compilation units.
Remove "extern" keyword from functions, because it has no effect.
Turn functions that are only used inside a single compilation unit into
static functions. If such a function was declared in the include file,
remove the declaration there.
Don't call _flock_free before flock->fname has been initialized.
This feature was once added to help in development. However, since then
mxqd has learned a few more things (like mounting TMPDIR) which cannot
be done without privileges and would require more special code.

Even if we special-case everywhere, testing done as non-root wouldn't
exercise code paths of the real thing.

I don't really use that feature anymore. Does anyone?

Moreover, nowadays there are other options (like user namespaces).

Remove feature to simplify code.
If an external helper does something unexpected like exiting with a
non-zero exit status or sending more data than mxqd wants to handle,
mx_call_external sets errno to an error code.

Currently we use EPIPE ("Broken pipe"). Change this to EPROTO ("Protocol
error"), which seems to be a better fit.

We assume that the external helper, which shares its stderr with mxqd,
has already sent some diagnostic to the logfile.
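
The error convention can be sketched as follows. `run_helper` is a hypothetical stand-in and not the real `mx_call_external`, whose signature and output handling are not shown in this conversation. The sketch covers only the non-zero exit status case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run an external helper and wait for it.
 * Returns 0 if the helper exited with status 0.
 * Returns -1 with errno set otherwise; unexpected helper behaviour is
 * reported as EPROTO. Hypothetical helper for illustration only. */
int run_helper(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* errno set by fork */

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                /* exec failed; the parent sees a non-zero status */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;             /* errno set by waitpid */
    }

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        errno = EPROTO;            /* the helper did not behave as agreed */
        return -1;
    }
    return 0;
}
```

A caller that prints `strerror(errno)` then reports "Protocol error" rather than the misleading "Broken pipe".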
Use `doxygen -u` to upgrade `Doxyfile`.
In the "reaper died" codepath, the exit status has not yet been read
when the error message is emitted. Move the error message behind the
wait4 call.
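
The ordering matters because the status is only valid once wait4 has returned. A minimal sketch, with a hypothetical function name and message wording that are not taken from mxqd:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Collect the child first, then format the message from the status that
 * wait4 filled in. Returns 0 on success, -1 if wait4 failed.
 * Hypothetical helper for illustration only. */
int describe_reaper_exit(pid_t pid, char *buf, size_t len)
{
    int status;
    struct rusage usage;

    if (wait4(pid, &status, 0, &usage) < 0)
        return -1;

    /* Only now does `status` hold meaningful data. */
    if (WIFEXITED(status))
        snprintf(buf, len, "reaper died: exit status %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "reaper died: killed by signal %d (%s)",
                 WTERMSIG(status), strsignal(WTERMSIG(status)));
    else
        snprintf(buf, len, "reaper died: raw status 0x%x", (unsigned)status);
    return 0;
}
```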
Capture jlist->next because jlist might be freed by job_is_lost() inside
the loop body.
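
The pattern is the usual one for freeing nodes while walking a singly linked list. In the sketch below, `struct job` and `drop_lost_jobs` are hypothetical stand-ins; the real job list and `job_is_lost()` are not shown here.

```c
#include <stdlib.h>

/* Hypothetical list node; the real job list has more fields. */
struct job {
    int lost;
    struct job *next;
};

/* Unlink and free every job marked as lost. Returns the new list head. */
struct job *drop_lost_jobs(struct job *head)
{
    struct job **link = &head;   /* the pointer that currently points at `job` */
    struct job *job, *next;

    for (job = head; job; job = next) {
        next = job->next;        /* capture before `job` may be freed */
        if (job->lost) {
            *link = next;        /* unlink */
            free(job);           /* job->next must not be read after this */
        } else {
            link = &job->next;
        }
    }
    return head;
}
```

Writing the loop as `for (job = head; job; job = job->next)` would read freed memory whenever the body frees the current node.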
Currently argv is converted to a string and back to an array before mxqd
reexecs itself. Use the original argv instead.

The difference is that options are not included in the informational
messages.
@donald donald changed the title Fixes Chaotic Chameleon.1 Feb 21, 2022
@donald donald merged commit a4b247d into master Feb 21, 2022
@donald donald deleted the fixes branch October 28, 2022 14:21
2 participants