Skip to content

Work #63

Merged
merged 20 commits into from Jul 6, 2017
Merged

Work #63

merged 20 commits into from Jul 6, 2017

Conversation

donald
Copy link
Contributor

@donald donald commented Jul 6, 2017

No description provided.

donald added 20 commits July 4, 2017 09:36
Add extra parentheses around assigments evaluated as conditions as
suggested by -Wparentheses.

This is to avoid

    CC test_mx_util.o
    test_mx_util.c: In function ‘test_mx_strskipwhitespaces’:
    test_mx_util.c:17:4: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
	assert(s = mx_strskipwhitespaces("     abc   "));
	^
    test_mx_util.c:20:4: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
	assert(s = mx_strskipwhitespaces("abc   "));
	^
    test_mx_util.c:23:4: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
	assert(s = mx_strskipwhitespaces(""));
	^
    test_mx_util.c: In function ‘test_mx_strscan’:
    test_mx_util.c:311:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
	 assert(s = strdup("123 456 -789 246 abc"));
	 ^
etc.
This is a general purpose routine to sort a single linked list of
objects. It requires callbacks for comparison and for locating
the next pointer in the object.
Add a utility routing to sort the groups of all users of the server to
their group priority.
When active groups are (re-)loaded from the database, we sort them by
group priority. This implements group priority (mxqsub -P).
We intend to expand the functionality which can be triggered
by signals, e.g. switch log level to debug or trigger a state
dump.

By processing signals synchronously, we are free to call
non-reentrant functions and know, that our own data is not
in a transient state.

Block asynchronous signals and receive and process signals explicitly.

Re-enable the asynchronous signals when the user process in initialized.
Now hat all signals are blocked, we don't need to ignore them.
It is no longer necessary to reset signals, because we no longer
change them.

The signals may stay blocked for the reaper process and are explicitly
unblocked for the user process.
We process signals synchronously now. Change "volatile sig_atomic_t" to
"int" type.
A dump of server state can be triggerd by

  env kill -usr2 -q 10 mxqd
The loglevel may be set to info with

    env kill -usr2 -q 21 mxqd

and to info with

    env kill -usr2 -q 20 mxqd
We intent to make loggin on info level less verbose so that we don'T
need the deduplication here.

At debug level we want to see all events as they happen.
Try to start as many jobs jobs as possible without going through a main
loop iteration (database roundtrip).
Use 10 seconds everywhere to decrease the load on the database
and races between the mxq daemons a bit. At the same time this increases
the chance that multiple jobs of the same group are started on the same
server, which is good (better use of caches, smaller failure surface).

This is the maximum time a single server will need to react to
database changes (mxqsub or mxqkill).

Administrative signals will get immediate reaction.

Finished user jobs will usually also get immediate reaction.
However, this is not true for jobs we picked up from a previous
daemon incarnation and which are not our children. If these jobs
finish, we will not get a signal, so we need to look into the
spool directory from time to time.  This is another reason, why
we need a timeout at all.

Now that we want to use 10 seconds everywhere, we can make it a
constant.
We don't want spurious signals (like SIGWINCH) to trigger a new
evaluation of the main loop. Only wait for signals we want to be
processed.
Add commands

    killall quitall reloadall reloadall (=restartall)
    dumpall setdebugall setinfoall (=setnodebugall)
The server status if multiple times set from the main loop to
IDLE,RUNNING,BACKFILL,CPUOPTIMAL or FULL based on server->slots_running,
server->slots, server->threads_running.

Factor this out into a separate function to avoid repetition.

Add a call to this function before the main loop so that we immediately
get a correct status after a server start.

Simplify the loop code by setting server status at end of loop and avoid
the continue pattern. As a side effect this also resolves a bug, that
the server status was not updated in a loop iteration where new jobs
were started.
This is just to avoid valgrind leak warnings. We have two variables with
cleanup attribute in the scope of main, which are not freed, when we
leave by exit() instead of running out of scope.

    ==10696== 21 bytes in 1 blocks are still reachable in loss record 1 of 3
    ==10696==    at 0x4C29AC6: malloc (vg_replace_malloc.c:299)
    ==10696==    by 0x5E19449: strdup (strdup.c:42)
    ==10696==    by 0x5E750DD: get_current_dir_name (getdirname.c:40)
    ==10696==    by 0x409F64: main (mxqd.c:2383)
    ==10696==
    ==10696== 45 bytes in 1 blocks are still reachable in loss record 2 of 3
    ==10696==    at 0x4C29AC6: malloc (vg_replace_malloc.c:299)
    ==10696==    by 0x40CFF9: mx_strvec_to_str (mx_util.c:1059)
    ==10696==    by 0x409F58: main (mxqd.c:2382)
@donald donald merged commit b5fe300 into mariux64:mpi Jul 6, 2017
donald added a commit that referenced this pull request Mar 20, 2018
The mysql client library sets SIGPIPE to SIG_IGN which gets inherited by
the user processes:

    root@sigchld:~# cat /proc/27900/status |grep Sig
    SigQ:   0/30656
    SigPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001000
    SigCgt: 0000000000000000

This makes a difference for things like

     CMD | head -1

because CMD is not killed by SIGPIPE when `head` is finished. It will get
an error "broken pipe" on the next write and may or may not emit this to
stderr.

This bug war partially caused by #63, where we changed our own signal
handling to synchronous mode. While in the previous mode, we unknowingly
fixed the change done by the mysql library, we then just set and
reset the blocking mask, which doesn't restore an ignored signal.

Restore SIGCHLD to SIG_DFL when we initialize the child process for the
user.

With this patch applied, no more signals are blocked for the user
process.

    buczek@theinternet:~/git/mxq (restore-sigpipe)$ cat /proc/17081/status |grep Sig
    SigQ:   1/127301
    SigPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000000000
    SigCgt: 0000000000000000

Fixes #71
@donald donald mentioned this pull request Mar 20, 2018
pmenzel pushed a commit that referenced this pull request Mar 20, 2018
The mysql client library sets SIGPIPE to SIG_IGN which gets inherited by
the user processes:

    root@sigchld:~# cat /proc/27900/status |grep Sig
    SigQ:   0/30656
    SigPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001000
    SigCgt: 0000000000000000

This makes a difference for things like

     CMD | head -1

because CMD is not killed by SIGPIPE when `head` is finished. It will get
an error "broken pipe" on the next write and may or may not emit this to
stderr.

This bug was partially caused by #63 (commit ff2f49f (mxqd: Use
synchronous signals)), where we changed our own signal handling to
synchronous mode. While in the previous mode, we unknowingly fixed the
change done by the mysql library, we then just set and reset the blocking
mask, which doesn't restore an ignored signal.

Restore SIGCHLD to SIG_DFL when we initialize the child process for the
user.

With this patch applied, no more signals are blocked for the user
process.

    buczek@theinternet:~/git/mxq (restore-sigpipe)$ cat /proc/17081/status |grep Sig
    SigQ:   1/127301
    SigPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000000000
    SigCgt: 0000000000000000

Fixes: #71
Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant