mxqd: Change poll times
Use a poll interval of 10 seconds everywhere to reduce the load on the
database and, somewhat, the races between the mxq daemons. This also
increases the chance that multiple jobs of the same group are started on
the same server, which is good (better use of caches, smaller failure surface).

This interval is the maximum time a single server needs to react to
database changes (made by mxqsub or mxqkill).
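A quick way to see the trade-off is to count wake-ups per hour. The sketch below assumes every main-loop iteration ends in a full timeout (no signal arrives) and costs roughly the same work; the branch labels are descriptive names chosen here, and the old intervals are the values this commit removes. Since the worst-case reaction time equals the interval, the same table also shows where reaction gets slower (the "nothing to do" branch, 1 s to 10 s) and where it gets faster ("tried hard, no jobs", 15 s to 10 s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Poll intervals removed by this commit, keyed by an illustrative
 * label for the branch of main() that used to set them. */
struct poll_case {
    const char *branch;
    long old_interval; /* seconds */
};

static const struct poll_case old_cases[] = {
    { "nothing to do",             1 },
    { "all slots running",         7 },
    { "users waiting for slots",   3 },
    { "tried hard, no jobs",      15 },
    { "tried hard, jobs running",  3 },
    { "terminating",               1 }, /* waits for jobs, not the database */
};

#define NUM_CASES (sizeof old_cases / sizeof old_cases[0])
#define NEW_INTERVAL 10L /* seconds: the single interval after this commit */

/* Wake-ups per hour, assuming every iteration waits the full interval. */
long polls_per_hour(long interval)
{
    assert(interval > 0);
    return 3600 / interval;
}

/* Print old vs. new wake-ups per hour for each branch. */
void print_comparison(void)
{
    for (size_t i = 0; i < NUM_CASES; i++) {
        long old_iv = old_cases[i].old_interval;
        printf("%-26s old %2lds (%4ld/h)  new %lds (%ld/h)\n",
               old_cases[i].branch, old_iv, polls_per_hour(old_iv),
               NEW_INTERVAL, polls_per_hour(NEW_INTERVAL));
    }
}
```

For example, the "nothing to do" branch drops from 3600 to 360 wake-ups per hour, while "tried hard, no jobs" rises from 240 to 360.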

Administrative signals still get an immediate reaction.

Finished user jobs usually get an immediate reaction as well.
However, this is not true for jobs we picked up from a previous
daemon incarnation, which are not our children. When these jobs
finish, we do not get a signal, so we need to look into the
spool directory from time to time. This is another reason why
we need a timeout at all.
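A process that is not our child produces no SIGCHLD when it exits, and waitpid() on it fails with ECHILD, so the only way to notice its exit is to poll something it leaves behind. The sketch below is a hypothetical illustration, not mxqd's fspool_scan(): it assumes each finished job drops a file named `<job-id>.finished` into a spool directory (the suffix and the helper names are invented here) and counts those files on one pass.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical suffix marking a finished job's status file. */
#define FINISHED_SUFFIX ".finished"

/* True if name ends with suffix and has a non-empty prefix. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* One pass over spool_dir: return the number of finished-job files,
 * or -1 if the directory cannot be opened. */
int scan_spool(const char *spool_dir)
{
    assert(spool_dir != NULL);
    DIR *dir = opendir(spool_dir);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!has_suffix(ent->d_name, FINISHED_SUFFIX))
            continue; /* skips ".", "..", and unrelated files */
        /* A real daemon would parse the job id here, record the exit
         * status in the database, and remove the file. */
        found++;
    }
    closedir(dir);
    return found;
}
```

Calling scan_spool() once per loop iteration bounds the delay in noticing such a job by the poll interval.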

Since we now use 10 seconds everywhere, we can make it a
constant.
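Assuming the daemon waits with POSIX sigtimedwait(), which the `siginfo_t` and `struct timespec` declarations in the diff suggest, one constant interval is enough: a pending administrative signal ends the wait at once, and otherwise the call fails with EAGAIN after 10 seconds so the loop can recheck the database and the spool directory. The helper wait_for_event() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* The single poll interval: 10 seconds, mirroring this commit. */
const struct timespec poll_interval = { 10, 0 };

/* Wait for one of the signals in *set (which the caller must have
 * blocked with sigprocmask) or until *timeout elapses.
 * Returns the signal number, 0 on timeout, or -1 on other errors. */
int wait_for_event(const sigset_t *set, const struct timespec *timeout)
{
    assert(set != NULL && timeout != NULL);
    siginfo_t info;
    for (;;) {
        int sig = sigtimedwait(set, &info, timeout);
        if (sig > 0)
            return sig; /* signal pending: react immediately */
        if (errno == EAGAIN)
            return 0;   /* timeout: time to poll again */
        if (errno != EINTR)
            return -1;
        /* EINTR: interrupted by a handler for some other signal; retry.
         * The full timeout restarts, which is acceptable for a sketch. */
    }
}
```

Passing the same `&poll_interval` from every branch is what makes the per-branch `tv_sec` assignments unnecessary.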
donald committed Jul 6, 2017
1 parent 82d7a31 commit 117c12b
Showing 1 changed file (mxqd.c) with 1 addition and 7 deletions.
@@ -2338,7 +2338,7 @@ int main(int argc, char *argv[])

int res;
int fail = 0;
-struct timespec poll_interval={0,0};
+static struct timespec poll_interval={10,0}; /* 10 seconds */
siginfo_t siginfo;

int saved_argc;
@@ -2427,7 +2427,6 @@ int main(int argc, char *argv[])
assert(!group_cnt);
mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_IDLE);
mx_log_debug("Nothing to do");
-poll_interval.tv_sec=1;
continue;
}

@@ -2440,7 +2439,6 @@ int main(int argc, char *argv[])
mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_FULL);
}
mx_log_debug("All slots running");
-poll_interval.tv_sec=7;
continue;
}

@@ -2456,19 +2454,16 @@ int main(int argc, char *argv[])
if (res<0) {
mx_log_info("No more slots started because we have users waiting for free slots");
mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_WAITING);
-poll_interval.tv_sec=3;
continue;
}

if (!slots_started && !slots_returned && !global_sigint_cnt && !global_sigterm_cnt) {
if (!server->jobs_running) {
mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_IDLE);
mx_log_debug("Tried Hard and nobody is doing anything.");
-poll_interval.tv_sec=15;
} else {
mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_RUNNING);
mx_log_debug("Tried Hard. But have done nothing.");
-poll_interval.tv_sec=3;
}
continue;
}
@@ -2484,7 +2479,6 @@ int main(int argc, char *argv[])
/* while not quitting and not restarting -> wait for and collect all running jobs */

mxq_daemon_set_status(server->mysql, daemon, MXQ_DAEMON_STATUS_TERMINATING);
-poll_interval.tv_sec=1;
while (server->jobs_running && !global_sigquit_cnt && !global_sigrestart_cnt && !fail) {
slots_returned = catchall(server);
slots_returned += fspool_scan(server);
