Skip to content

0.31.0 #149

Merged
merged 20 commits into from Dec 29, 2023
Merged

0.31.0 #149

merged 20 commits into from Dec 29, 2023

Conversation

donald
Copy link
Contributor

@donald donald commented Dec 27, 2023

Implements mxqkill -J xxx to kill individual jobs.

Fixes #4

@donald donald force-pushed the 0.31.0 branch 12 times, most recently from adcbf7a to 7738f49 Compare December 28, 2023 17:40
mx_mysql.c Outdated Show resolved Hide resolved
mx_mysql.c Show resolved Hide resolved
mx_mysql.c Outdated Show resolved Hide resolved
mx_mysql.c Show resolved Hide resolved
mx_mysql.c Outdated Show resolved Hide resolved
mx_mysql.c Outdated Show resolved Hide resolved
mx_mysql.c Show resolved Hide resolved
Copy link
Contributor Author

@donald donald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parenthesis an semicolon missing in assert example in commit message of "Remove broken errno range check".

@donald donald force-pushed the 0.31.0 branch 2 times, most recently from 9cc768c to 0023f18 Compare December 29, 2023 09:59
Remove group_sum_starttime due to its unclear purpose and predominantly
zero values.

'select count(*) from mxq_group where group_sum_starttime=0'
+----------+
| count(*) |
+----------+
|   539223 |
+----------+
'select count(*) from mxq_group where group_sum_starttime!=0'
+----------+
| count(*) |
+----------+
|    11014 |
+----------+
'select distinct(group_sum_starttime) from mxq_group limit 10'
+---------------------+
| group_sum_starttime |
+---------------------+
|                   0 |
|         62509173560 |
|        682762450608 |
|         50573290960 |
|        126435008048 |
|        328732081280 |
|         75859753168 |
|         75859750064 |
|       1770065450784 |
|         75861674832 |
+---------------------+
The current error handling in mx_mysql logs errors multiple times in
different forms due to nested functions and the mapping of MySQL error
codes to system error codes. This leads to unclear semantics and
complicated code.

The current error handling return the same error in errno and in the
function return value which results in uneeded accesses to errno

This commit introduces the following changes:

*   MySQL error strings are captured but not logged by the module. A new
    function mx_mysql_error() is provided for the caller to retrieve the
    latest error string.
*   System error codes are returned as the function value, but errno is
    not set.
*   It is now the caller's responsibility to log errors.

This change reduces the object size of mx_mysql by 28%. Future commits
will address error return values that do not relate to MySQL error
strings.
The assertion

    mx_assert_return_minus_errno((unsigned int)(int)error == error, ERANGE);

does not function as intended, as the conditional expression is
optimized away to true by gcc with -O1 and up [1].

While there are alternatives to detect values outside of the range
[0..-INT_MIN] that could not be negated into the range [INT_MIN..0], the
check appears to be unnecessary.  All callers use the MySQL error numbe
for comparison with constants.

This commit removes the helper functions `mx__mysql_errno()` and
`mx__mysql_stmt_errno()` and replaces them with native MySQL calls.

[1]: https://godbolt.org/z/1ajK384Ge
If the file is not readable, the warning is now saved to be retrievable
by the caller instead of being emitted. The warning text has been
shortened from:

    MySQL ignoring defaults file: euidaccess("/etc/mxq/mysql_ro.cnfX", R_OK) failed: No such file or directory
    MySQL falling back to mysql default config search path.

to

    /etc/mxq/mysql_ro.cnfX: No such file or directory - falling back to mysql default config search path
`mx_mysql` no longer delivers errors in `errno`, so this commit changes
the callers to not use `errno` after a failed call to a `mx_mysql` function,
but only the return value. The "%m" format for getting the error message
is replaced with "%s" in conjunction with `mx_mysql_error()`.

This commit also removes some redundant error logging from nested
function calls.
Currently, `mx_mysql` uses its own assert macros, which log the failed
condition and then return an error value to the caller (via return
status and via errno).

This commit changes the approach to use the standard `assert()` to abort
right away in case of failures, as all `assert()` failures are serious
coding errors. This simplifies the coding for the callers.
Following the last commit, the bind functions and macros of `mx_mysql`
can no longer return a failure status. This commit changes them to void
functions, simplifying the callers.
When mxqkill is used against a running job, it needs to communicate the
fact that the job should be aborted to the daemon handling it. This
commit adds a boolean (TINYINT) column `job_cancelled` to `mxq_job` for
this purpose.

Other options such as overloading `job_status`, utilizing a bit of
job_flags, and using MySQL BIT or SET data types were considered but
discarded to avoid code complexity.
To cancel a single job, this commit introduces the following changes:

* The owner uid is verified in advance to avoid a join withi `mxq_group`
  during a later update statement. This is because the join during
  update doesn't work well with the triggers that want to modify
  `mxq_group.. By avoiding this, we can later dispose of the CANCELLING
  state, which is also just a workaround for this problem.

* job_cancelled is set to true for the job, regardless of job_status.

Transitions of job_status will be handled by following commits.
Use the `cancel_job()`function introduced in the previous commit to
cancel a single job.
This commit introduces periodic checks for the job_cancelled flags for
the jobs run by this daemon. If a cancellation of the job was requested,
the job is killed using the existing mechanism.
This commit removes the -> `CANCELLED` housekeeping from the
`mx_update_job` trigger as we no longer set the state to `CANCELLED` by
code.

Instead, a new trigger is added, which transitions a job from `INQ` to
`CANCELLED` when the `job_cancelled` flag is set.

For jobs that are `RUNNING`, the `job_cancelled` flag needs to be
handled by the daemon, which kills the job. Note that the job will
transition into `KILLED` not into `CANCELLED`.

This approach eliminates the need for any code for jobs getting killed
in the `LOADED` or `ASSIGNED` states, as these will either transition
into `RUNNING` or - in very exceptional cases - back into INQ.
This commit removes the obsolete job status `CANCELLING`. It also
removes the `* -> NOT IN [ CANCELLING(989) | CANCELLED(990) ]` trigger,
which would reduce to `* -> CANCELLED(990)`, but the only transition to
`CANCELLED` is from `INQ` where there are no job run time statistics
that need to be accumulated into mxq_group.
@donald
Copy link
Contributor Author

donald commented Dec 29, 2023

Old (5.6.14) MySQL server doesn't accept the new trigger which has been tested with MySQL 8.

ERROR 1064 (42000) at line 5: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'mxq_update_job
        IF NEW.job_status = 0 AND NEW.job_cancelled THEN
     ' at line 2

Relevant change maybe in 5.7.2.

https://dev.mysql.com/blog-archive/support-for-multiple-triggers-per-table-for-the-same-value-of-actiontiming/

@donald
Copy link
Contributor Author

donald commented Dec 29, 2023

Workaround by commit a6a1cec ("sql: Change trigger syntax for MySQL 5 database ") directly on master branch.

mxq_daemon.c Show resolved Hide resolved
@thomas thomas mentioned this pull request Dec 30, 2023
Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mxqkill: add feature to kill a single job by job id
1 participant