
0.30.8 #137

Merged
merged 4 commits into from Oct 28, 2022

@donald donald commented Oct 12, 2022

  • Use Device Mapper to avoid flushing of TMPDIR when a job is finished. Kernel should contain mariux64/linux@b9552111fc6
    to avoid extensive logging.

This helper has been replaced by tmpdir-setup. Remove it.

donald commented Oct 12, 2022

Being tested on esodophobie

pmenzel commented Oct 28, 2022

Minor typos: realier → earlier; to cleanup → to clean up

@donald donald force-pushed the 0.30.8 branch 3 times, most recently from 1d11d14 to 4c052e2 Compare October 28, 2022 09:38
When mxqd restarts and finds finished jobs, it calls the tmpdir cleanup
code for these jobs.

As part of the recovery procedure, it later scans the system for any
leftover mounts. When the regular tmpdir cleanup is done asynchronously,
mxqd might discover a directory which is in the process of being
unmounted but still exists, in which case it calls the tmpdir
cleanup code a second time.

No harm is done; the jobs completed normally. The second
attempted cleanup just produces some error messages in the logfile.

This bug is only triggered when jobs complete while mxqd is stopped.

As the "old style" tmpdir setup is going away anyway, don't invent
something complicated here and just do the cleanup synchronously.
Use a dm-device (linear target) between the filesystem and the loop
device and then use this sequence for teardown:

- ioctl EXT4_IOC_SHUTDOWN with EXT4_GOING_FLAGS_NOLOGFLUSH
- dmsetup reload $dmname --table "0 $sectors zero"
- dmsetup resume $dmname --noflush
- umount $mountpoint
- dmsetup remove --deferred $dmname
- rmdir $mountpoint

The zero target prevents any real writes to the block device. However,
if the filesystem reads back some data, it will get zeros, which could
lead to all kinds of random behaviour. For this reason, we shut down the
filesystem, which has the additional advantage that some I/O is
prevented at an even earlier stage. Shutdown alone, however, would not
prevent all I/O (e.g. not cache writeback or superblock write), so we
still need the zero target.
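
For illustration, the shutdown step could look roughly like the following sketch. It is not taken from this PR: the helper name `shutdown_filesystem` is hypothetical, and the request number is computed by hand assuming the common Linux `_IOR` encoding used on x86 and ARM (some architectures encode ioctl numbers differently). The kernel defines `EXT4_IOC_SHUTDOWN` as `_IOR('X', 125, __u32)` and `EXT4_GOING_FLAGS_NOLOGFLUSH` as `0x2`.

```python
import fcntl
import os
import struct

# Assumption: generic Linux ioctl encoding (direction << 30 | size << 16 |
# type << 8 | nr), as used on x86 and ARM.
_IOC_READ = 2


def _ior(type_char, nr, size):
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


EXT4_IOC_SHUTDOWN = _ior("X", 125, 4)  # _IOR('X', 125, __u32)
EXT4_GOING_FLAGS_NOLOGFLUSH = 0x2      # flush neither the log nor the data


def shutdown_filesystem(mountpoint):
    """Shut down the filesystem mounted at `mountpoint` (hypothetical helper).

    Requires root. After the call, the filesystem rejects further
    operations until it is unmounted.
    """
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        flags = struct.pack("=I", EXT4_GOING_FLAGS_NOLOGFLUSH)
        fcntl.ioctl(fd, EXT4_IOC_SHUTDOWN, flags)
    finally:
        os.close(fd)
```

In the teardown order described above, this call would come first, before the table is swapped to the zero target.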

Even with this setting, ext4 sometimes logs some errors
("ext4_writepages: jbd2_start: XXX pages, ino YYY; err -5").

We've patched our kernel to avoid that message if the filesystem is shut
down. This goes on top of the patches which avoid the usual "mounted"
and "unmounted" messages for ext4.

To support rolling upgrades of mxqd, keep support to clean up mounts
created the old way, which is to mount a loop device directly.
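
One way to picture the distinction between the two kinds of mounts is the sketch below. It is not how mxqd does it: the sketch assumes that old-style mounts show a `/dev/loopN` source and new-style mounts a `/dev/mapper/...` or `/dev/dm-N` source in `/proc/mounts`, and the function names and the path prefix are hypothetical.

```python
def classify_mount_source(source):
    """Classify a mount source as 'loop' (old style), 'dm' (new style) or 'other'."""
    if source.startswith("/dev/loop"):
        return "loop"
    if source.startswith("/dev/mapper/") or source.startswith("/dev/dm-"):
        return "dm"
    return "other"


def leftover_mounts(mounts_text, prefix):
    """Return (mountpoint, kind) pairs for mounts located below `prefix`.

    `mounts_text` is text in /proc/mounts format: source, mountpoint,
    filesystem type, options and two numeric fields, separated by spaces.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mountpoint = fields[0], fields[1]
        if mountpoint.startswith(prefix):
            result.append((mountpoint, classify_mount_source(source)))
    return result
```

A cleanup routine could then skip the device-mapper steps for entries classified as `loop` and only unmount and remove the directory.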
@donald donald merged commit 4ea76f5 into master Oct 28, 2022
donald added a commit that referenced this pull request Oct 28, 2022

donald commented Feb 12, 2023

Note: This PR references 2c29bb3 which has a bug (overlooked git conflict marker). What was really merged was 9b43066 with the marker removed.
