umount() of tempdir sometimes needs too long #121

donald · 2022-02-06T20:11:33Z

I've seen mxqd hanging for tens of seconds on the umount() of a tempdir after a job finished. Probably writeback of dirty data. In the initial experiments, this wasn't a problem. Maybe something changed in the kernel?

Maybe we can use dm and replace dm-linear with dm-zero before the umount?

donald · 2022-04-08T20:57:20Z

Killed a group with the jobs running on the same node. The jobs copied about 30G to TMPDIR. The umount of each job took several minutes, so mxqd was frozen for over 20 minutes. This is a huge problem.

http://afk.molgen.mpg.de/mxq/mxq/group/496259

Stack of a "loopN" thread:

[<0>] blk_mq_get_tag+0x161/0x280
[<0>] __blk_mq_alloc_request+0x5c/0x130
[<0>] blk_mq_submit_bio+0x103/0x510
[<0>] submit_bio_noacct+0x3c2/0x410
[<0>] submit_bio+0x4b/0x1a0
[<0>] iomap_do_writepage+0x3db/0x870
[<0>] write_cache_pages+0x175/0x3f0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x59/0x80
[<0>] do_writepages+0x4b/0xe0
[<0>] __filemap_fdatawrite_range+0xc7/0x100
[<0>] file_write_and_wait_range+0x5e/0xb0
[<0>] xfs_file_fsync+0x6c/0x220
[<0>] loop_queue_work+0x4b7/0xa20
[<0>] kthread_worker_fn+0x97/0x1a0
[<0>] kthread+0x118/0x130
[<0>] ret_from_fork+0x22/0x30

donald · 2022-04-09T16:10:25Z

/usr/local/system/perl/bin/perl -e "open F, '<', '$mountpoint/.' ; \$d=pack('L', 2) ; ioctl (F, 2147768445, \$d)", which is EXT4_IOC_SHUTDOWN _IOR and EXT4_GOING_FLAGS_NOLOGFLUSH, before the umount doesn't help much.

We can put a device mapper with a linear target before the umount and use dmsetup remove --force --deferred $dmname before the umount to fail all further I/O. While this gets fast if the device is idle and we only do a umount, it still gets slow if writeback is already in progress. Probably not because if I/O but because of writeback error handling. Additionally it leaves a lot of noise in the logs:

2540423.243623] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449816)
[2540423.259873] buffer_io_error: 32473 callbacks suppressed
[2540423.259873] Buffer I/O error on device dm-0, logical block 2449816
[2540423.271821] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449817)
[2540423.282642] Buffer I/O error on device dm-0, logical block 2449817
[2540423.289179] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449818)
[2540423.299998] Buffer I/O error on device dm-0, logical block 2449818
[2540423.306535] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449819)
[2540423.317361] Buffer I/O error on device dm-0, logical block 2449819
[2540423.323887] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449820)
[2540423.334706] Buffer I/O error on device dm-0, logical block 2449820
[2540423.341245] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449821)
[2540423.352060] Buffer I/O error on device dm-0, logical block 2449821
[2540423.358605] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449822)
[2540423.369424] Buffer I/O error on device dm-0, logical block 2449822
[2540423.375959] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449823)
[2540423.386775] Buffer I/O error on device dm-0, logical block 2449823
[2540423.393312] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449824)
[2540423.404131] Buffer I/O error on device dm-0, logical block 2449824
[2540423.410668] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449825)
[2540423.421484] Buffer I/O error on device dm-0, logical block 2449825
[2540423.799393] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[2540423.799446] Buffer I/O error on dev dm-0, logical block 2, lost async page write
[2540423.814628] Buffer I/O error on dev dm-0, logical block 1027, lost async page write
[2540423.822499] Buffer I/O error on dev dm-0, logical block 1059, lost async page write
[2540423.830368] Buffer I/O error on dev dm-0, logical block 33795, lost async page write
[2540423.838321] Buffer I/O error on dev dm-0, logical block 33796, lost async page write
[2540423.846276] Buffer I/O error on dev dm-0, logical block 524288, lost async page write
[2540423.854319] Buffer I/O error on dev dm-0, logical block 1048576, lost async page write
[2540423.862448] Buffer I/O error on dev dm-0, logical block 1572864, lost async page write
[2540423.870573] Buffer I/O error on dev dm-0, logical block 1572868, lost async page write
[2540424.812115] EXT4-fs (dm-0): I/O error while writing superblock

I don't think we want that for every job.

donald · 2022-04-09T16:56:04Z

To replace the table with a zero target seems to be fast always.

	mountpoint=$mntdir/$MXQ_JOBID
	dmname=mxq.$MXQ_JOBID
	dmdevice=/dev/mapper/$dmname
	minor=$(( 0x$(stat -c%T $dmdevice) ))
	sectors=$(cat /sys/block/dm-$minor/size)

	dmsetup reload $dmname --table "0 $sectors zero"
	dmsetup resume $dmname --noflush --nolockfs
	umount $mountpoint
	dmsetup remove $dmname

However, we still get junk in the logs:

[2541751.618340] EXT4-fs (dm-1): Delayed block allocation failed for inode 11 at logical offset 979968 with max blocks 2048 with error 117
[2541751.630585] EXT4-fs (dm-1): This should not happen!! Data will be lost
[2541752.315139] EXT4-fs (dm-1): Inode 11 (00000000d37ce04b): i_reserved_data_blocks (3073) not cleared!
[2543662.042686] EXT4-fs error (device dm-1): ext4_validate_block_bitmap:390: comm kworker/u24:4: bg 32: bad block bitmap checksum
[2543662.054248] EXT4-fs (dm-1): Delayed block allocation failed for inode 12 at logical offset 980992 with max blocks 2048 with error 74
[2543662.066404] EXT4-fs (dm-1): This should not happen!! Data will be lost

[2543663.486516] EXT4-fs (dm-1): Inode 12 (0000000052b36060): i_reserved_data_blocks (2049) not cleared!

donald · 2022-04-09T18:48:39Z

Combination of EXT4_IOC_SHUTDOWN and zero target does work,

donald · 2022-04-09T20:29:32Z

Sometimes we get ext4_writepages: jbd2_start: 173 pages, ino 11; err -5. And the (expected) shut down requested (2) each time. Maybe we'd need to disable that message.

[2555338.047021] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555432.766098] EXT4-fs (dm-2): shut down requested (2)
[2555439.994342] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2555459.506944] EXT4-fs (dm-1): shut down requested (2)
[2555459.651564] EXT4-fs (dm-1): ext4_writepages: jbd2_start: 3385 pages, ino 11; err -5
[2555468.777055] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2555477.306924] EXT4-fs (dm-3): shut down requested (2)
[2555482.810599] EXT4-fs (dm-3): ext4_writepages: jbd2_start: 4256 pages, ino 11; err -5
[2555483.752750] EXT4-fs (dm-0): shut down requested (2)
[2555483.952197] EXT4-fs (dm-0): ext4_writepages: jbd2_start: 4425 pages, ino 11; err -5
[2555487.801428] EXT4-fs (dm-4): mounted filesystem without journal. Opts: (null)
[2555492.201764] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555575.292305] EXT4-fs (dm-2): shut down requested (2)
[2555580.032793] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2555611.479816] EXT4-fs (dm-1): shut down requested (2)
[2555618.297560] EXT4-fs (dm-0): shut down requested (2)
[2555619.265264] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2555620.597076] EXT4-fs (dm-4): shut down requested (2)
[2555622.181716] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555624.148530] EXT4-fs (dm-4): mounted filesystem without journal. Opts: (null)
[2555631.869127] EXT4-fs (dm-2): shut down requested (2)

Of course, we could just lazy umount to get mxqd going. But we'd still have the writeback I/O load, which we don't want.

donald changed the title ~~umount() of tempdir sometimes need to long~~ umount() of tempdir sometimes need too long Feb 6, 2022

donald changed the title ~~umount() of tempdir sometimes need too long~~ umount() of tempdir sometimes needs too long Feb 6, 2022

donald mentioned this issue May 11, 2022

0.30.6 #133

Merged

donald closed this as completed in #133 May 11, 2022

umount() of tempdir sometimes needs too long #121

umount() of tempdir sometimes needs too long #121

donald commented Feb 6, 2022

donald commented Apr 8, 2022 •

edited

Loading

donald commented Apr 9, 2022

donald commented Apr 9, 2022

donald commented Apr 9, 2022

donald commented Apr 9, 2022

umount() of tempdir sometimes needs too long #121

umount() of tempdir sometimes needs too long #121

Comments

donald commented Feb 6, 2022

donald commented Apr 8, 2022 • edited Loading

donald commented Apr 9, 2022

donald commented Apr 9, 2022

donald commented Apr 9, 2022

donald commented Apr 9, 2022

donald commented Apr 8, 2022 •

edited

Loading