Skip to content

umount() of tempdir sometimes needs too long #121

Closed
donald opened this issue Feb 6, 2022 · 5 comments · Fixed by #133
Closed

umount() of tempdir sometimes needs too long #121

donald opened this issue Feb 6, 2022 · 5 comments · Fixed by #133

Comments

@donald
Copy link
Contributor

donald commented Feb 6, 2022

I've seen mxqd hanging for tens of seconds on the umount() of a tempdir after a job finished. Probably writeback of dirty data. In the initial experiments, this wasn't a problem. Maybe something changed in the kernel?

Maybe we can use dm and replace dm-linear with dm-zero before the umount?

@donald donald changed the title umount() of tempdir sometimes need to long umount() of tempdir sometimes need too long Feb 6, 2022
@donald donald changed the title umount() of tempdir sometimes need too long umount() of tempdir sometimes needs too long Feb 6, 2022
@donald
Copy link
Contributor Author

donald commented Apr 8, 2022

Killed a group with the jobs running on the same node. The jobs copied about 30G to TMPDIR. The umount of each job took several minutes, so mxqd was frozen for over 20 minutes. This is a huge problem.

http://afk.molgen.mpg.de/mxq/mxq/group/496259

Stack of a "loopN" thread:

[<0>] blk_mq_get_tag+0x161/0x280
[<0>] __blk_mq_alloc_request+0x5c/0x130
[<0>] blk_mq_submit_bio+0x103/0x510
[<0>] submit_bio_noacct+0x3c2/0x410
[<0>] submit_bio+0x4b/0x1a0
[<0>] iomap_do_writepage+0x3db/0x870
[<0>] write_cache_pages+0x175/0x3f0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x59/0x80
[<0>] do_writepages+0x4b/0xe0
[<0>] __filemap_fdatawrite_range+0xc7/0x100
[<0>] file_write_and_wait_range+0x5e/0xb0
[<0>] xfs_file_fsync+0x6c/0x220
[<0>] loop_queue_work+0x4b7/0xa20
[<0>] kthread_worker_fn+0x97/0x1a0
[<0>] kthread+0x118/0x130
[<0>] ret_from_fork+0x22/0x30

@donald
Copy link
Contributor Author

donald commented Apr 9, 2022

/usr/local/system/perl/bin/perl -e "open F, '<', '$mountpoint/.' ; \$d=pack('L', 2) ; ioctl (F, 2147768445, \$d)", which is EXT4_IOC_SHUTDOWN _IOR and EXT4_GOING_FLAGS_NOLOGFLUSH, before the umount doesn't help much.

We can put a device mapper with a linear target before the umount and use dmsetup remove --force --deferred $dmname before the umount to fail all further I/O. While this gets fast if the device is idle and we only do a umount, it still gets slow if writeback is already in progress. Probably not because if I/O but because of writeback error handling. Additionally it leaves a lot of noise in the logs:

2540423.243623] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449816)
[2540423.259873] buffer_io_error: 32473 callbacks suppressed
[2540423.259873] Buffer I/O error on device dm-0, logical block 2449816
[2540423.271821] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449817)
[2540423.282642] Buffer I/O error on device dm-0, logical block 2449817
[2540423.289179] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449818)
[2540423.299998] Buffer I/O error on device dm-0, logical block 2449818
[2540423.306535] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449819)
[2540423.317361] Buffer I/O error on device dm-0, logical block 2449819
[2540423.323887] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449820)
[2540423.334706] Buffer I/O error on device dm-0, logical block 2449820
[2540423.341245] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449821)
[2540423.352060] Buffer I/O error on device dm-0, logical block 2449821
[2540423.358605] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449822)
[2540423.369424] Buffer I/O error on device dm-0, logical block 2449822
[2540423.375959] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449823)
[2540423.386775] Buffer I/O error on device dm-0, logical block 2449823
[2540423.393312] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449824)
[2540423.404131] Buffer I/O error on device dm-0, logical block 2449824
[2540423.410668] EXT4-fs warning (device dm-0): ext4_end_bio:349: I/O error 10 writing to inode 12 starting block 2449825)
[2540423.421484] Buffer I/O error on device dm-0, logical block 2449825
[2540423.799393] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[2540423.799446] Buffer I/O error on dev dm-0, logical block 2, lost async page write
[2540423.814628] Buffer I/O error on dev dm-0, logical block 1027, lost async page write
[2540423.822499] Buffer I/O error on dev dm-0, logical block 1059, lost async page write
[2540423.830368] Buffer I/O error on dev dm-0, logical block 33795, lost async page write
[2540423.838321] Buffer I/O error on dev dm-0, logical block 33796, lost async page write
[2540423.846276] Buffer I/O error on dev dm-0, logical block 524288, lost async page write
[2540423.854319] Buffer I/O error on dev dm-0, logical block 1048576, lost async page write
[2540423.862448] Buffer I/O error on dev dm-0, logical block 1572864, lost async page write
[2540423.870573] Buffer I/O error on dev dm-0, logical block 1572868, lost async page write
[2540424.812115] EXT4-fs (dm-0): I/O error while writing superblock

I don't think we want that for every job.

@donald
Copy link
Contributor Author

donald commented Apr 9, 2022

To replace the table with a zero target seems to be fast always.

	mountpoint=$mntdir/$MXQ_JOBID
	dmname=mxq.$MXQ_JOBID
	dmdevice=/dev/mapper/$dmname
	minor=$(( 0x$(stat -c%T $dmdevice) ))
	sectors=$(cat /sys/block/dm-$minor/size)

	dmsetup reload $dmname --table "0 $sectors zero"
	dmsetup resume $dmname --noflush --nolockfs
	umount $mountpoint
	dmsetup remove $dmname

However, we still get junk in the logs:

[2541751.618340] EXT4-fs (dm-1): Delayed block allocation failed for inode 11 at logical offset 979968 with max blocks 2048 with error 117
[2541751.630585] EXT4-fs (dm-1): This should not happen!! Data will be lost
[2541752.315139] EXT4-fs (dm-1): Inode 11 (00000000d37ce04b): i_reserved_data_blocks (3073) not cleared!
[2543662.042686] EXT4-fs error (device dm-1): ext4_validate_block_bitmap:390: comm kworker/u24:4: bg 32: bad block bitmap checksum
[2543662.054248] EXT4-fs (dm-1): Delayed block allocation failed for inode 12 at logical offset 980992 with max blocks 2048 with error 74
[2543662.066404] EXT4-fs (dm-1): This should not happen!! Data will be lost

[2543663.486516] EXT4-fs (dm-1): Inode 12 (0000000052b36060): i_reserved_data_blocks (2049) not cleared!

@donald
Copy link
Contributor Author

donald commented Apr 9, 2022

Combination of EXT4_IOC_SHUTDOWN and zero target does work,

@donald
Copy link
Contributor Author

donald commented Apr 9, 2022

Sometimes we get ext4_writepages: jbd2_start: 173 pages, ino 11; err -5. And the (expected) shut down requested (2) each time. Maybe we'd need to disable that message.

[2555338.047021] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555432.766098] EXT4-fs (dm-2): shut down requested (2)
[2555439.994342] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2555459.506944] EXT4-fs (dm-1): shut down requested (2)
[2555459.651564] EXT4-fs (dm-1): ext4_writepages: jbd2_start: 3385 pages, ino 11; err -5
[2555468.777055] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2555477.306924] EXT4-fs (dm-3): shut down requested (2)
[2555482.810599] EXT4-fs (dm-3): ext4_writepages: jbd2_start: 4256 pages, ino 11; err -5
[2555483.752750] EXT4-fs (dm-0): shut down requested (2)
[2555483.952197] EXT4-fs (dm-0): ext4_writepages: jbd2_start: 4425 pages, ino 11; err -5
[2555487.801428] EXT4-fs (dm-4): mounted filesystem without journal. Opts: (null)
[2555492.201764] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555575.292305] EXT4-fs (dm-2): shut down requested (2)
[2555580.032793] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2555611.479816] EXT4-fs (dm-1): shut down requested (2)
[2555618.297560] EXT4-fs (dm-0): shut down requested (2)
[2555619.265264] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2555620.597076] EXT4-fs (dm-4): shut down requested (2)
[2555622.181716] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2555624.148530] EXT4-fs (dm-4): mounted filesystem without journal. Opts: (null)
[2555631.869127] EXT4-fs (dm-2): shut down requested (2)

Of course, we could just lazy umount to get mxqd going. But we'd still have the writeback I/O load, which we don't want.

@donald donald mentioned this issue May 11, 2022
Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant