umount() of tempdir sometimes takes too long #121
Killed a group whose jobs were running on the same node. The jobs had copied about 30G to TMPDIR. The umount of each job's tempdir took several minutes, so mxqd was frozen for over 20 minutes. This is a huge problem. http://afk.molgen.mpg.de/mxq/mxq/group/496259 Stack of a "loopN" thread:
We can put a device mapper with a linear target before the umount and use
I don't think we want that for every job.
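A minimal sketch of the linear-target idea, assuming the tempdir filesystem sits on a loop device: a dm device maps the loop device 1:1, and the filesystem would presumably be mounted from /dev/mapper/<name> instead, so that its table can be swapped later. The device names (mxq-job-tmp, /dev/loop0) and helper names are hypothetical; the helpers only build the table line and the dmsetup argv and do not run anything.

```python
from __future__ import annotations

SECTOR_SIZE = 512  # device-mapper tables count in 512-byte sectors


def linear_table(backing_dev: str, size_bytes: int) -> str:
    """One-line dm table mapping all of backing_dev 1:1.

    Format: <start> <length> linear <destination device> <offset>
    """
    if size_bytes <= 0 or size_bytes % SECTOR_SIZE:
        raise ValueError("size must be a positive multiple of 512 bytes")
    return f"0 {size_bytes // SECTOR_SIZE} linear {backing_dev} 0"


def dmsetup_create_argv(name: str, backing_dev: str, size_bytes: int) -> list[str]:
    """argv for `dmsetup create` that stacks a linear target on backing_dev."""
    return ["dmsetup", "create", name, "--table",
            linear_table(backing_dev, size_bytes)]
```

Run as root, e.g. `subprocess.run(dmsetup_create_argv("mxq-job-tmp", "/dev/loop0", size), check=True)`, and then mkfs and mount /dev/mapper/mxq-job-tmp rather than the loop device itself.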
Replacing the table with a zero target always seems to be fast.
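A sketch of what the table swap could look like with dmsetup, assuming the dm device from the linear setup already exists. The thread does not show the exact commands used, so this sequence (load into the inactive slot, suspend with --nolockfs, resume) is an assumption; `size_sectors` must match the current device size (e.g. from `blockdev --getsz`), and the helper names are hypothetical.

```python
from __future__ import annotations


def zero_table(size_sectors: int) -> str:
    """One-line dm table mapping the whole device to the zero target.

    Reads from a zero target return zeroes and writes are discarded.
    """
    return f"0 {size_sectors} zero"


def swap_to_zero_argv(name: str, size_sectors: int) -> list[list[str]]:
    """dmsetup commands that replace the live table of `name` with dm-zero."""
    return [
        # Load the zero table into the inactive slot; the live table is untouched.
        ["dmsetup", "load", name, "--table", zero_table(size_sectors)],
        # Suspend without freezing the filesystem: a freeze would sync
        # dirty data first, which is the writeback we want to avoid.
        ["dmsetup", "suspend", "--nolockfs", name],
        # Resume makes the previously loaded (zero) table live.
        ["dmsetup", "resume", name],
    ]
```

Each step would be run in order with `subprocess.run(step, check=True)` right before the umount.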
However, we still get junk in the logs:
The combination of EXT4_IOC_SHUTDOWN and a zero target does work,
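The EXT4_IOC_SHUTDOWN half can be sketched with fcntl.ioctl. The request number is derived from the `_IOR('X', 125, __u32)` definition in fs/ext4/ext4.h using the generic ioctl encoding (as on x86 and arm64; some architectures encode differently). Defaulting to EXT4_GOING_FLAGS_NOLOGFLUSH is an assumption, since the thread does not say which mode was used, and `ext4_shutdown` is a hypothetical helper name.

```python
import fcntl
import os
import struct

# Generic Linux ioctl number encoding (asm-generic/ioctl.h).
_IOC_READ = 2


def _ior(type_char: str, nr: int, size: int) -> int:
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


# fs/ext4/ext4.h: #define EXT4_IOC_SHUTDOWN _IOR('X', 125, __u32)
EXT4_IOC_SHUTDOWN = _ior("X", 125, 4)

# Shutdown modes, also from fs/ext4/ext4.h.
EXT4_GOING_FLAGS_DEFAULT = 0x0     # going down
EXT4_GOING_FLAGS_LOGFLUSH = 0x1    # flush log but not data
EXT4_GOING_FLAGS_NOLOGFLUSH = 0x2  # don't flush log nor data


def ext4_shutdown(mountpoint: str, flags: int = EXT4_GOING_FLAGS_NOLOGFLUSH) -> None:
    """Shut down the ext4 filesystem mounted at `mountpoint` (needs CAP_SYS_ADMIN)."""
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # The kernel reads the mode from a __u32 in user memory.
        fcntl.ioctl(fd, EXT4_IOC_SHUTDOWN, struct.pack("I", flags))
    finally:
        os.close(fd)
```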
Sometimes we get
Of course, we could just do a lazy umount to get mxqd going. But we'd still have the writeback I/O load, which we don't want.
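For comparison, a lazy umount is umount2(2) with MNT_DETACH, the flag behind `umount -l`. A minimal ctypes sketch, assuming Linux with glibc or musl; `lazy_umount` is a hypothetical helper name. The mount point is detached right away, but the filesystem itself is only shut down, dirty data included, once its last reference is dropped, which is why the writeback load mentioned above does not go away.

```python
import ctypes
import os

# From <sys/mount.h>: detach now, finish the unmount once no longer busy.
MNT_DETACH = 2

_libc = ctypes.CDLL(None, use_errno=True)
_libc.umount2.argtypes = (ctypes.c_char_p, ctypes.c_int)
_libc.umount2.restype = ctypes.c_int


def lazy_umount(target: str) -> None:
    """Equivalent of `umount -l target`: umount2(2) with MNT_DETACH."""
    if _libc.umount2(os.fsencode(target), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```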
I've seen mxqd hang for tens of seconds on the umount() of a tempdir after a job finished. This is probably writeback of dirty data. In the initial experiments, this wasn't a problem. Maybe something changed in the kernel?
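One rough way to check the writeback theory is to look at the Dirty and Writeback counters in /proc/meminfo just before the umount(). A sketch with hypothetical helper names; note that these counters are system-wide, not specific to the tempdir filesystem.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value (kB for most)}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def pending_writeback_kib(text: str) -> int:
    """Dirty + Writeback: data the kernel still has to write out, in kB."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)
```

For example, logging `pending_writeback_kib(open("/proc/meminfo").read())` right before the umount() would show whether a slow umount coincides with a large amount of dirty data.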
Maybe we can use dm and replace dm-linear with dm-zero before the umount?