gpu-setup: Don't unlock too early during release

Currently, the GPU lock file `pid` is released (removed) too early, which
leaves a small race window with a new GPU allocation:

```
   MXQ           job1                        job2
* fork job1
                 * other initialization
                 * reserve gpu:
                 * * find slot without pid
                 * * change access to UID
                 * run user program
                 * exit
* fork job2
                                             *  other initialization
* cleanup job 1:
* * rm .../pid
                                             * reserve gpu:
                                             * * find slot without pid
                                             * * change access to UID
* * change access to root
```

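In this interleaving, the cleanup of job1 changes access back to root
after job2 has already claimed the slot, so job2 can lose the access it
was just granted.

On release, keep the `pid` file until after the access mode has been
changed back to root.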
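
The ordering can be sketched in isolation as follows. This is an illustrative sketch only, not the real `helper/gpu-setup`: the function name `release_slot`, the throwaway slot directory, and the use of `chmod 0600` as a stand-in for "change access to root" are all assumptions made so that the example runs without root privileges.

```sh
#!/bin/sh
# Sketch under assumptions: release_slot and the temp-dir layout are
# hypothetical; chmod 0600 stands in for "change access to root".

# release_slot SLOT_DIR PID
# Revoke access on every file listed in SLOT_DIR/access-files, and only
# afterwards remove SLOT_DIR/pid so the slot becomes visible as free.
release_slot() {
    d=$1
    pid=$2
    test_pid="$(cat "$d/pid" 2>/dev/null)"
    # Not our slot (or already released): do nothing.
    [ "$pid" = "$test_pid" ] || return 1

    # Step 1: reset access while the slot is still marked as taken.
    while IFS= read -r f; do
        chmod 0600 "$f"
    done < "$d/access-files"

    # Step 2: drop the lock file last.
    rm "$d/pid"
}

# Demo with a throwaway slot directory and a fake device file.
slot="$(mktemp -d)"
dev="$slot/fake-device"
: > "$dev"
chmod 0666 "$dev"                  # pretend the job was granted access
echo "$dev" > "$slot/access-files"
echo 4242 > "$slot/pid"            # pretend job 4242 holds the slot

release_slot "$slot" 4242
```

Because the `pid` file is removed last, a concurrent reservation that looks for a slot without `pid` cannot pick this slot until its access has already been reset.
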
donald committed Dec 30, 2023
1 parent aa03e45 commit 009304f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion helper/gpu-setup
@@ -274,7 +274,6 @@ job_release() {
 test_pid="$(cat $d/pid 2>/dev/null)"
 if [ "$pid" = "$test_pid" ]; then
 echo "XXX $$ job_release $pid: found my pid in $d, releasing" >&2
-rm $d/pid
 for f in $(cat $d/access-files); do
 case $f in
 /dev/nvidia-caps/nvidia-cap*)
@@ -287,6 +286,7 @@ job_release() {
 ;;
 esac
 done
+rm $d/pid
 exit 0
 fi
 fi
