We've seen this:

- mxqd starts a job with reaper pid 1042
- someone reboots the system. Pid 1042 is killed with -9 during the reboot, so no spool file is created
- the system boots and starts an nfsd as pid 1042
- mxqd is started, doesn't find a spool file for the job, but finds pid 1042 alive, so it assumes the job is still running
- the job times out and mxqd tries (unsuccessfully) to kill the nfsd

mxqd could notice the reboot and declare all pids dead. But similar events could happen without a reboot if pids wrap:

- mxqd is stopped by the admin
- the user job is killed by the admin with kill -9
- pids wrap and some new process is started with the job's pid
- mxqd is started by the admin

So we might consider validating the processes further.
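
One possible way to validate a process, sketched here in Python for brevity and not taken from mxqd itself: when the job is started, record the kernel boot id together with the pid and the start time of the reaper, and on restart accept the pid only if all three still match. The sketch assumes Linux, where `/proc/sys/kernel/random/boot_id` changes on every boot and field 22 of `/proc/<pid>/stat` is the process start time in clock ticks since boot. The names `ProcIdentity`, `identify` and `is_same_process` are hypothetical, and the `proc_root` parameter exists only so the logic can be exercised against a fake directory tree.

```python
import os
from typing import NamedTuple, Optional


class ProcIdentity(NamedTuple):
    """What we would store next to the job record (hypothetical layout)."""
    boot_id: str    # differs after every reboot
    pid: int
    starttime: int  # clock ticks since boot at which the process started


def read_boot_id(proc_root: str = "/proc") -> str:
    with open(os.path.join(proc_root, "sys/kernel/random/boot_id")) as f:
        return f.read().strip()


def read_starttime(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Return field 22 of <proc_root>/<pid>/stat, or None if the pid is gone."""
    try:
        with open(os.path.join(proc_root, str(pid), "stat")) as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split only what follows the last ')'. That remainder starts at field 3.
    rest = stat[stat.rindex(")") + 1:].split()
    return int(rest[22 - 3])


def identify(pid: int, proc_root: str = "/proc") -> Optional[ProcIdentity]:
    """Snapshot the identity of a live process, or None if it does not exist."""
    starttime = read_starttime(pid, proc_root)
    if starttime is None:
        return None
    return ProcIdentity(read_boot_id(proc_root), pid, starttime)


def is_same_process(saved: ProcIdentity, proc_root: str = "/proc") -> bool:
    """True only if the pid is alive AND is still the process we recorded."""
    return identify(saved.pid, proc_root) == saved
```

With a check like this, the nfsd from the first scenario would be rejected because the boot id differs, and the recycled pid from the second scenario would be rejected because its start time differs, so in both cases the job could be declared dead instead of being waited on.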