Skip to content

Commit

Permalink
Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/…
Browse files Browse the repository at this point in the history
…kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
  • Loading branch information
Linus Torvalds committed Jul 11, 2019
2 parents 29cd581 + 172bb24 commit 5450e8a
Show file tree
Hide file tree
Showing 29 changed files with 571 additions and 44 deletions.
1 change: 1 addition & 0 deletions arch/alpha/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -473,3 +473,4 @@
541 common fsconfig sys_fsconfig
542 common fsmount sys_fsmount
543 common fspick sys_fspick
544 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/arm/tools/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -447,3 +447,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/unistd.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

#define __NR_compat_syscalls 434
#define __NR_compat_syscalls 435
#endif

#define __ARCH_WANT_SYS_CLONE
Expand Down
2 changes: 2 additions & 0 deletions arch/arm64/include/asm/unistd32.h
Original file line number Diff line number Diff line change
Expand Up @@ -875,6 +875,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
#define __NR_pidfd_open 434
__SYSCALL(__NR_pidfd_open, sys_pidfd_open)

/*
* Please add new compat syscalls above this comment and update
Expand Down
1 change: 1 addition & 0 deletions arch/ia64/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -354,3 +354,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/m68k/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -433,3 +433,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/microblaze/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -439,3 +439,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_n32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -372,3 +372,4 @@
431 n32 fsconfig sys_fsconfig
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
434 n32 pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_n64.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -348,3 +348,4 @@
431 n64 fsconfig sys_fsconfig
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
434 n64 pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_o32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -421,3 +421,4 @@
431 o32 fsconfig sys_fsconfig
432 o32 fsmount sys_fsmount
433 o32 fspick sys_fspick
434 o32 pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/parisc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -430,3 +430,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/powerpc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -515,3 +515,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/s390/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig sys_fsconfig
432 common fsmount sys_fsmount sys_fsmount
433 common fspick sys_fspick sys_fspick
434 common pidfd_open sys_pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/sh/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/sparc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -479,3 +479,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
1 change: 1 addition & 0 deletions arch/x86/entry/syscalls/syscall_32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -438,3 +438,4 @@
431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
432 i386 fsmount sys_fsmount __ia32_sys_fsmount
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
1 change: 1 addition & 0 deletions arch/x86/entry/syscalls/syscall_64.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,7 @@
431 common fsconfig __x64_sys_fsconfig
432 common fsmount __x64_sys_fsmount
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open

#
# x32-specific system call numbers start at 512 to avoid cache impact
Expand Down
1 change: 1 addition & 0 deletions arch/xtensa/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -404,3 +404,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
3 changes: 3 additions & 0 deletions include/linux/pid.h
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#define _LINUX_PID_H

#include <linux/rculist.h>
#include <linux/wait.h>

enum pid_type
{
Expand Down Expand Up @@ -60,6 +61,8 @@ struct pid
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
/* wait queue for pidfd notifications */
wait_queue_head_t wait_pidfd;
struct rcu_head rcu;
struct upid numbers[1];
};
Expand Down
1 change: 1 addition & 0 deletions include/linux/syscalls.h
Original file line number Diff line number Diff line change
Expand Up @@ -927,6 +927,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
struct old_timex32 __user *tx);
asmlinkage long sys_syncfs(int fd);
asmlinkage long sys_setns(int fd, int nstype);
asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
unsigned int vlen, unsigned flags);
asmlinkage long sys_process_vm_readv(pid_t pid,
Expand Down
4 changes: 3 additions & 1 deletion include/uapi/asm-generic/unistd.h
Original file line number Diff line number Diff line change
Expand Up @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
#define __NR_pidfd_open 434
__SYSCALL(__NR_pidfd_open, sys_pidfd_open)

#undef __NR_syscalls
#define __NR_syscalls 434
#define __NR_syscalls 435

/*
* 32 bit systems traditionally used different
Expand Down
26 changes: 26 additions & 0 deletions kernel/fork.c
Original file line number Diff line number Diff line change
Expand Up @@ -1711,8 +1711,34 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
}
#endif

/*
* Poll support for process exit notification.
*/
static unsigned int pidfd_poll(struct file *file, struct poll_table_struct *pts)
{
struct task_struct *task;
struct pid *pid = file->private_data;
int poll_flags = 0;

poll_wait(file, &pid->wait_pidfd, pts);

rcu_read_lock();
task = pid_task(pid, PIDTYPE_PID);
/*
* Inform pollers only when the whole thread group exits.
* If the thread group leader exits before all other threads in the
* group, then poll(2) should block, similar to the wait(2) family.
*/
if (!task || (task->exit_state && thread_group_empty(task)))
poll_flags = POLLIN | POLLRDNORM;
rcu_read_unlock();

return poll_flags;
}

const struct file_operations pidfd_fops = {
.release = pidfd_release,
.poll = pidfd_poll,
#ifdef CONFIG_PROC_FS
.show_fdinfo = pidfd_show_fdinfo,
#endif
Expand Down
71 changes: 71 additions & 0 deletions kernel/pid.c
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@
#include <linux/syscalls.h>
#include <linux/proc_ns.h>
#include <linux/proc_fs.h>
#include <linux/anon_inodes.h>
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>

Expand Down Expand Up @@ -214,6 +216,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
for (type = 0; type < PIDTYPE_MAX; ++type)
INIT_HLIST_HEAD(&pid->tasks[type]);

init_waitqueue_head(&pid->wait_pidfd);

upid = pid->numbers + ns->level;
spin_lock_irq(&pidmap_lock);
if (!(ns->pid_allocated & PIDNS_ADDING))
Expand Down Expand Up @@ -451,6 +455,73 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return idr_get_next(&ns->idr, &nr);
}

/**
* pidfd_create() - Create a new pid file descriptor.
*
* @pid: struct pid that the pidfd will reference
*
* This creates a new pid file descriptor with the O_CLOEXEC flag set.
*
* Note, that this function can only be called after the fd table has
* been unshared to avoid leaking the pidfd to the new process.
*
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
static int pidfd_create(struct pid *pid)
{
int fd;

fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
O_RDWR | O_CLOEXEC);
if (fd < 0)
put_pid(pid);

return fd;
}

/**
* pidfd_open() - Open new pid file descriptor.
*
* @pid: pid for which to retrieve a pidfd
* @flags: flags to pass
*
* This creates a new pid file descriptor with the O_CLOEXEC flag set for
* the process identified by @pid. Currently, the process identified by
* @pid must be a thread-group leader. This restriction currently exists
* for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
* be used with CLONE_THREAD) and pidfd polling (only supports thread group
* leaders).
*
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
{
int fd, ret;
struct pid *p;

if (flags)
return -EINVAL;

if (pid <= 0)
return -EINVAL;

p = find_get_pid(pid);
if (!p)
return -ESRCH;

ret = 0;
rcu_read_lock();
if (!pid_task(p, PIDTYPE_TGID))
ret = -EINVAL;
rcu_read_unlock();

fd = ret ?: pidfd_create(p);
put_pid(p);
return fd;
}

void __init pid_idr_init(void)
{
/* Verify no one has done anything silly: */
Expand Down
11 changes: 11 additions & 0 deletions kernel/signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -1881,6 +1881,14 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
return ret;
}

static void do_notify_pidfd(struct task_struct *task)
{
struct pid *pid;

pid = task_pid(task);
wake_up_all(&pid->wait_pidfd);
}

/*
* Let a parent know about the death of a child.
* For a stopped/continued status change, use do_notify_parent_cldstop instead.
Expand All @@ -1904,6 +1912,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));

/* Wake up all pidfd waiters */
do_notify_pidfd(tsk);

if (sig != SIGCHLD) {
/*
* This is only possible if parent == real_parent.
Expand Down
1 change: 1 addition & 0 deletions tools/testing/selftests/pidfd/.gitignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
pidfd_open_test
pidfd_test
4 changes: 2 additions & 2 deletions tools/testing/selftests/pidfd/Makefile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -g -I../../../../usr/include/
CFLAGS += -g -I../../../../usr/include/ -lpthread

TEST_GEN_PROGS := pidfd_test
TEST_GEN_PROGS := pidfd_test pidfd_open_test

include ../lib.mk

57 changes: 57 additions & 0 deletions tools/testing/selftests/pidfd/pidfd.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
/* SPDX-License-Identifier: GPL-2.0 */

#ifndef __PIDFD_H
#define __PIDFD_H

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syscall.h>
#include <sys/mount.h>

#include "../kselftest.h"

/*
* The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
* That means, when it wraps around any pid < 300 will be skipped.
* So we need to use a pid > 300 in order to test recycling.
*/
#define PID_RECYCLE 1000

/*
* Define a few custom error codes for the child process to clearly indicate
* what is happening. This way we can tell the difference between a system
* error, a test error, etc.
*/
#define PIDFD_PASS 0
#define PIDFD_FAIL 1
#define PIDFD_ERROR 2
#define PIDFD_SKIP 3
#define PIDFD_XFAIL 4

int wait_for_pid(pid_t pid)
{
int status, ret;

again:
ret = waitpid(pid, &status, 0);
if (ret == -1) {
if (errno == EINTR)
goto again;

return -1;
}

if (!WIFEXITED(status))
return -1;

return WEXITSTATUS(status);
}


#endif /* __PIDFD_H */
Loading

0 comments on commit 5450e8a

Please sign in to comment.