Skip to content

Commit

Permalink
Merge tag 'rxrpc-fixes-20230107' of git://git.kernel.org/pub/scm/linu…
Browse files Browse the repository at this point in the history
…x/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fix race between call connection, data transmit and call disconnect

Here are patches to fix an oops[1] caused by a race between call
connection, initial packet transmission and call disconnection which
results in something like:

        kernel BUG at net/rxrpc/peer_object.c:413!

when the syzbot test is run.  The problem is that the connection procedure
is effectively split across two threads and can get expanded by taking an
interrupt, thereby adding the call to the peer error distribution list
*after* it has been disconnected (say by the rxrpc socket shutting down).

The easiest solution is to look at the fourth set of I/O thread
conversion/SACK table expansion patches that didn't get applied[2] and take
from it those patches that move call connection and disconnection into the
I/O thread.  Moving these things into the I/O thread means that the
sequencing is managed by all being done in the same thread - and the race
can no longer happen.

This is preferable to introducing an extra lock as adding an extra lock
would make the I/O thread have to wait for the app thread in yet another
place.

The changes can be considered as a number of logical parts:

 (1) Move all of the call state changes into the I/O thread.

 (2) Make client connection ID space per-local endpoint so that the I/O
     thread doesn't need locks to access it.

 (3) Move actual abort generation into the I/O thread and clean it up.  If
     sendmsg or recvmsg want to cause an abort, they have to delegate it.

 (4) Offload the setting up of the security context on a connection to the
     thread of one of the apps that's starting a call.  We don't want to be
     doing any sort of crypto in the I/O thread.

 (5) Connect calls (ie. assign them to channel slots on connections) in the
     I/O thread.  Calls are set up by sendmsg/kafs and passed to the I/O
     thread to connect.  Connections are allocated in the I/O thread after
     this.

 (6) Disconnect calls in the I/O thread.

I've also added a patch for an unrelated bug that cropped up during
testing, whereby a race can occur between an incoming call and socket
shutdown.

Note that whilst this fixes the original syzbot bug, another bug may get
triggered if this one is fixed:

        INFO: rcu detected stall in corrupted
        rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T
        rcu: blocking rcu_node structures (internal RCU debug):

It doesn't look this should be anything to do with rxrpc, though, as I've
tested an additional patch[3] that removes practically all the RCU usage
from rxrpc and it still occurs.  It seems likely that it is being caused by
something in the tunnelling setup that the syzbot test does, but there's
not enough info to go on.  It also seems unlikely to be anything to do with
the afs driver as the test doesn't use that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jan 7, 2023
2 parents b4e9b87 + 42f229c commit 571f3dd
Show file tree
Hide file tree
Showing 29 changed files with 1,592 additions and 1,760 deletions.
4 changes: 2 additions & 2 deletions Documentation/networking/rxrpc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -880,8 +880,8 @@ The kernel interface functions are as follows:

notify_end_rx can be NULL or it can be used to specify a function to be
called when the call changes state to end the Tx phase. This function is
called with the call-state spinlock held to prevent any reply or final ACK
from being delivered first.
called with a spinlock held to prevent the last DATA packet from being
transmitted until the function returns.

(#) Receive data from a call::

Expand Down
6 changes: 4 additions & 2 deletions fs/afs/cmservice.c
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
#include "internal.h"
#include "afs_cm.h"
#include "protocol_yfs.h"
#define RXRPC_TRACE_ONLY_DEFINE_ENUMS
#include <trace/events/rxrpc.h>

static int afs_deliver_cb_init_call_back_state(struct afs_call *);
static int afs_deliver_cb_init_call_back_state3(struct afs_call *);
Expand Down Expand Up @@ -191,7 +193,7 @@ static void afs_cm_destructor(struct afs_call *call)
* Abort a service call from within an action function.
*/
static void afs_abort_service_call(struct afs_call *call, u32 abort_code, int error,
const char *why)
enum rxrpc_abort_reason why)
{
rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
abort_code, error, why);
Expand Down Expand Up @@ -469,7 +471,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *work)
if (memcmp(r, &call->net->uuid, sizeof(call->net->uuid)) == 0)
afs_send_empty_reply(call);
else
afs_abort_service_call(call, 1, 1, "K-1");
afs_abort_service_call(call, 1, 1, afs_abort_probeuuid_negative);

afs_put_call(call);
_leave("");
Expand Down
24 changes: 17 additions & 7 deletions fs/afs/rxrpc.c
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
#include "internal.h"
#include "afs_cm.h"
#include "protocol_yfs.h"
#define RXRPC_TRACE_ONLY_DEFINE_ENUMS
#include <trace/events/rxrpc.h>

struct workqueue_struct *afs_async_calls;

Expand Down Expand Up @@ -397,7 +399,8 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
error_do_abort:
if (ret != -ECONNABORTED) {
rxrpc_kernel_abort_call(call->net->socket, rxcall,
RX_USER_ABORT, ret, "KSD");
RX_USER_ABORT, ret,
afs_abort_send_data_error);
} else {
len = 0;
iov_iter_kvec(&msg.msg_iter, ITER_DEST, NULL, 0, 0);
Expand Down Expand Up @@ -527,7 +530,8 @@ static void afs_deliver_to_call(struct afs_call *call)
case -ENOTSUPP:
abort_code = RXGEN_OPCODE;
rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
abort_code, ret, "KIV");
abort_code, ret,
afs_abort_op_not_supported);
goto local_abort;
case -EIO:
pr_err("kAFS: Call %u in bad state %u\n",
Expand All @@ -542,12 +546,14 @@ static void afs_deliver_to_call(struct afs_call *call)
if (state != AFS_CALL_CL_AWAIT_REPLY)
abort_code = RXGEN_SS_UNMARSHAL;
rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
abort_code, ret, "KUM");
abort_code, ret,
afs_abort_unmarshal_error);
goto local_abort;
default:
abort_code = RX_CALL_DEAD;
rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
abort_code, ret, "KER");
abort_code, ret,
afs_abort_general_error);
goto local_abort;
}
}
Expand Down Expand Up @@ -619,7 +625,8 @@ long afs_wait_for_call_to_complete(struct afs_call *call,
/* Kill off the call if it's still live. */
_debug("call interrupted");
if (rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
RX_USER_ABORT, -EINTR, "KWI"))
RX_USER_ABORT, -EINTR,
afs_abort_interrupted))
afs_set_call_complete(call, -EINTR, 0);
}
}
Expand Down Expand Up @@ -836,7 +843,8 @@ void afs_send_empty_reply(struct afs_call *call)
case -ENOMEM:
_debug("oom");
rxrpc_kernel_abort_call(net->socket, call->rxcall,
RXGEN_SS_MARSHAL, -ENOMEM, "KOO");
RXGEN_SS_MARSHAL, -ENOMEM,
afs_abort_oom);
fallthrough;
default:
_leave(" [error]");
Expand Down Expand Up @@ -878,7 +886,8 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
if (n == -ENOMEM) {
_debug("oom");
rxrpc_kernel_abort_call(net->socket, call->rxcall,
RXGEN_SS_MARSHAL, -ENOMEM, "KOO");
RXGEN_SS_MARSHAL, -ENOMEM,
afs_abort_oom);
}
_leave(" [error]");
}
Expand All @@ -900,6 +909,7 @@ int afs_extract_data(struct afs_call *call, bool want_more)
ret = rxrpc_kernel_recv_data(net->socket, call->rxcall, iter,
&call->iov_len, want_more, &remote_abort,
&call->service_id);
trace_afs_receive_data(call, call->iter, want_more, ret);
if (ret == 0 || ret == -EAGAIN)
return ret;

Expand Down
3 changes: 2 additions & 1 deletion include/net/af_rxrpc.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ struct key;
struct sock;
struct socket;
struct rxrpc_call;
enum rxrpc_abort_reason;

enum rxrpc_interruptibility {
RXRPC_INTERRUPTIBLE, /* Call is interruptible */
Expand Down Expand Up @@ -55,7 +56,7 @@ int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *,
int rxrpc_kernel_recv_data(struct socket *, struct rxrpc_call *,
struct iov_iter *, size_t *, bool, u32 *, u16 *);
bool rxrpc_kernel_abort_call(struct socket *, struct rxrpc_call *,
u32, int, const char *);
u32, int, enum rxrpc_abort_reason);
void rxrpc_kernel_end_call(struct socket *, struct rxrpc_call *);
void rxrpc_kernel_get_peer(struct socket *, struct rxrpc_call *,
struct sockaddr_rxrpc *);
Expand Down
Loading

0 comments on commit 571f3dd

Please sign in to comment.