Merge branch 'tls-add-support-for-kernel-driven-resync-and-nfp-RX-offload'

Jakub Kicinski says:

====================
tls: add support for kernel-driven resync and nfp RX offload

This series adds TLS RX offload for NFP and completes the offload
by providing resync strategies.  When the TLS data stream loses segments
or experiences reordering, the NIC can no longer perform inline offload.
Resyncs provide information about placement of records in the
stream so that offload can resume.

Existing TLS resync mechanisms are not a great fit for the NFP.
In particular the TX resync is hard to implement for packet-centric
NICs.  This patchset adds the ability to perform TX resync in a way
similar to the way the initial sync is done - by calling down to the
driver when a new record is created after the driver has indicated
that sync was lost.

Similarly on the RX side, we try to wait for a gap in the stream
and send record information for the next record.  This works very
well for RPC workloads, which are the primary focus at this time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Jun 11, 2019
2 parents 4608805 + 9ed431c commit 758a0a4
Showing 12 changed files with 567 additions and 74 deletions.
54 changes: 53 additions & 1 deletion Documentation/networking/tls-offload.rst
@@ -206,7 +206,11 @@ TX

Segments transmitted from an offloaded socket can get out of sync
in similar ways to the receive side-retransmissions - local drops
are possible, though network reorders are not.
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
@@ -225,6 +229,35 @@ was just a retransmission. The former is simpler, and does not require
retransmission detection therefore it is the recommended method until
such time it is proven inefficient.
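
As a rough sketch only (not part of this patch set), a driver could look up
the record covering an out of order sequence number with the existing
``tls_get_record()`` helper; ``drv_program_tx_state()`` stands in for the
device-specific reprogramming and is hypothetical:

.. code-block:: c

  /* Sketch: find the TLS record covering tcp_seq so its record number
   * and crypto state can be pushed to the device.  Locking of the
   * records list is omitted for brevity.
   */
  static int drv_tls_rebuild_state(struct sock *sk, u32 tcp_seq)
  {
          struct tls_context *tls_ctx = tls_get_ctx(sk);
          struct tls_offload_context_tx *tx_ctx = tls_offload_ctx_tx(tls_ctx);
          struct tls_record_info *record;
          u64 rcd_sn;

          record = tls_get_record(tx_ctx, tcp_seq, &rcd_sn);
          if (!record)
                  return -EINVAL;

          /* Hypothetical helper: program rcd_sn and the record's start
           * into the device so in-line crypto can resume.
           */
          return drv_program_tx_state(sk, record, rcd_sn);
  }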

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, it
may imply a local drop; the driver asks the stack to sync the device
to the next record state and falls back to software.

Resync request is indicated with:

.. code-block:: c

  void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

  bool tls_offload_tx_resync_pending(struct sock *sk)

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).
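
For illustration, a transmit path could combine the helpers above roughly as
follows; the ``drv_tls_*`` functions are hypothetical placeholders, only the
``tls_offload_tx_*`` calls are the documented API:

.. code-block:: c

  static netdev_tx_t drv_tls_handle_segment(struct sk_buff *skb, struct sock *sk)
  {
          u32 seq = ntohl(tcp_hdr(skb)->seq);
          u32 exp = drv_tls_expected_seq(sk);     /* hypothetical */

          if (tls_offload_tx_resync_pending(sk))
                  /* Resync in flight - the expected seq is owned by the stack. */
                  return drv_tls_sw_fallback(skb, sk);    /* hypothetical */

          if (seq == exp)
                  return drv_tls_hw_xmit(skb, sk);        /* hypothetical */

          if (before(seq, exp))
                  /* Retransmission - leave the device state untouched. */
                  return drv_tls_sw_fallback(skb, sk);

          /* Segment from the future - likely a local drop; ask the stack
           * to sync the device to the start of the next record.
           */
          tls_offload_tx_resync_request(sk, seq, exp);
          return drv_tls_sw_fallback(skb, sk);
  }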

RX
--

@@ -268,6 +301,9 @@ Device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
@@ -298,6 +334,22 @@ Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and the record may get processed
by the kernel before the confirmation request.
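
As a minimal sketch, assuming the existing ``tls_offload_rx_resync_request()``
helper, the driver relays the scanner's finding and lets the stack confirm it:

.. code-block:: c

  /* Sketch: the device reported a candidate TLS header at tcp_seq
   * while scanning; hand the location to the stack for confirmation.
   * The stack will verify that a record really starts there and, if
   * so, deliver the record number via the resync callback.
   */
  static void drv_tls_rx_resync_found(struct sock *sk, u32 tcp_seq)
  {
          tls_offload_rx_resync_request(sk, htonl(tcp_seq));
  }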

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted, the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
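
The record information is delivered through the ``.tls_dev_resync`` member of
``struct tlsdev_ops`` (the mlx5 conversion below shows the real signature);
a minimal driver-side sketch, with ``drv_program_rx_record()`` as a
hypothetical device helper:

.. code-block:: c

  static void drv_tls_dev_resync(struct net_device *netdev, struct sock *sk,
                                 u32 seq, u8 *rcd_sn,
                                 enum tls_offload_ctx_dir direction)
  {
          if (direction != TLS_OFFLOAD_CTX_DIR_RX)
                  return;

          /* seq is the TCP sequence number at which the next record
           * starts; rcd_sn carries its 64-bit record number.
           */
          drv_program_rx_record(netdev, sk, seq, rcd_sn); /* hypothetical */
  }

  static const struct tlsdev_ops drv_tls_ops = {
          .tls_dev_resync = drv_tls_dev_resync,
  };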

Error handling
==============

10 changes: 7 additions & 3 deletions drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -160,13 +160,17 @@ static void mlx5e_tls_del(struct net_device *netdev,
direction == TLS_OFFLOAD_CTX_DIR_TX);
}

static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
u32 seq, u64 rcd_sn)
static void mlx5e_tls_resync(struct net_device *netdev, struct sock *sk,
u32 seq, u8 *rcd_sn_data,
enum tls_offload_ctx_dir direction)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5e_tls_offload_context_rx *rx_ctx;
u64 rcd_sn = *(u64 *)rcd_sn_data;

if (WARN_ON_ONCE(direction != TLS_OFFLOAD_CTX_DIR_RX))
return;
rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);

netdev_info(netdev, "resyncing seq %d rcd %lld\n", seq,
@@ -178,7 +182,7 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
static const struct tlsdev_ops mlx5e_tls_ops = {
.tls_dev_add = mlx5e_tls_add,
.tls_dev_del = mlx5e_tls_del,
.tls_dev_resync_rx = mlx5e_tls_resync_rx,
.tls_dev_resync = mlx5e_tls_resync,
};

void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
12 changes: 9 additions & 3 deletions drivers/net/ethernet/netronome/nfp/ccm.h
@@ -100,7 +100,7 @@ struct nfp_ccm {
u16 tag_alloc_last;

struct sk_buff_head replies;
struct wait_queue_head wq;
wait_queue_head_t wq;
};

int nfp_ccm_init(struct nfp_ccm *ccm, struct nfp_app *app);
@@ -110,12 +110,18 @@ struct sk_buff *
nfp_ccm_communicate(struct nfp_ccm *ccm, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int reply_size);

int nfp_ccm_mbox_alloc(struct nfp_net *nn);
void nfp_ccm_mbox_free(struct nfp_net *nn);
int nfp_ccm_mbox_init(struct nfp_net *nn);
void nfp_ccm_mbox_clean(struct nfp_net *nn);
bool nfp_ccm_mbox_fits(struct nfp_net *nn, unsigned int size);
struct sk_buff *
nfp_ccm_mbox_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags);
nfp_ccm_mbox_msg_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags);
int nfp_ccm_mbox_communicate(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type,
unsigned int reply_size,
unsigned int max_reply_size);
int nfp_ccm_mbox_post(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int max_reply_size);
#endif
179 changes: 161 additions & 18 deletions drivers/net/ethernet/netronome/nfp/ccm_mbox.c
@@ -41,12 +41,14 @@ enum nfp_net_mbox_cmsg_state {
* @err: error encountered during processing if any
* @max_len: max(request_len, reply_len)
* @exp_reply: expected reply length (0 means don't validate)
* @posted: the message was posted and nobody waits for the reply
*/
struct nfp_ccm_mbox_cmsg_cb {
enum nfp_net_mbox_cmsg_state state;
int err;
unsigned int max_len;
unsigned int exp_reply;
bool posted;
};

static u32 nfp_ccm_mbox_max_msg(struct nfp_net *nn)
@@ -65,6 +67,7 @@ nfp_ccm_mbox_msg_init(struct sk_buff *skb, unsigned int exp_reply, int max_len)
cb->err = 0;
cb->max_len = max_len;
cb->exp_reply = exp_reply;
cb->posted = false;
}

static int nfp_ccm_mbox_maxlen(const struct sk_buff *skb)
@@ -96,6 +99,20 @@ static void nfp_ccm_mbox_set_busy(struct sk_buff *skb)
cb->state = NFP_NET_MBOX_CMSG_STATE_BUSY;
}

static bool nfp_ccm_mbox_is_posted(struct sk_buff *skb)
{
struct nfp_ccm_mbox_cmsg_cb *cb = (void *)skb->cb;

return cb->posted;
}

static void nfp_ccm_mbox_mark_posted(struct sk_buff *skb)
{
struct nfp_ccm_mbox_cmsg_cb *cb = (void *)skb->cb;

cb->posted = true;
}

static bool nfp_ccm_mbox_is_first(struct nfp_net *nn, struct sk_buff *skb)
{
return skb_queue_is_first(&nn->mbox_cmsg.queue, skb);
@@ -119,6 +136,8 @@ static void nfp_ccm_mbox_mark_next_runner(struct nfp_net *nn)

cb = (void *)skb->cb;
cb->state = NFP_NET_MBOX_CMSG_STATE_NEXT;
if (cb->posted)
queue_work(nn->mbox_cmsg.workq, &nn->mbox_cmsg.runq_work);
}

static void
@@ -205,9 +224,7 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
while (true) {
unsigned int length, offset, type;
struct nfp_ccm_hdr hdr;
__be32 *skb_data;
u32 tlv_hdr;
int i, cnt;

tlv_hdr = readl(data);
type = FIELD_GET(NFP_NET_MBOX_TLV_TYPE, tlv_hdr);
@@ -278,20 +295,26 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
goto next_tlv;
}

if (length <= skb->len)
__skb_trim(skb, length);
else
skb_put(skb, length - skb->len);

/* We overcopy here slightly, but that's okay, the skb is large
* enough, and the garbage will be ignored (beyond skb->len).
*/
skb_data = (__be32 *)skb->data;
memcpy(skb_data, &hdr, 4);

cnt = DIV_ROUND_UP(length, 4);
for (i = 1 ; i < cnt; i++)
skb_data[i] = cpu_to_be32(readl(data + i * 4));
if (!cb->posted) {
__be32 *skb_data;
int i, cnt;

if (length <= skb->len)
__skb_trim(skb, length);
else
skb_put(skb, length - skb->len);

/* We overcopy here slightly, but that's okay,
* the skb is large enough, and the garbage will
* be ignored (beyond skb->len).
*/
skb_data = (__be32 *)skb->data;
memcpy(skb_data, &hdr, 4);

cnt = DIV_ROUND_UP(length, 4);
for (i = 1 ; i < cnt; i++)
skb_data[i] = cpu_to_be32(readl(data + i * 4));
}

cb->state = NFP_NET_MBOX_CMSG_STATE_REPLY_FOUND;
next_tlv:
@@ -314,6 +337,14 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
smp_wmb(); /* order the cb->err vs. cb->state */
}
cb->state = NFP_NET_MBOX_CMSG_STATE_DONE;

if (cb->posted) {
if (cb->err)
nn_dp_warn(&nn->dp,
"mailbox posted msg failed type:%u err:%d\n",
nfp_ccm_get_type(skb), cb->err);
dev_consume_skb_any(skb);
}
} while (skb != last);

nfp_ccm_mbox_mark_next_runner(nn);
@@ -563,9 +594,92 @@ int nfp_ccm_mbox_communicate(struct nfp_net *nn, struct sk_buff *skb,
return err;
}

static void nfp_ccm_mbox_post_runq_work(struct work_struct *work)
{
struct sk_buff *skb;
struct nfp_net *nn;

nn = container_of(work, struct nfp_net, mbox_cmsg.runq_work);

spin_lock_bh(&nn->mbox_cmsg.queue.lock);

skb = __skb_peek(&nn->mbox_cmsg.queue);
if (WARN_ON(!skb || !nfp_ccm_mbox_is_posted(skb) ||
!nfp_ccm_mbox_should_run(nn, skb))) {
spin_unlock_bh(&nn->mbox_cmsg.queue.lock);
return;
}

nfp_ccm_mbox_run_queue_unlock(nn);
}

static void nfp_ccm_mbox_post_wait_work(struct work_struct *work)
{
struct sk_buff *skb;
struct nfp_net *nn;
int err;

nn = container_of(work, struct nfp_net, mbox_cmsg.wait_work);

skb = skb_peek(&nn->mbox_cmsg.queue);
if (WARN_ON(!skb || !nfp_ccm_mbox_is_posted(skb)))
/* Should never happen so it's unclear what to do here.. */
goto exit_unlock_wake;

err = nfp_net_mbox_reconfig_wait_posted(nn);
if (!err)
nfp_ccm_mbox_copy_out(nn, skb);
else
nfp_ccm_mbox_mark_all_err(nn, skb, -EIO);
exit_unlock_wake:
nn_ctrl_bar_unlock(nn);
wake_up_all(&nn->mbox_cmsg.wq);
}

int nfp_ccm_mbox_post(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int max_reply_size)
{
int err;

err = nfp_ccm_mbox_msg_prepare(nn, skb, type, 0, max_reply_size,
GFP_ATOMIC);
if (err)
goto err_free_skb;

nfp_ccm_mbox_mark_posted(skb);

spin_lock_bh(&nn->mbox_cmsg.queue.lock);

err = nfp_ccm_mbox_msg_enqueue(nn, skb, type);
if (err)
goto err_unlock;

if (nfp_ccm_mbox_is_first(nn, skb)) {
if (nn_ctrl_bar_trylock(nn)) {
nfp_ccm_mbox_copy_in(nn, skb);
nfp_net_mbox_reconfig_post(nn,
NFP_NET_CFG_MBOX_CMD_TLV_CMSG);
queue_work(nn->mbox_cmsg.workq,
&nn->mbox_cmsg.wait_work);
} else {
nfp_ccm_mbox_mark_next_runner(nn);
}
}

spin_unlock_bh(&nn->mbox_cmsg.queue.lock);

return 0;

err_unlock:
spin_unlock_bh(&nn->mbox_cmsg.queue.lock);
err_free_skb:
dev_kfree_skb_any(skb);
return err;
}

struct sk_buff *
nfp_ccm_mbox_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags)
nfp_ccm_mbox_msg_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags)
{
unsigned int max_size;
struct sk_buff *skb;
@@ -589,3 +703,32 @@ bool nfp_ccm_mbox_fits(struct nfp_net *nn, unsigned int size)
{
return nfp_ccm_mbox_max_msg(nn) >= size;
}

int nfp_ccm_mbox_init(struct nfp_net *nn)
{
return 0;
}

void nfp_ccm_mbox_clean(struct nfp_net *nn)
{
drain_workqueue(nn->mbox_cmsg.workq);
}

int nfp_ccm_mbox_alloc(struct nfp_net *nn)
{
skb_queue_head_init(&nn->mbox_cmsg.queue);
init_waitqueue_head(&nn->mbox_cmsg.wq);
INIT_WORK(&nn->mbox_cmsg.wait_work, nfp_ccm_mbox_post_wait_work);
INIT_WORK(&nn->mbox_cmsg.runq_work, nfp_ccm_mbox_post_runq_work);

nn->mbox_cmsg.workq = alloc_workqueue("nfp-ccm-mbox", WQ_UNBOUND, 0);
if (!nn->mbox_cmsg.workq)
return -ENOMEM;
return 0;
}

void nfp_ccm_mbox_free(struct nfp_net *nn)
{
destroy_workqueue(nn->mbox_cmsg.workq);
WARN_ON(!skb_queue_empty(&nn->mbox_cmsg.queue));
}