Merge branch 'tls-add-support-for-kernel-driven-resync-and-nfp-RX-offload'

Jakub Kicinski says:

====================
tls: add support for kernel-driven resync and nfp RX offload

This series adds TLS RX offload for NFP and completes the offload
by providing resync strategies.  When the TLS data stream loses segments
or experiences reordering, the NIC can no longer perform inline offload.
Resyncs provide information about placement of records in the
stream so that offload can resume.

Existing TLS resync mechanisms are not a great fit for the NFP.
In particular the TX resync is hard to implement for packet-centric
NICs.  This patchset adds the ability to perform TX resync in a way
similar to the way the initial sync is done - by calling down to the
driver when a new record is created after the driver has indicated
that sync was lost.

Similarly on the RX side, we try to wait for a gap in the stream
and send record information for the next record.  This works very
well for RPC workloads, which are the primary focus at this time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Jun 11, 2019
2 parents 4608805 + 9ed431c commit 758a0a4
Showing 12 changed files with 567 additions and 74 deletions.
54 changes: 53 additions & 1 deletion Documentation/networking/tls-offload.rst
@@ -206,7 +206,11 @@ TX

Segments transmitted from an offloaded socket can get out of sync
in similar ways to the receive side-retransmissions - local drops
are possible, though network reorders are not.
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
@@ -225,6 +229,35 @@ was just a retransmission. The former is simpler, and does not require
retransmission detection therefore it is the recommended method until
such time it is proven inefficient.
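
As a rough sketch only (not part of this patch set), a driver could look up
the record covering an out of order sequence number with the existing
``tls_get_record()`` helper; ``drv_program_tx_state()`` stands in for the
device-specific reprogramming and is hypothetical:

.. code-block:: c

  /* Sketch: find the TLS record covering tcp_seq so its record number
   * and crypto state can be pushed to the device.  Locking of the
   * records list is omitted for brevity.
   */
  static int drv_tls_rebuild_state(struct sock *sk, u32 tcp_seq)
  {
          struct tls_context *tls_ctx = tls_get_ctx(sk);
          struct tls_offload_context_tx *tx_ctx = tls_offload_ctx_tx(tls_ctx);
          struct tls_record_info *record;
          u64 rcd_sn;

          record = tls_get_record(tx_ctx, tcp_seq, &rcd_sn);
          if (!record)
                  return -EINVAL;

          /* Hypothetical helper: program rcd_sn and the record's start
           * into the device so in-line crypto can resume.
           */
          return drv_program_tx_state(sk, record, rcd_sn);
  }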

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, it
may imply a local drop; the driver asks the stack to sync the device
to the next record state and falls back to software.

Resync request is indicated with:

.. code-block:: c

  void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

  bool tls_offload_tx_resync_pending(struct sock *sk)

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).
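
For illustration, a transmit path could combine the helpers above roughly as
follows; the ``drv_tls_*`` functions are hypothetical placeholders, only the
``tls_offload_tx_*`` calls are the documented API:

.. code-block:: c

  static netdev_tx_t drv_tls_handle_segment(struct sk_buff *skb, struct sock *sk)
  {
          u32 seq = ntohl(tcp_hdr(skb)->seq);
          u32 exp = drv_tls_expected_seq(sk);     /* hypothetical */

          if (tls_offload_tx_resync_pending(sk))
                  /* Resync in flight - the expected seq is owned by the stack. */
                  return drv_tls_sw_fallback(skb, sk);    /* hypothetical */

          if (seq == exp)
                  return drv_tls_hw_xmit(skb, sk);        /* hypothetical */

          if (before(seq, exp))
                  /* Retransmission - leave the device state untouched. */
                  return drv_tls_sw_fallback(skb, sk);

          /* Segment from the future - likely a local drop; ask the stack
           * to sync the device to the start of the next record.
           */
          tls_offload_tx_resync_request(sk, seq, exp);
          return drv_tls_sw_fallback(skb, sk);
  }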

RX
--

@@ -268,6 +301,9 @@ Device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
@@ -298,6 +334,22 @@ Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and the record may get processed
by the kernel before the confirmation request.
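
As a minimal sketch, assuming the existing ``tls_offload_rx_resync_request()``
helper, the driver relays the scanner's finding and lets the stack confirm it:

.. code-block:: c

  /* Sketch: the device reported a candidate TLS header at tcp_seq
   * while scanning; hand the location to the stack for confirmation.
   * The stack will verify that a record really starts there and, if
   * so, deliver the record number via the resync callback.
   */
  static void drv_tls_rx_resync_found(struct sock *sk, u32 tcp_seq)
  {
          tls_offload_rx_resync_request(sk, htonl(tcp_seq));
  }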

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted, the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
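
The record information is delivered through the ``.tls_dev_resync`` member of
``struct tlsdev_ops`` (the mlx5 conversion below shows the real signature);
a minimal driver-side sketch, with ``drv_program_rx_record()`` as a
hypothetical device helper:

.. code-block:: c

  static void drv_tls_dev_resync(struct net_device *netdev, struct sock *sk,
                                 u32 seq, u8 *rcd_sn,
                                 enum tls_offload_ctx_dir direction)
  {
          if (direction != TLS_OFFLOAD_CTX_DIR_RX)
                  return;

          /* seq is the TCP sequence number at which the next record
           * starts; rcd_sn carries its 64-bit record number.
           */
          drv_program_rx_record(netdev, sk, seq, rcd_sn); /* hypothetical */
  }

  static const struct tlsdev_ops drv_tls_ops = {
          .tls_dev_resync = drv_tls_dev_resync,
  };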

Error handling
==============

10 changes: 7 additions & 3 deletions drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -160,13 +160,17 @@ static void mlx5e_tls_del(struct net_device *netdev,
direction == TLS_OFFLOAD_CTX_DIR_TX);
}

static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
u32 seq, u64 rcd_sn)
static void mlx5e_tls_resync(struct net_device *netdev, struct sock *sk,
u32 seq, u8 *rcd_sn_data,
enum tls_offload_ctx_dir direction)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5e_tls_offload_context_rx *rx_ctx;
u64 rcd_sn = *(u64 *)rcd_sn_data;

if (WARN_ON_ONCE(direction != TLS_OFFLOAD_CTX_DIR_RX))
return;
rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);

netdev_info(netdev, "resyncing seq %d rcd %lld\n", seq,
@@ -178,7 +182,7 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
static const struct tlsdev_ops mlx5e_tls_ops = {
.tls_dev_add = mlx5e_tls_add,
.tls_dev_del = mlx5e_tls_del,
.tls_dev_resync_rx = mlx5e_tls_resync_rx,
.tls_dev_resync = mlx5e_tls_resync,
};

void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
12 changes: 9 additions & 3 deletions drivers/net/ethernet/netronome/nfp/ccm.h
@@ -100,7 +100,7 @@ struct nfp_ccm {
u16 tag_alloc_last;

struct sk_buff_head replies;
struct wait_queue_head wq;
wait_queue_head_t wq;
};

int nfp_ccm_init(struct nfp_ccm *ccm, struct nfp_app *app);
@@ -110,12 +110,18 @@ struct sk_buff *
nfp_ccm_communicate(struct nfp_ccm *ccm, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int reply_size);

int nfp_ccm_mbox_alloc(struct nfp_net *nn);
void nfp_ccm_mbox_free(struct nfp_net *nn);
int nfp_ccm_mbox_init(struct nfp_net *nn);
void nfp_ccm_mbox_clean(struct nfp_net *nn);
bool nfp_ccm_mbox_fits(struct nfp_net *nn, unsigned int size);
struct sk_buff *
nfp_ccm_mbox_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags);
nfp_ccm_mbox_msg_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags);
int nfp_ccm_mbox_communicate(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type,
unsigned int reply_size,
unsigned int max_reply_size);
int nfp_ccm_mbox_post(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int max_reply_size);
#endif
179 changes: 161 additions & 18 deletions drivers/net/ethernet/netronome/nfp/ccm_mbox.c
@@ -41,12 +41,14 @@ enum nfp_net_mbox_cmsg_state {
* @err: error encountered during processing if any
* @max_len: max(request_len, reply_len)
* @exp_reply: expected reply length (0 means don't validate)
* @posted: the message was posted and nobody waits for the reply
*/
struct nfp_ccm_mbox_cmsg_cb {
enum nfp_net_mbox_cmsg_state state;
int err;
unsigned int max_len;
unsigned int exp_reply;
bool posted;
};

static u32 nfp_ccm_mbox_max_msg(struct nfp_net *nn)
@@ -65,6 +67,7 @@ nfp_ccm_mbox_msg_init(struct sk_buff *skb, unsigned int exp_reply, int max_len)
cb->err = 0;
cb->max_len = max_len;
cb->exp_reply = exp_reply;
cb->posted = false;
}

static int nfp_ccm_mbox_maxlen(const struct sk_buff *skb)
@@ -96,6 +99,20 @@ static void nfp_ccm_mbox_set_busy(struct sk_buff *skb)
cb->state = NFP_NET_MBOX_CMSG_STATE_BUSY;
}

static bool nfp_ccm_mbox_is_posted(struct sk_buff *skb)
{
struct nfp_ccm_mbox_cmsg_cb *cb = (void *)skb->cb;

return cb->posted;
}

static void nfp_ccm_mbox_mark_posted(struct sk_buff *skb)
{
struct nfp_ccm_mbox_cmsg_cb *cb = (void *)skb->cb;

cb->posted = true;
}

static bool nfp_ccm_mbox_is_first(struct nfp_net *nn, struct sk_buff *skb)
{
return skb_queue_is_first(&nn->mbox_cmsg.queue, skb);
@@ -119,6 +136,8 @@ static void nfp_ccm_mbox_mark_next_runner(struct nfp_net *nn)

cb = (void *)skb->cb;
cb->state = NFP_NET_MBOX_CMSG_STATE_NEXT;
if (cb->posted)
queue_work(nn->mbox_cmsg.workq, &nn->mbox_cmsg.runq_work);
}

static void
@@ -205,9 +224,7 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
while (true) {
unsigned int length, offset, type;
struct nfp_ccm_hdr hdr;
__be32 *skb_data;
u32 tlv_hdr;
int i, cnt;

tlv_hdr = readl(data);
type = FIELD_GET(NFP_NET_MBOX_TLV_TYPE, tlv_hdr);
@@ -278,20 +295,26 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
goto next_tlv;
}

if (length <= skb->len)
__skb_trim(skb, length);
else
skb_put(skb, length - skb->len);

/* We overcopy here slightly, but that's okay, the skb is large
* enough, and the garbage will be ignored (beyond skb->len).
*/
skb_data = (__be32 *)skb->data;
memcpy(skb_data, &hdr, 4);

cnt = DIV_ROUND_UP(length, 4);
for (i = 1 ; i < cnt; i++)
skb_data[i] = cpu_to_be32(readl(data + i * 4));
if (!cb->posted) {
__be32 *skb_data;
int i, cnt;

if (length <= skb->len)
__skb_trim(skb, length);
else
skb_put(skb, length - skb->len);

/* We overcopy here slightly, but that's okay,
* the skb is large enough, and the garbage will
* be ignored (beyond skb->len).
*/
skb_data = (__be32 *)skb->data;
memcpy(skb_data, &hdr, 4);

cnt = DIV_ROUND_UP(length, 4);
for (i = 1 ; i < cnt; i++)
skb_data[i] = cpu_to_be32(readl(data + i * 4));
}

cb->state = NFP_NET_MBOX_CMSG_STATE_REPLY_FOUND;
next_tlv:
@@ -314,6 +337,14 @@ static void nfp_ccm_mbox_copy_out(struct nfp_net *nn, struct sk_buff *last)
smp_wmb(); /* order the cb->err vs. cb->state */
}
cb->state = NFP_NET_MBOX_CMSG_STATE_DONE;

if (cb->posted) {
if (cb->err)
nn_dp_warn(&nn->dp,
"mailbox posted msg failed type:%u err:%d\n",
nfp_ccm_get_type(skb), cb->err);
dev_consume_skb_any(skb);
}
} while (skb != last);

nfp_ccm_mbox_mark_next_runner(nn);
@@ -563,9 +594,92 @@ int nfp_ccm_mbox_communicate(struct nfp_net *nn, struct sk_buff *skb,
return err;
}

static void nfp_ccm_mbox_post_runq_work(struct work_struct *work)
{
struct sk_buff *skb;
struct nfp_net *nn;

nn = container_of(work, struct nfp_net, mbox_cmsg.runq_work);

spin_lock_bh(&nn->mbox_cmsg.queue.lock);

skb = __skb_peek(&nn->mbox_cmsg.queue);
if (WARN_ON(!skb || !nfp_ccm_mbox_is_posted(skb) ||
!nfp_ccm_mbox_should_run(nn, skb))) {
spin_unlock_bh(&nn->mbox_cmsg.queue.lock);
return;
}

nfp_ccm_mbox_run_queue_unlock(nn);
}

static void nfp_ccm_mbox_post_wait_work(struct work_struct *work)
{
struct sk_buff *skb;
struct nfp_net *nn;
int err;

nn = container_of(work, struct nfp_net, mbox_cmsg.wait_work);

skb = skb_peek(&nn->mbox_cmsg.queue);
if (WARN_ON(!skb || !nfp_ccm_mbox_is_posted(skb)))
/* Should never happen so it's unclear what to do here.. */
goto exit_unlock_wake;

err = nfp_net_mbox_reconfig_wait_posted(nn);
if (!err)
nfp_ccm_mbox_copy_out(nn, skb);
else
nfp_ccm_mbox_mark_all_err(nn, skb, -EIO);
exit_unlock_wake:
nn_ctrl_bar_unlock(nn);
wake_up_all(&nn->mbox_cmsg.wq);
}

int nfp_ccm_mbox_post(struct nfp_net *nn, struct sk_buff *skb,
enum nfp_ccm_type type, unsigned int max_reply_size)
{
int err;

err = nfp_ccm_mbox_msg_prepare(nn, skb, type, 0, max_reply_size,
GFP_ATOMIC);
if (err)
goto err_free_skb;

nfp_ccm_mbox_mark_posted(skb);

spin_lock_bh(&nn->mbox_cmsg.queue.lock);

err = nfp_ccm_mbox_msg_enqueue(nn, skb, type);
if (err)
goto err_unlock;

if (nfp_ccm_mbox_is_first(nn, skb)) {
if (nn_ctrl_bar_trylock(nn)) {
nfp_ccm_mbox_copy_in(nn, skb);
nfp_net_mbox_reconfig_post(nn,
NFP_NET_CFG_MBOX_CMD_TLV_CMSG);
queue_work(nn->mbox_cmsg.workq,
&nn->mbox_cmsg.wait_work);
} else {
nfp_ccm_mbox_mark_next_runner(nn);
}
}

spin_unlock_bh(&nn->mbox_cmsg.queue.lock);

return 0;

err_unlock:
spin_unlock_bh(&nn->mbox_cmsg.queue.lock);
err_free_skb:
dev_kfree_skb_any(skb);
return err;
}

struct sk_buff *
nfp_ccm_mbox_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags)
nfp_ccm_mbox_msg_alloc(struct nfp_net *nn, unsigned int req_size,
unsigned int reply_size, gfp_t flags)
{
unsigned int max_size;
struct sk_buff *skb;
@@ -589,3 +703,32 @@ bool nfp_ccm_mbox_fits(struct nfp_net *nn, unsigned int size)
{
return nfp_ccm_mbox_max_msg(nn) >= size;
}

int nfp_ccm_mbox_init(struct nfp_net *nn)
{
return 0;
}

void nfp_ccm_mbox_clean(struct nfp_net *nn)
{
drain_workqueue(nn->mbox_cmsg.workq);
}

int nfp_ccm_mbox_alloc(struct nfp_net *nn)
{
skb_queue_head_init(&nn->mbox_cmsg.queue);
init_waitqueue_head(&nn->mbox_cmsg.wq);
INIT_WORK(&nn->mbox_cmsg.wait_work, nfp_ccm_mbox_post_wait_work);
INIT_WORK(&nn->mbox_cmsg.runq_work, nfp_ccm_mbox_post_runq_work);

nn->mbox_cmsg.workq = alloc_workqueue("nfp-ccm-mbox", WQ_UNBOUND, 0);
if (!nn->mbox_cmsg.workq)
return -ENOMEM;
return 0;
}

void nfp_ccm_mbox_free(struct nfp_net *nn)
{
destroy_workqueue(nn->mbox_cmsg.workq);
WARN_ON(!skb_queue_empty(&nn->mbox_cmsg.queue));
}