Merge branch 'blackhole-device-to-invalidate-dst'
Mahesh Bandewar says:

====================
blackhole device to invalidate dst

When we invalidate a dst or mark it "dead", we assign 'lo' to dst->dev. First of all, this assignment is racy; moreover, it has MTU implications. The standard device MTU is 1500 while the loopback MTU is 64k. TCP code, when dereferencing the dst, doesn't check whether the dst is valid. When TCP dereferences a dead dst while negotiating a new connection, it may use the dst device, which is 'lo', instead of the correct device.

Consider the following scenario:

A SYN arrives on an interface and the TCP layer, while processing the SYNACK, finds a dst and associates it with the SYNACK skb. Before the skb gets passed to L3 for processing, if that dst becomes "dead" (because the virtual device disappeared and then reappeared), 'lo' gets assigned to that dst (lo MTU = 64k). Let's assume the SYN has ADV_MSS set to 9k while the output device through which this SYNACK is going to go out has the standard MTU of 1500. The MTU check during the route check passes since MIN(9K, 64K) is 9k, and TCP successfully negotiates a 9k MSS. A subsequent, larger data packet gets passed to the device and won't be marked as GSO, since the assumed MTU of the device is 9k.

This can crash the NIC, and we have seen fixes that went into drivers to handle this scenario: 8914a59 ('bnx2x: disable GSO where gso_size is too big for hardware') and 2b16f04 ('net: create skb_gso_validate_mac_len()'). With those fixes TCP eventually recovers, but not before a few segments are dropped.

Well, I'm not a TCP expert, and though we have experienced these corner cases in our environment, I could not reproduce this case reliably in my test setup to try this fix myself. However, Michael Chan <michael.chan@broadcom.com> had a setup where these fixes helped him mitigate the issue and avoid the crash.

The idea here is not to alter the data path with additional locks or smb()/rmb() barriers to avoid racy assignments, but to create a new device that has a really low MTU and whose .ndo_start_xmit is essentially a kfree_skb(). Make use of this device instead of 'lo' when marking the dst dead.

The first patch implements the blackhole device, the second patch uses it in the IPv4 and IPv6 stacks, and the third patch is the self-test that ensures the sanity of this device.

v1 -> v2
  fixed the self-test patch to handle the conflict

v2 -> v3
  fixed Kconfig text/string.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
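The MTU arithmetic in the scenario above can be illustrated with a small userspace sketch. This is not kernel code: negotiated_mss() and oversized_for() are hypothetical helpers, the model ignores header overhead, 9000 stands in for the "9k" ADV_MSS, and the value 68 for the blackhole device MTU is an assumption (the cover letter only says "really low").

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only. BLACKHOLE_MTU is an assumed figure;
 * the cover letter just calls the MTU "really low". */
enum {
	LO_MTU        = 64 * 1024, /* loopback MTU, "64k" in the text */
	ETH_MTU       = 1500,      /* standard device MTU */
	BLACKHOLE_MTU = 68,        /* assumption, not taken from the text */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Simplified model of the MIN(adv_mss, dst device MTU) check.
 * Header overhead is deliberately ignored. */
static uint32_t negotiated_mss(uint32_t adv_mss, uint32_t dst_dev_mtu)
{
	return min_u32(adv_mss, dst_dev_mtu);
}

/* True when segments of this size exceed what the real egress
 * device can carry without segmentation. */
static bool oversized_for(uint32_t mss, uint32_t real_dev_mtu)
{
	return mss > real_dev_mtu;
}
```

With a dead dst pointing at 'lo', negotiated_mss(9000, LO_MTU) stays at 9000, which oversized_for(9000, ETH_MTU) flags as too large for the real device. With a low-MTU device in its place, negotiated_mss(9000, BLACKHOLE_MTU) is clamped to the small value, so the oversized negotiation cannot happen in this model.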
Showing 11 changed files with 195 additions and 14 deletions.
@@ -0,0 +1,100 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * This module tests the blackhole_dev that is created during the
 * net subsystem initialization. The test this module performs is
 * by injecting an skb into the stack with skb->dev as the
 * blackhole_dev and expects kernel to behave in a sane manner
 * (in other words, *not crash*)!
 *
 * Copyright (c) 2018, Mahesh Bandewar <maheshb@google.com>
 */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/udp.h>
#include <linux/ipv6.h>

#include <net/dst.h>

#define SKB_SIZE  256
#define HEAD_SIZE (14+40+8)	/* Ether + IPv6 + UDP */
#define TAIL_SIZE 32		/* random tail-room */

#define UDP_PORT 1234

static int __init test_blackholedev_init(void)
{
	struct ipv6hdr *ip6h;
	struct sk_buff *skb;
	struct ethhdr *ethh;
	struct udphdr *uh;
	int data_len;
	int ret;

	skb = alloc_skb(SKB_SIZE, GFP_KERNEL);
	if (!skb)
		return -ENOMEM;

	/* Reserve head-room for the headers */
	skb_reserve(skb, HEAD_SIZE);

	/* Add data to the skb */
	data_len = SKB_SIZE - (HEAD_SIZE + TAIL_SIZE);
	memset(__skb_put(skb, data_len), 0xf, data_len);

	/* Add protocol data */
	/* (Transport) UDP */
	uh = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));
	skb_set_transport_header(skb, 0);
	uh->source = uh->dest = htons(UDP_PORT);
	/* The UDP length field covers the UDP header plus the payload */
	uh->len = htons(data_len + sizeof(struct udphdr));
	uh->check = 0;
	/* (Network) IPv6 */
	ip6h = (struct ipv6hdr *)skb_push(skb, sizeof(struct ipv6hdr));
	skb_set_network_header(skb, 0);
	ip6h->hop_limit = 32;
	/* payload_len is a big-endian (network byte order) field */
	ip6h->payload_len = htons(data_len + sizeof(struct udphdr));
	ip6h->nexthdr = IPPROTO_UDP;
	ip6h->saddr = in6addr_loopback;
	ip6h->daddr = in6addr_loopback;
	/* Ether */
	ethh = (struct ethhdr *)skb_push(skb, sizeof(struct ethhdr));
	skb_set_mac_header(skb, 0);

	skb->protocol = htons(ETH_P_IPV6);
	skb->pkt_type = PACKET_HOST;
	skb->dev = blackhole_netdev;

	/* Now attempt to send the packet */
	ret = dev_queue_xmit(skb);

	switch (ret) {
	case NET_XMIT_SUCCESS:
		pr_warn("dev_queue_xmit() returned NET_XMIT_SUCCESS\n");
		break;
	case NET_XMIT_DROP:
		pr_warn("dev_queue_xmit() returned NET_XMIT_DROP\n");
		break;
	case NET_XMIT_CN:
		pr_warn("dev_queue_xmit() returned NET_XMIT_CN\n");
		break;
	default:
		pr_err("dev_queue_xmit() returned UNKNOWN(%d)\n", ret);
	}

	return 0;
}

static void __exit test_blackholedev_exit(void)
{
	pr_warn("test_blackholedev module terminating.\n");
}

module_init(test_blackholedev_init);
module_exit(test_blackholedev_exit);

MODULE_AUTHOR("Mahesh Bandewar <maheshb@google.com>");
MODULE_LICENSE("GPL");
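The buffer layout the test module builds can be checked with plain arithmetic. The following userspace sketch mirrors the constants from the module (SKB_SIZE, HEAD_SIZE, TAIL_SIZE); struct test_layout and compute_layout() are hypothetical names introduced here for illustration, and the sketch assumes the UDP length and IPv6 payload length both count the UDP header plus the data.

```c
#include <stddef.h>

/* Constants copied from the test module above. */
enum {
	SKB_SIZE   = 256,
	ETH_HDR    = 14,
	IPV6_HDR   = 40,
	UDP_HDR    = 8,
	HEAD_SIZE  = ETH_HDR + IPV6_HDR + UDP_HDR, /* 62 */
	TAIL_SIZE  = 32,
};

/* Hypothetical helper type: sizes that result from the
 * skb_reserve(), __skb_put() and skb_push() sequence. */
struct test_layout {
	size_t data_len;     /* payload bytes filled by memset() */
	size_t udp_len;      /* UDP header + payload */
	size_t ipv6_payload; /* what follows the IPv6 header */
	size_t frame_len;    /* Ethernet header through payload */
	size_t headroom;     /* headroom left after all pushes */
};

static struct test_layout compute_layout(void)
{
	struct test_layout l;

	l.data_len     = SKB_SIZE - (HEAD_SIZE + TAIL_SIZE);
	l.udp_len      = l.data_len + UDP_HDR;
	l.ipv6_payload = l.udp_len;
	l.frame_len    = l.ipv6_payload + IPV6_HDR + ETH_HDR;
	/* All reserved headroom is consumed by the three headers. */
	l.headroom     = HEAD_SIZE - (UDP_HDR + IPV6_HDR + ETH_HDR);
	return l;
}
```

Calling compute_layout() gives a 162-byte payload and a 224-byte frame, leaving the 32 bytes of requested tail-room unused and no headroom after the Ethernet header is pushed.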
@@ -0,0 +1,11 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# Runs blackhole-dev test using blackhole-dev kernel module

if /sbin/modprobe -q test_blackhole_dev ; then
	/sbin/modprobe -q -r test_blackhole_dev;
	echo "test_blackhole_dev: ok";
else
	echo "test_blackhole_dev: [FAIL]";
	exit 1;
fi