Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (183 commits)
  [TG3]: Update version to 3.78.
  [TG3]: Add missing NVRAM strapping.
  [TG3]: Enable auto MDI.
  [TG3]: Fix the polarity bit.
  [TG3]: Fix irq_sync race condition.
  [NET_SCHED]: ematch: module autoloading
  [TCP]: tcp probe wraparound handling and other changes
  [RTNETLINK]: rtnl_link: allow specifying initial device address
  [RTNETLINK]: rtnl_link API simplification
  [VLAN]: Fix MAC address handling
  [ETH]: Validate address in eth_mac_addr
  [NET]: Fix races in net_rx_action vs netpoll.
  [AF_UNIX]: Rewrite garbage collector, fixes race.
  [NETFILTER]: {ip, nf}_conntrack_sctp: fix remotely triggerable NULL ptr dereference (CVE-2007-2876)
  [NET]: Make all initialized struct seq_operations const.
  [UDP]: Fix length check.
  [IPV6]: Remove unneeded pointer idev from addrconf_cleanup().
  [DECNET]: Another unnecessary net/tcp.h inclusion in net/dn.h
  [IPV6]: Make IPV6_{RECV,2292}RTHDR boolean options.
  [IPV6]: Do not send RH0 anymore.
  ...

Fixed up trivial conflict in Documentation/feature-removal-schedule.txt
manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Showing 412 changed files with 11,191 additions and 7,282 deletions.
@@ -0,0 +1,169 @@

This brief document describes how to use the kernel's PPPoL2TP driver
to provide L2TP functionality. L2TP is a protocol that tunnels one or
more PPP sessions over a UDP tunnel. It is commonly used for VPNs
(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
network infrastructure.

Design
======

The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
which PPP frames carried through an L2TP session are passed through
the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
PPP interaction with the peer. PPP network interfaces are created for
each local PPP endpoint.

The L2TP protocol (RFC 2661, http://www.faqs.org/rfcs/rfc2661.html)
defines L2TP control and data frames. L2TP control frames carry
messages between L2TP clients/servers and are used to set up / tear
down tunnels and sessions. An L2TP client or server is implemented in
userspace and uses a regular UDP socket per tunnel. L2TP data frames
carry PPP frames, which may be PPP control or PPP data. The kernel's
PPP subsystem arranges for PPP control frames to be delivered to pppd,
while data frames are forwarded as usual.

Each tunnel and session within a tunnel is assigned a unique tunnel_id
and session_id. These ids are carried in the L2TP header of every
control and data packet. The pppol2tp driver uses them to look up
internal tunnel and/or session contexts. Zero tunnel / session ids are
treated specially: zero ids are never assigned to tunnels or sessions
in the network. In the driver, the tunnel context keeps a pointer to
the tunnel UDP socket. The session context keeps a pointer to the
PPPoL2TP socket, as well as other data that lets the driver interface
to the kernel PPP subsystem.
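The header layout that carries these ids is defined in RFC 2661, section 3.1. As an illustration only, here is a minimal userspace sketch of extracting the tunnel and session ids from a raw L2TPv2 header. The names l2tp_parse_hdr() and struct l2tp_hdr_info are hypothetical and this is not the driver's code; it assumes the RFC layout of a flags/version word, an optional Length, Tunnel ID, Session ID, optional Ns/Nr, and an optional Offset Size followed by padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits in the first 16-bit word of an L2TPv2 header (RFC 2661, 3.1). */
#define L2TP_HDR_T    0x8000  /* 1 = control message, 0 = data message */
#define L2TP_HDR_L    0x4000  /* Length field present */
#define L2TP_HDR_S    0x0800  /* Ns and Nr fields present */
#define L2TP_HDR_O    0x0200  /* Offset Size field present */
#define L2TP_HDR_VER  0x000F  /* version, must be 2 */

struct l2tp_hdr_info {
	int is_control;
	int has_seq;
	uint16_t tunnel_id;
	uint16_t session_id;
	uint16_t ns, nr;      /* valid only if has_seq */
	size_t payload_off;   /* where the PPP frame starts (data messages) */
};

/* Read a 16-bit field in network byte order. */
static uint16_t get16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the header is malformed or truncated. */
static int l2tp_parse_hdr(const uint8_t *buf, size_t len,
			  struct l2tp_hdr_info *h)
{
	uint16_t flags;
	size_t off;

	if (len < 2)
		return -1;
	flags = get16(buf);
	if ((flags & L2TP_HDR_VER) != 2)
		return -1;
	off = 2;
	if (flags & L2TP_HDR_L)
		off += 2;                     /* skip the Length field */
	if (len < off + 4)
		return -1;
	h->tunnel_id = get16(buf + off);
	h->session_id = get16(buf + off + 2);
	off += 4;
	h->is_control = (flags & L2TP_HDR_T) != 0;
	h->has_seq = (flags & L2TP_HDR_S) != 0;
	if (h->has_seq) {
		if (len < off + 4)
			return -1;
		h->ns = get16(buf + off);
		h->nr = get16(buf + off + 2);
		off += 4;
	}
	if (flags & L2TP_HDR_O) {
		if (len < off + 2)
			return -1;
		off += 2 + get16(buf + off);  /* Offset Size field + padding */
		if (off > len)
			return -1;
	}
	h->payload_off = off;
	return 0;
}
```

A driver would then use tunnel_id and session_id as the keys for its context lookup.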

Note that the pppol2tp kernel driver handles only L2TP data frames;
L2TP control frames are simply passed up to userspace in the UDP
tunnel socket. The kernel handles all datapath aspects of the
protocol, including data packet resequencing (if enabled).
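The split between kernel and userspace can be sketched as a receive-side dispatch. This is a hypothetical model, not the driver's code. It takes header fields that some parser has already extracted, leaves control frames for userspace, and looks up the session context for data frames. A linear search stands in for the driver's own lookup structures. Frames for an unknown session are also left for userspace here; that choice is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session context. The real driver also keeps a pointer
 * to the PPPoL2TP socket and PPP channel state. */
struct session_ctx {
	uint16_t tunnel_id;
	uint16_t session_id;
	unsigned long rx_packets;
};

enum rx_verdict {
	RX_TO_USERSPACE,   /* leave the frame in the tunnel's UDP socket */
	RX_TO_PPP          /* hand the PPP payload to the session */
};

static enum rx_verdict l2tp_rx_dispatch(int is_control, uint16_t tunnel_id,
					uint16_t session_id,
					struct session_ctx *sessions, size_t n,
					struct session_ctx **out)
{
	size_t i;

	*out = NULL;
	if (is_control)
		return RX_TO_USERSPACE;       /* control: userspace daemon */

	/* Contexts are never created with zero ids, so zero never matches. */
	for (i = 0; i < n; i++) {
		if (sessions[i].tunnel_id == tunnel_id &&
		    sessions[i].session_id == session_id) {
			sessions[i].rx_packets++;
			*out = &sessions[i];
			return RX_TO_PPP;
		}
	}
	return RX_TO_USERSPACE;               /* unknown session (assumption) */
}
```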

There are a number of requirements on the userspace L2TP daemon in
order to use the pppol2tp driver.

1. Use a UDP socket per tunnel.

2. Create a single PPPoL2TP socket per tunnel bound to a special null
   session id. This is used only for communicating with the driver but
   must remain open while the tunnel is active. Opening this tunnel
   management socket causes the driver to mark the tunnel socket as an
   L2TP UDP encapsulation socket and flag it for use by the
   referenced tunnel id. This hooks up the UDP receive path via
   udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
   in this special PPPoX socket.

3. Create a PPPoL2TP socket per L2TP session. This is typically done
   by starting pppd with the pppol2tp plugin and appropriate
   arguments. A PPPoL2TP tunnel management socket (Step 2) must be
   created before the first PPPoL2TP session socket is created.
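The ordering constraints in this list can be captured as a small bookkeeping check that a daemon might run before each step. Everything here (struct tunnel_state and the two predicates) is hypothetical; it only encodes the rules stated in the list: the UDP socket comes first, then exactly one management socket, and session sockets only after that.

```c
#include <stdbool.h>

/* Hypothetical per-tunnel bookkeeping kept by an L2TP daemon.
 * A value of -1 means "not created yet". */
struct tunnel_state {
	int udp_fd;    /* step 1: the tunnel's UDP socket */
	int mgmt_fd;   /* step 2: the PPPoL2TP tunnel management socket */
};

/* Step 2 needs the UDP socket, and there is only one management
 * socket per tunnel. */
static bool can_create_mgmt_socket(const struct tunnel_state *t)
{
	return t->udp_fd >= 0 && t->mgmt_fd < 0;
}

/* Step 3 needs the management socket to exist (and stay open). */
static bool can_create_session_socket(const struct tunnel_state *t)
{
	return t->udp_fd >= 0 && t->mgmt_fd >= 0;
}
```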

When creating PPPoL2TP sockets, the application provides information
to the driver about the socket in a socket connect() call. Source and
destination tunnel and session ids are provided, as well as the file
descriptor of a UDP socket. See struct pppol2tp_addr in
include/linux/if_ppp.h. Note that zero tunnel / session ids are
treated specially. When creating the per-tunnel PPPoL2TP management
socket in Step 2 above, zero source and destination session ids are
specified, which tells the driver to prepare the supplied UDP file
descriptor for use as an L2TP tunnel socket.
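The zero-session-id convention can be expressed as a small classifier over the four ids passed in connect(). This is a hypothetical helper, not part of the kernel API. struct l2tp_ids only mirrors the id fields of struct pppol2tp_addr. Rejecting a zero local tunnel id, or a session address with some ids left at zero, is this sketch's own reading of the rule that zero ids are never assigned.

```c
#include <stdint.h>

/* Hypothetical mirror of the id fields in struct pppol2tp_addr. */
struct l2tp_ids {
	uint16_t s_tunnel, s_session;   /* local (source) ids */
	uint16_t d_tunnel, d_session;   /* peer (destination) ids */
};

enum connect_kind { CONNECT_INVALID, CONNECT_TUNNEL_MGMT, CONNECT_SESSION };

static enum connect_kind classify_connect(const struct l2tp_ids *a)
{
	if (a->s_tunnel == 0)
		return CONNECT_INVALID;          /* a tunnel id is always needed */
	if (a->s_session == 0 && a->d_session == 0)
		return CONNECT_TUNNEL_MGMT;      /* prepare the UDP fd for L2TP */
	if (a->s_session != 0 && a->d_session != 0 && a->d_tunnel != 0)
		return CONNECT_SESSION;          /* fully specified session */
	return CONNECT_INVALID;
}
```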

Userspace may control behavior of the tunnel or session using
setsockopt and ioctl on the PPPoX socket. The following socket
options are supported:

DEBUG     - bitmask of debug message categories. See below.
SENDSEQ   - 0 => don't send packets with sequence numbers
            1 => send packets with sequence numbers
RECVSEQ   - 0 => sequence numbers on received packets are optional
            1 => drop received packets without sequence numbers
LNSMODE   - 0 => act as LAC.
            1 => act as LNS.
REORDERTO - reorder timeout (in milliseconds). If 0, don't try to
            reorder.
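The receive-side meaning of RECVSEQ and REORDERTO can be modelled as a per-packet decision. This sketch rests on stated assumptions and is not the driver's logic. struct session_opts and rx_seq_policy() are hypothetical, and LNSMODE and SENDSEQ are not modelled. Discarding an out-of-order packet when reordering is disabled is an assumption about the driver, not something the option list above states.

```c
#include <stdbool.h>

/* Hypothetical per-session settings mirroring two socket options. */
struct session_opts {
	bool recvseq;                  /* RECVSEQ */
	unsigned int reorder_timeout;  /* REORDERTO, ms; 0 = no reordering */
};

enum rx_action { RX_DELIVER, RX_DROP, RX_QUEUE };

/* has_seq:  the packet carried Ns/Nr.
 * in_order: its Ns is the one expected next (ignored if !has_seq). */
static enum rx_action rx_seq_policy(const struct session_opts *o,
				    bool has_seq, bool in_order)
{
	if (!has_seq)
		return o->recvseq ? RX_DROP : RX_DELIVER;
	if (in_order)
		return RX_DELIVER;
	/* Out of order: hold it for reordering, or discard (assumption). */
	return o->reorder_timeout ? RX_QUEUE : RX_DROP;
}
```

A queued packet would be released once the missing sequence numbers arrive or the reorder timeout expires.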

Only the DEBUG option is supported by the special tunnel management
PPPoX socket.

In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS ioctl is
provided to retrieve tunnel and session statistics from the kernel
using the PPPoX socket of the appropriate tunnel or session.

Debugging
=========

The driver supports a flexible debug scheme where kernel trace
messages may be optionally enabled per tunnel and per session. Care is
needed when debugging a live system since the messages are not
rate-limited and a busy system could be swamped. Userspace uses
setsockopt on the PPPoX socket to set a debug mask.

The following debug mask bits are available:

PPPOL2TP_MSG_DEBUG    verbose debug (if compiled in)
PPPOL2TP_MSG_CONTROL  userspace - kernel interface
PPPOL2TP_MSG_SEQ      sequence number handling
PPPOL2TP_MSG_DATA     data packets
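A trace message is printed only if its category bit is set in the tunnel's or session's debug mask. The sketch below shows that filtering in userspace C. The function l2tp_trace() is hypothetical. The bit values (one bit per category, in the order listed above) are assumptions; real code should take them from the kernel header instead of defining them locally.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed values: one bit per category, in the order listed above. */
enum {
	PPPOL2TP_MSG_DEBUG   = 1 << 0,
	PPPOL2TP_MSG_CONTROL = 1 << 1,
	PPPOL2TP_MSG_SEQ     = 1 << 2,
	PPPOL2TP_MSG_DATA    = 1 << 3,
};

/* Print to stderr only if category is enabled in debug_mask.
 * Returns 1 if the message was printed, 0 if it was filtered out. */
static int l2tp_trace(unsigned int debug_mask, unsigned int category,
		      const char *fmt, ...)
{
	va_list ap;

	if (!(debug_mask & category))
		return 0;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return 1;
}
```

For example, a mask of PPPOL2TP_MSG_CONTROL | PPPOL2TP_MSG_SEQ would trace the userspace interface and sequence handling while keeping per-packet data messages quiet. This matters because the messages are not rate-limited.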

Sample Userspace Code
=====================

1. Create tunnel management PPPoX socket

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <linux/if_pppox.h>

	/* udp_fd: fd of the tunnel UDP socket; peer_addr: the tunnel peer.
	 * Returns the management socket fd (keep it open while the tunnel
	 * is active) or -errno on failure.
	 */
	int create_tunnel_mgmt_socket(int udp_fd, __u16 tunnel_id,
				      const struct sockaddr_in *peer_addr)
	{
		struct sockaddr_pppol2tp sax;
		int kernel_fd;

		kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
		if (kernel_fd < 0)
			return -errno;

		memset(&sax, 0, sizeof(sax));
		sax.sa_family = AF_PPPOX;
		sax.sa_protocol = PX_PROTO_OL2TP;
		sax.pppol2tp.fd = udp_fd;	/* fd of tunnel UDP socket */
		sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
		sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
		sax.pppol2tp.addr.sin_family = AF_INET;
		sax.pppol2tp.s_tunnel = tunnel_id;
		sax.pppol2tp.s_session = 0;	/* special case: mgmt socket */
		sax.pppol2tp.d_tunnel = 0;
		sax.pppol2tp.d_session = 0;	/* special case: mgmt socket */

		if (connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax)) < 0) {
			int err = -errno;	/* save before perror/close */

			perror("connect failed");
			close(kernel_fd);
			return err;
		}
		return kernel_fd;
	}

2. Create session PPPoX data socket

	/* session_fd: fd of the session's PPPoL2TP socket, created with
	 *             socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP)
	 * tunnel_fd:  fd of the tunnel UDP socket
	 * addr:       address of the tunnel peer
	 * Returns 0 on success or -errno on failure.
	 */
	int connect_session_socket(int session_fd, int tunnel_fd,
				   const struct sockaddr_in *addr,
				   __u16 tunnel_id, __u16 session_id,
				   __u16 peer_tunnel_id, __u16 peer_session_id)
	{
		struct sockaddr_pppol2tp sax;

		/* Note, the target socket must be bound already, else it
		 * will not be ready */
		memset(&sax, 0, sizeof(sax));
		sax.sa_family = AF_PPPOX;
		sax.sa_protocol = PX_PROTO_OL2TP;
		sax.pppol2tp.fd = tunnel_fd;
		sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
		sax.pppol2tp.addr.sin_port = addr->sin_port;
		sax.pppol2tp.addr.sin_family = AF_INET;
		sax.pppol2tp.s_tunnel = tunnel_id;
		sax.pppol2tp.s_session = session_id;
		sax.pppol2tp.d_tunnel = peer_tunnel_id;
		sax.pppol2tp.d_session = peer_session_id;

		/* connect() returns 0 or -1, not a file descriptor */
		if (connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)) < 0)
			return -errno;
		return 0;
	}

Miscellaneous
=============

The PPPoL2TP driver was developed as part of the OpenL2TP project by
Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
designed from the ground up to have the L2TP datapath in the
kernel. The project also implemented the pppol2tp plugin for pppd
which allows pppd to use the kernel driver. Details can be found at
http://openl2tp.sourceforge.net.
@@ -0,0 +1,111 @@

HOWTO for multiqueue network device support
===========================================

Section 1: Base driver requirements for implementing multiqueue support
Section 2: Qdisc support for multiqueue devices
Section 3: Brief howto using PRIO and RR for multiqueue devices


Intro: Kernel support for multiqueue devices
--------------------------------------------

Kernel support for multiqueue devices is only an API that is presented to the
netdevice layer for base drivers to implement. This feature is part of the
core networking stack, and all network devices will be running on the
multiqueue-aware stack. If a base driver only has one queue, then these
changes are transparent to that driver.


Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------

Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device. The
underlying kernel API will take care of the allocation and deallocation of
the subqueue memory, as well as netdev configuration of where the queues
exist in memory.

The base driver will also need to manage each queue individually, much as it
manages the single global queue (netdev->queue_lock) today. Therefore base
drivers should use the netif_{start|stop|wake}_subqueue() functions to manage
each queue while the device is still operational. netdev->queue_lock is still
used when the device comes online or when it is completely shut down
(unregister_netdev(), etc.).
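The idea behind per-queue management is that flow control becomes per queue: when one Tx ring fills, only that subqueue needs to be stopped, and the others can keep transmitting. The following hypothetical model tracks stopped subqueues in a bitmap, standing in for what netif_stop_subqueue() and netif_wake_subqueue() do for a real driver. struct mq_state and the subqueue_* helpers are illustrative names, not kernel functions, and the model assumes at most 32 queues.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state: one "stopped" bit per Tx subqueue.
 * Assumes num_queues <= 32. */
struct mq_state {
	unsigned int num_queues;
	uint32_t stopped;
};

static void subqueue_stop(struct mq_state *s, unsigned int q)
{
	if (q < s->num_queues)
		s->stopped |= UINT32_C(1) << q;
}

static void subqueue_wake(struct mq_state *s, unsigned int q)
{
	if (q < s->num_queues)
		s->stopped &= ~(UINT32_C(1) << q);
}

/* Out-of-range queues are reported as stopped so callers never use them. */
static bool subqueue_stopped(const struct mq_state *s, unsigned int q)
{
	if (q >= s->num_queues)
		return true;
	return (s->stopped >> q) & 1u;
}
```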

Finally, the base driver should indicate that it is a multiqueue device. The
feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
bitmap on device initialization. Below is an example from e1000:

#ifdef CONFIG_E1000_MQ
	if ((adapter->hw.mac.type == e1000_82571) ||
	    (adapter->hw.mac.type == e1000_82572) ||
	    (adapter->hw.mac.type == e1000_80003es2lan))
		netdev->features |= NETIF_F_MULTI_QUEUE;
#endif


Section 2: Qdisc support for multiqueue devices
-----------------------------------------------

Currently two qdiscs support multiqueue devices: sch_prio and a new
round-robin qdisc, sch_rr. The qdisc is responsible for classifying skbs into
bands and queues, and stores the queue mapping in skb->queue_mapping. The
base driver should use this field to determine which queue to send the skb
to.
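On the transmit side, the driver only has to read skb->queue_mapping to pick a Tx ring. The sketch uses hypothetical stand-ins (struct fake_skb, struct tx_ring, select_tx_ring(), xmit()) rather than kernel types. Falling back to ring 0 for an out-of-range mapping is a defensive choice of this sketch, not documented behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct sk_buff and a driver's Tx rings. */
struct fake_skb {
	uint16_t queue_mapping;   /* filled in by the qdisc */
	size_t len;
};

struct tx_ring {
	unsigned long packets;
	unsigned long bytes;
};

/* Pick the Tx ring named by skb->queue_mapping. */
static struct tx_ring *select_tx_ring(struct tx_ring *rings,
				      unsigned int num_rings,
				      const struct fake_skb *skb)
{
	unsigned int q = skb->queue_mapping;

	if (q >= num_rings)
		q = 0;            /* defensive fallback (assumption) */
	return &rings[q];
}

/* "Transmit" by accounting the packet on the selected ring. */
static void xmit(struct tx_ring *rings, unsigned int num_rings,
		 const struct fake_skb *skb)
{
	struct tx_ring *r = select_tx_ring(rings, num_rings, skb);

	r->packets++;
	r->bytes += skb->len;
}
```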

sch_rr has been added for hardware that doesn't want scheduling policies from
software, so it's a straight round-robin qdisc. It uses the same syntax and
classification priomap that sch_prio uses, so it should be intuitive to
configure for people who've used sch_prio.
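The two qdiscs differ in which band is served next. The following minimal model counts packets per band and ignores everything else a real qdisc does. PRIO always serves the highest-priority (lowest-numbered) non-empty band, while RR cycles through the non-empty bands in turn. struct bands, prio_dequeue() and rr_dequeue() are illustrative names, not kernel code.

```c
#define NBANDS 4

/* Hypothetical band state: a packet count per band plus an RR cursor. */
struct bands {
	unsigned int qlen[NBANDS];
	unsigned int next;   /* band RR will try first */
};

/* PRIO: serve the lowest-numbered non-empty band. Returns the band
 * served, or -1 if all bands are empty. */
static int prio_dequeue(struct bands *b)
{
	for (int i = 0; i < NBANDS; i++) {
		if (b->qlen[i]) {
			b->qlen[i]--;
			return i;
		}
	}
	return -1;
}

/* RR: serve the first non-empty band at or after the cursor, then
 * move the cursor past it. */
static int rr_dequeue(struct bands *b)
{
	for (int n = 0; n < NBANDS; n++) {
		int i = (b->next + n) % NBANDS;

		if (b->qlen[i]) {
			b->qlen[i]--;
			b->next = (i + 1) % NBANDS;
			return i;
		}
	}
	return -1;
}
```

With two packets in band 0 and one each in bands 1 and 3, PRIO drains band 0 first (0, 0, 1, 3), while RR interleaves (0, 1, 3, 0).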

The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
built with NET_SCH_PRIO_MQ, then upon load it will make sure the number of
bands requested is equal to the number of queues on the hardware. If they
are equal, it sets up a one-to-one mapping between the queues and bands. If
they're not equal, it will not load the qdisc. RR behaves the same way. Once
the association is made, any skb that is classified will have
skb->queue_mapping set, which allows the driver to properly queue skbs to
multiple queues.
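The load-time rule and the one-to-one association can be written down directly. Assuming classification has already produced a band, this hypothetical sketch refuses a band count that differs from the hardware queue count and then copies the band into queue_mapping. mq_qdisc_init(), mq_set_queue_mapping() and struct classified_skb are illustrative, not kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff details that matter here. */
struct classified_skb {
	unsigned int band;        /* result of PRIO/RR classification */
	uint16_t queue_mapping;   /* what the base driver reads */
};

/* Multiqueue load-time check: exactly one band per hardware queue.
 * Returns 0 if the qdisc may load, -1 if it must not. */
static int mq_qdisc_init(unsigned int bands, unsigned int hw_queues)
{
	return bands == hw_queues ? 0 : -1;
}

/* One-to-one association: band i is transmitted on queue i. */
static void mq_set_queue_mapping(struct classified_skb *skb)
{
	skb->queue_mapping = (uint16_t)skb->band;
}
```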


Section 3: Brief howto using PRIO and RR for multiqueue devices
---------------------------------------------------------------

The userspace command 'tc', part of the iproute2 package, is used to configure
qdiscs. To add the PRIO qdisc to your network device, assuming the device is
called eth0, run the following command:

# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue

This will create 4 bands, 0 being highest priority, and associate those bands
with the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
would look like:

band 0 => queue 0
band 1 => queue 1
band 2 => queue 2
band 3 => queue 3

Traffic will flow through each queue as long as your TOS values spread
traffic across the various bands. For example, ssh traffic will always try to
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal"
traffic classification, which is band 1. Therefore pings will be sent out
queue 1 on the NIC.
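The ssh and ping examples follow from two table lookups: TOS to Linux priority, then priority to band through the qdisc's priomap. The sketch below reproduces both lookups. The table contents are assumptions based on the kernel's ip_tos2prio table (net/ipv4/route.c) and the default PRIO priomap (1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1), so check them against your kernel before relying on them. It also assumes interactive ssh marks packets with IPTOS_LOWDELAY (0x10) and that pings carry TOS 0.

```c
#include <stdint.h>

/* Priority classes, assumed to match TC_PRIO_* in <linux/pkt_sched.h>. */
enum {
	PRIO_BESTEFFORT       = 0,
	PRIO_BULK             = 2,
	PRIO_INTERACTIVE_BULK = 4,
	PRIO_INTERACTIVE      = 6,
};

/* TOS -> priority, indexed by (tos & 0x1E) >> 1 (modelled on ip_tos2prio). */
static const uint8_t tos2prio[16] = {
	PRIO_BESTEFFORT, PRIO_BESTEFFORT, PRIO_BESTEFFORT, PRIO_BESTEFFORT,
	PRIO_BULK, PRIO_BULK, PRIO_BULK, PRIO_BULK,
	PRIO_INTERACTIVE, PRIO_INTERACTIVE, PRIO_INTERACTIVE, PRIO_INTERACTIVE,
	PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
	PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
};

/* Assumed default PRIO priomap: priority (0..15) -> band. */
static const uint8_t default_priomap[16] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* TOS byte -> band; with a one-to-one mapping, band == Tx queue. */
static unsigned int tos_to_band(uint8_t tos)
{
	unsigned int prio = tos2prio[(tos & 0x1E) >> 1];

	return default_priomap[prio];
}
```

Under these tables, a low-delay packet (0x10) maps to priority 6 and band 0, while a TOS 0 packet maps to priority 0 and band 1, matching the ssh and ping examples.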

Note the use of the multiqueue keyword. It is only available in versions of
iproute2 that support multiqueue networking devices; if it is omitted when
loading a qdisc onto a multiqueue device, the qdisc will load and operate the
same as if it were loaded onto a single-queue device (i.e., it sends all
traffic to queue 0).

Alternatively, band allocation can be left to the qdisc by using the
multiqueue option and specifying 0 bands. In this case, the qdisc will
allocate as many bands as the device reports queues, and bring the qdisc
online.
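The band count that is finally used follows from two rules in this HOWTO. In multiqueue mode, 0 means one band per hardware queue, and any other count must match the queue count exactly. mq_resolve_bands() is a hypothetical helper that combines the two rules.

```c
/* Returns the number of bands the qdisc would use in multiqueue mode,
 * or -1 if it would refuse to load. */
static int mq_resolve_bands(unsigned int requested, unsigned int hw_queues)
{
	/* 0 requested bands: one band per hardware queue */
	unsigned int bands = requested ? requested : hw_queues;

	if (bands != hw_queues)
		return -1;   /* mismatch: the qdisc is not loaded */
	return (int)bands;
}
```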

The behavior of tc filters remains the same: they override TOS priority
classification.


Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>