Skip to content

Commit

Permalink
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/…
Browse files Browse the repository at this point in the history
…davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (183 commits)
  [TG3]: Update version to 3.78.
  [TG3]: Add missing NVRAM strapping.
  [TG3]: Enable auto MDI.
  [TG3]: Fix the polarity bit.
  [TG3]: Fix irq_sync race condition.
  [NET_SCHED]: ematch: module autoloading
  [TCP]: tcp probe wraparound handling and other changes
  [RTNETLINK]: rtnl_link: allow specifying initial device address
  [RTNETLINK]: rtnl_link API simplification
  [VLAN]: Fix MAC address handling
  [ETH]: Validate address in eth_mac_addr
  [NET]: Fix races in net_rx_action vs netpoll.
  [AF_UNIX]: Rewrite garbage collector, fixes race.
  [NETFILTER]: {ip, nf}_conntrack_sctp: fix remotely triggerable NULL ptr dereference (CVE-2007-2876)
  [NET]: Make all initialized struct seq_operations const.
  [UDP]: Fix length check.
  [IPV6]: Remove unneeded pointer idev from addrconf_cleanup().
  [DECNET]: Another unnecessary net/tcp.h inclusion in net/dn.h
  [IPV6]: Make IPV6_{RECV,2292}RTHDR boolean options.
  [IPV6]: Do not send RH0 anymore.
  ...

Fixed up trivial conflict in Documentation/feature-removal-schedule.txt
manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  • Loading branch information
Linus Torvalds committed Jul 12, 2007
2 parents 0b9062f + 15028aa commit e1bd2ac
Show file tree
Hide file tree
Showing 412 changed files with 11,191 additions and 7,282 deletions.
27 changes: 8 additions & 19 deletions Documentation/feature-removal-schedule.txt
Original file line number Diff line number Diff line change
Expand Up @@ -262,25 +262,6 @@ Who: Richard Purdie <rpurdie@rpsys.net>

---------------------------

What: Multipath cached routing support in ipv4
When: in 2.6.23
Why: Code was merged, then submitter immediately disappeared leaving
us with no maintainer and lots of bugs. The code should not have
been merged in the first place, and many aspects of it's
implementation are blocking more critical core networking
development. It's marked EXPERIMENTAL and no distribution
enables it because it cause obscure crashes due to unfixable bugs
(interfaces don't return errors so memory allocation can't be
handled, calling contexts of these interfaces make handling
errors impossible too because they get called after we've
totally commited to creating a route object, for example).
This problem has existed for years and no forward progress
has ever been made, and nobody steps up to try and salvage
this code, so we're going to finally just get rid of it.
Who: David S. Miller <davem@davemloft.net>

---------------------------

What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer)
When: December 2007
Why: These functions are a leftover from 2.4 times. They have several
Expand Down Expand Up @@ -337,3 +318,11 @@ Who: Jean Delvare <khali@linux-fr.org>

---------------------------

What: iptables SAME target
When: 1.1. 2008
Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h
Why: Obsolete for multiple years now, NAT core provides the same behaviour.
Unfixable broken wrt. 32/64 bit cleanness.
Who: Patrick McHardy <kaber@trash.net>

---------------------------
3 changes: 1 addition & 2 deletions Documentation/networking/ip-sysctl.txt
Original file line number Diff line number Diff line change
Expand Up @@ -874,8 +874,7 @@ accept_redirects - BOOLEAN
accept_source_route - INTEGER
Accept source routing (routing extension header).

> 0: Accept routing header.
= 0: Accept only routing header type 2.
>= 0: Accept only routing header type 2.
< 0: Do not accept routing header.

Default: 0
Expand Down
169 changes: 169 additions & 0 deletions Documentation/networking/l2tp.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
This brief document describes how to use the kernel's PPPoL2TP driver
to provide L2TP functionality. L2TP is a protocol that tunnels one or
more PPP sessions over a UDP tunnel. It is commonly used for VPNs
(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
network infrastructure.

Design
======

The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
which PPP frames carried through an L2TP session are passed through
the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
PPP interaction with the peer. PPP network interfaces are created for
each local PPP endpoint.

The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP
control and data frames. L2TP control frames carry messages between
L2TP clients/servers and are used to setup / teardown tunnels and
sessions. An L2TP client or server is implemented in userspace and
will use a regular UDP socket per tunnel. L2TP data frames carry PPP
frames, which may be PPP control or PPP data. The kernel's PPP
subsystem arranges for PPP control frames to be delivered to pppd,
while data frames are forwarded as usual.

Each tunnel and session within a tunnel is assigned a unique tunnel_id
and session_id. These ids are carried in the L2TP header of every
control and data packet. The pppol2tp driver uses them to lookup
internal tunnel and/or session contexts. Zero tunnel / session ids are
treated specially - zero ids are never assigned to tunnels or sessions
in the network. In the driver, the tunnel context keeps a pointer to
the tunnel UDP socket. The session context keeps a pointer to the
PPPoL2TP socket, as well as other data that lets the driver interface
to the kernel PPP subsystem.

Note that the pppol2tp kernel driver handles only L2TP data frames;
L2TP control frames are simply passed up to userspace in the UDP
tunnel socket. The kernel handles all datapath aspects of the
protocol, including data packet resequencing (if enabled).

There are a number of requirements on the userspace L2TP daemon in
order to use the pppol2tp driver.

1. Use a UDP socket per tunnel.

2. Create a single PPPoL2TP socket per tunnel bound to a special null
session id. This is used only for communicating with the driver but
must remain open while the tunnel is active. Opening this tunnel
management socket causes the driver to mark the tunnel socket as an
L2TP UDP encapsulation socket and flags it for use by the
referenced tunnel id. This hooks up the UDP receive path via
udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
in this special PPPoX socket.

3. Create a PPPoL2TP socket per L2TP session. This is typically done
by starting pppd with the pppol2tp plugin and appropriate
arguments. A PPPoL2TP tunnel management socket (Step 2) must be
created before the first PPPoL2TP session socket is created.

When creating PPPoL2TP sockets, the application provides information
to the driver about the socket in a socket connect() call. Source and
destination tunnel and session ids are provided, as well as the file
descriptor of a UDP socket. See struct pppol2tp_addr in
include/linux/if_ppp.h. Note that zero tunnel / session ids are
treated specially. When creating the per-tunnel PPPoL2TP management
socket in Step 2 above, zero source and destination session ids are
specified, which tells the driver to prepare the supplied UDP file
descriptor for use as an L2TP tunnel socket.

Userspace may control behavior of the tunnel or session using
setsockopt and ioctl on the PPPoX socket. The following socket
options are supported:-

DEBUG - bitmask of debug message categories. See below.
SENDSEQ - 0 => don't send packets with sequence numbers
1 => send packets with sequence numbers
RECVSEQ - 0 => receive packet sequence numbers are optional
1 => drop receive packets without sequence numbers
LNSMODE - 0 => act as LAC.
1 => act as LNS.
REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.

Only the DEBUG option is supported by the special tunnel management
PPPoX socket.

In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
to retrieve tunnel and session statistics from the kernel using the
PPPoX socket of the appropriate tunnel or session.

Debugging
=========

The driver supports a flexible debug scheme where kernel trace
messages may be optionally enabled per tunnel and per session. Care is
needed when debugging a live system since the messages are not
rate-limited and a busy system could be swamped. Userspace uses
setsockopt on the PPPoX socket to set a debug mask.

The following debug mask bits are available:

PPPOL2TP_MSG_DEBUG verbose debug (if compiled in)
PPPOL2TP_MSG_CONTROL userspace - kernel interface
PPPOL2TP_MSG_SEQ sequence numbers handling
PPPOL2TP_MSG_DATA data packets

Sample Userspace Code
=====================

1. Create tunnel management PPPoX socket

kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
if (kernel_fd >= 0) {
struct sockaddr_pppol2tp sax;
struct sockaddr_in const *peer_addr;

peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
memset(&sax, 0, sizeof(sax));
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
sax.pppol2tp.addr.sin_family = AF_INET;
sax.pppol2tp.s_tunnel = tunnel_id;
sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
sax.pppol2tp.d_tunnel = 0;
sax.pppol2tp.d_session = 0; /* special case: mgmt socket */

if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
perror("connect failed");
result = -errno;
goto err;
}
}

2. Create session PPPoX data socket

struct sockaddr_pppol2tp sax;
int fd;

/* Note, the target socket must be bound already, else it will not be ready */
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = tunnel_fd;
sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
sax.pppol2tp.addr.sin_port = addr->sin_port;
sax.pppol2tp.addr.sin_family = AF_INET;
sax.pppol2tp.s_tunnel = tunnel_id;
sax.pppol2tp.s_session = session_id;
sax.pppol2tp.d_tunnel = peer_tunnel_id;
sax.pppol2tp.d_session = peer_session_id;

/* session_fd is the fd of the session's PPPoL2TP socket.
* tunnel_fd is the fd of the tunnel UDP socket.
*/
fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
if (fd < 0 ) {
return -errno;
}
return 0;

Miscellanous
============

The PPPoL2TP driver was developed as part of the OpenL2TP project by
Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
designed from the ground up to have the L2TP datapath in the
kernel. The project also implemented the pppol2tp plugin for pppd
which allows pppd to use the kernel driver. Details can be found at
http://openl2tp.sourceforge.net.
111 changes: 111 additions & 0 deletions Documentation/networking/multiqueue.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@

HOWTO for multiqueue network device support
===========================================

Section 1: Base driver requirements for implementing multiqueue support
Section 2: Qdisc support for multiqueue devices
Section 3: Brief howto using PRIO or RR for multiqueue devices


Intro: Kernel support for multiqueue devices
---------------------------------------------------------

Kernel support for multiqueue devices is only an API that is presented to the
netdevice layer for base drivers to implement. This feature is part of the
core networking stack, and all network devices will be running on the
multiqueue-aware stack. If a base driver only has one queue, then these
changes are transparent to that driver.


Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------

Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device. The
underlying kernel API will take care of the allocation and deallocation of
the subqueue memory, as well as netdev configuration of where the queues
exist in memory.

The base driver will also need to manage the queues as it does the global
netdev->queue_lock today. Therefore base drivers should use the
netif_{start|stop|wake}_subqueue() functions to manage each queue while the
device is still operational. netdev->queue_lock is still used when the device
comes online or when it's completely shut down (unregister_netdev(), etc.).

Finally, the base driver should indicate that it is a multiqueue device. The
feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
bitmap on device initialization. Below is an example from e1000:

#ifdef CONFIG_E1000_MQ
if ( (adapter->hw.mac.type == e1000_82571) ||
(adapter->hw.mac.type == e1000_82572) ||
(adapter->hw.mac.type == e1000_80003es2lan))
netdev->features |= NETIF_F_MULTI_QUEUE;
#endif


Section 2: Qdisc support for multiqueue devices
-----------------------------------------------

Currently two qdiscs support multiqueue devices. A new round-robin qdisc,
sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
bands and queues, and will store the queue mapping into skb->queue_mapping.
Use this field in the base driver to determine which queue to send the skb
to.

sch_rr has been added for hardware that doesn't want scheduling policies from
software, so it's a straight round-robin qdisc. It uses the same syntax and
classification priomap that sch_prio uses, so it should be intuitive to
configure for people who've used sch_prio.

The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
bands requested is equal to the number of queues on the hardware. If they
are equal, it sets a one-to-one mapping up between the queues and bands. If
they're not equal, it will not load the qdisc. This is the same behavior
for RR. Once the association is made, any skb that is classified will have
skb->queue_mapping set, which will allow the driver to properly queue skb's
to multiple queues.


Section 3: Brief howto using PRIO and RR for multiqueue devices
---------------------------------------------------------------

The userspace command 'tc,' part of the iproute2 package, is used to configure
qdiscs. To add the PRIO qdisc to your network device, assuming the device is
called eth0, run the following command:

# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue

This will create 4 bands, 0 being highest priority, and associate those bands
to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
would look like:

band 0 => queue 0
band 1 => queue 1
band 2 => queue 2
band 3 => queue 3

Traffic will begin flowing through each queue if your TOS values are assigning
traffic across the various bands. For example, ssh traffic will always try to
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal"
traffic classification, which is band 1. Therefore pings will be send out
queue 1 on the NIC.

Note the use of the multiqueue keyword. This is only in versions of iproute2
that support multiqueue networking devices; if this is omitted when loading
a qdisc onto a multiqueue device, the qdisc will load and operate the same
if it were loaded onto a single-queue device (i.e. - sends all traffic to
queue 0).

Another alternative to multiqueue band allocation can be done by using the
multiqueue option and specify 0 bands. If this is the case, the qdisc will
allocate the number of bands to equal the number of queues that the device
reports, and bring the qdisc online.

The behavior of tc filters remains the same, where it will override TOS priority
classification.


Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
38 changes: 32 additions & 6 deletions Documentation/networking/netdevices.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(dev->priv) then it is up to the module exit handler to free that.

MTU
===
Each network device has a Maximum Transfer Unit. The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the mtu. The MTU does not include link layer header overhead, so
for example on Ethernet if the standard MTU is 1500 bytes used, the
actual skb will contain up to 1514 bytes because of the Ethernet
header. Devices should allow for the 4 byte VLAN header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule. The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as mechanism to size receive
buffers, but the device should allow packets with VLAN header. With
standard Ethernet mtu of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag). The device may either:
drop, truncate, or pass up oversize packets, but dropping oversize
packets is preferred.


struct net_device synchronization rules
=======================================
Expand All @@ -43,16 +67,17 @@ dev->get_stats:

dev->hard_start_xmit:
Synchronization: netif_tx_lock spinlock.

When the driver sets NETIF_F_LLTX in dev->features this will be
called without holding netif_tx_lock. In this case the driver
has to lock by itself when needed. It is recommended to use a try lock
for this and return -1 when the spin lock fails.
for this and return NETDEV_TX_LOCKED when the spin lock fails.
The locking there should also properly protect against
set_multicast_list
Context: Process with BHs disabled or BH (timer).
Notes: netif_queue_stopped() is guaranteed false
Interrupts must be enabled when calling hard_start_xmit.
(Interrupts must also be enabled when enabling the BH handler.)
set_multicast_list.

Context: Process with BHs disabled or BH (timer),
will be called with interrupts disabled by netconsole.

Return codes:
o NETDEV_TX_OK everything ok.
o NETDEV_TX_BUSY Cannot transmit packet, try later
Expand All @@ -74,4 +99,5 @@ dev->poll:
Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See
dev_close code and comments in net/core/dev.c for more info.
Context: softirq
will be called with interrupts disabled by netconsole.

Loading

0 comments on commit e1bd2ac

Please sign in to comment.