Skip to content

Commit

Permalink
Merge branch 'net-smc'
Browse files Browse the repository at this point in the history
Ursula Braun says:

====================
net/smc: Shared Memory Communications - RDMA

here is now V4 of the SMC-R patches having processed your feedback from end
of November. The most important change is the replacement of sysfs by a
generic netlink solution in patch 04. And I tried to get rid of the __packed
attributes. There are still a few usages left due to SMC-R protocol defined
structures.

V4 changes:
The order of patches 03 and 04 for pnet table management and SMC IB-client
establishing has been exchanged, since pnet table management is now built on
top of smc_ib_devices.
Patch 01: Use EXPORT_SYMBOL_GPL().
Patch 02: Define "use_fallback" as bool.
          Get rid of useless smc_sock fields clearing in smc_sock_alloc(),
          since sk_alloc() clears out the memory.
Patch 03: Postpone smc_ib_remember_port_attr() call till ib_device is
          mentioned in the pnet table.
Patch 04: Replace sysfs-usage by a generic netlink approach for pnet table
          configuration.
          Change layout of pnet table entries to reference net_device and
          ib_device instead of dealing with names of net_devices and
          ib_devices.
Patch 05: Adapt "use_fallback" usages to new type bool.
          Get rid of useless smc_sock fields clearing in smc_sock_alloc()
          Avoid __packed where possible.
          Check if clc responses are not too big.
Patch 09: Postpone smc_setup_per_ibdev till the first connection with this
          ib_device is really created.
Patch 11: Get rid of __packed usage.

V3 changes:
Patch 05: Remove unneeded DEFINE_WAIT
Patch 06: Improve synchronization of link group creation
Patch 07: Rename peer_rmbe_len into peer_rmbe_size to be more consistent
Patch 09: Avoid calls of ib_get_memory_region with IB_ACCESS_LOCAL_WRITE,
          use new default local_dma_lkey from protection domain as lkey
          instead.
          Remove no longer needed function smc_ib_dereg_memory_region().
Patch 14: Switch to state ACTIVE only if still in state INIT.
          Return 0 for recvmsg invoked in a socket closing state.
          Allow getname call in state APPCLOSEWAIT1
          Do not trigger destruction of a socket-in-error queued in accept
          queue.
          During cleanup of accept queue, make sure sockets are destructed,
          and sockets in fallback mode are handled appropriately.
          When freeing sndbufs/rmbs, remove them from their list and free
          the entry.
          Use add_wait_queue() and remove_wait_queue() in close wait
          functions.
          If actively closing a socket in state for PEERFINCLOSEWAIT, keep
          this state.
          If passively closing a socket while bytes are to be received, move
          to state APPCLOSEWAIT1.
          If actively aborting a socket, skip sending the close_abort flag,
          since RDMA communication is no longer possible.
          When terminating a link group, do not schedule link group freeing a
          2nd time, since already done when unregistering the last remaining
          connection.
Patch 15: Introduce smc_diag module for monitoring SMC protocol sockets.
          This replaces the old patch 0015 dealing with procfs.

V2 changes:
Patch 0002: Add SMC versions for family key strings in net/core/sock.c.
Patch 0006: initialize rb_tree.
Patch 0007: Get rid of unneeded use of xchg() in smc_sndbuf_unuse() and
            smc_rmb_unuse().
Patch 0008: Correct error checking logic for ib_function calls.
            Define struct smc_link field wr_tx_id as atomic_long_t.
            Use "do_div" instead of "%" to be architecture-independent.
Patch 0009: Correct error checking logic for ib_function calls.
Patch 0011: Remove xchg() calls in cursor handling. Use atomic64_t for cursor
            overlays on 64-bit architectures. If not available, use plain u64
            and add locking for cursor reading and writing.
            Implement smc_curs_add() without modulo operator "%".
Patch 0012: Remove xchg() calls in cursor handling.
            Implement smc_tx_rdma_writes() without module operator "%".
Patch 0013: Remove xchg() calls in cursor handling.
Patch 0014: Return type bool in smc_wr_tx_has_pending().
            Remove unneeded semicolon in smc_close_shutdown_write().
            Call smc_close_active() in non-fallback case only.
            Get rid of duplicate schedule of sock_put_work().
            Take nested sock_lock in smc_listen_work().
            Start close stream_wait in case of prepared sends only.
Patch 0015: Remove unneeded socket ref_count in smc_proc_seq_show().
            Take lock before list_empty check in smc_proc_sock_list_del().

These patches are the initial part of the implementation of the
"Shared Memory Communications-RDMA" (SMC-R) protocol as defined in
RFC7609 [1]. While SMC-R does not aim to replace TCP,
it taps a wealth of existing data center TCP socket applications
to become more efficient without the need for rewriting them.
SMC-R uses RDMA over Converged Ethernet (RoCE) to save CPU consumption.
For instance, when running 10 parallel connections with uperf, we measured
a decrease of 60% in CPU consumption with SMC-R compared to TCP/IP
(with throughput and latency comparable;
measured on x86_64 with the same RoCE card and port).

SMC-R does not require an RDMA communication manager (RDMA CM).

SMC-R inherits TCP qualities such as reliable connections, host-based
firewall packet filtering (on connection establishment) and unmodified
application of communication encryption such as TLS (transport layer
security) or SSL (secure sockets layer). Since original TCP is used to
establish SMC-R connections, load balancers and packet inspection based
on TCP/IP connection establishment continue to work for SMC-R.

On the other hand, using SMC-R implies:
- either involving a preload library when invoking the unchanged TCP-application
  or slightly modifying the source by simply changing the socket family in
  the socket() call
- accepting extra overhead and latency in connection establishment due to
  SMC Connection Layer Control (CLC) handshake
- explicit coupling of RoCE ports with Ethernet ports
- not routable as currently built on RoCE V1
- bypassing of packet-based networking features
    - filtering (netfilter)
    - sniffing (libpcap, packet sockets, (E)BPF)
    - traffic control (scheduling, shaping)
- bypassing of IP-header based socket options
- bypassing of memory buffer (pressure) management
- unusable together with IPsec

Overview of the SMC-R Protocol described in informational RFC 7609

SMC-R is an open protocol that provides RDMA capabilities over RoCE
transparently for applications exploiting TCP sockets.
A new socket protocol family PF_SMC is introduced.
There are no changes required to applications using the sockets API for TCP
stream sockets other than the specification of the new socket family AF_SMC.
Unmodified applications can be used by means of a dynamic preload shared
library which rewrites the socket API call
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) into
socket(AF_SMC,  SOCK_STREAM, IPPROTO_TCP).
SMC-R re-uses the address family AF_INET for all addressing purposes around
struct sockaddr.

SMC-R system architecture layers:

+=============================================================================+
|                                      | unmodified TCP application           |
| native SMC application               +--------------------------------------+
|                                      | dynamic preload shared library       |
+=============================================================================+
|                                 SMC socket                                  |
+-----------------------------------------------------------------------------+
|                    | TCP socket (for connection establishment and fallback) |
| IB verbs           +--------------------------------------------------------+
|                    | IP                                                     |
+--------------------+--------------------------------------------------------+
| RoCE device driver | some network device driver                             |
+=============================================================================+

Terms:

A link group is determined by an ordered peer pair of TCP client and TCP server
(IP addresses and subnet). Reversed client server roles cause an own link group.
A link is a logical point-to-point connection based on an
infiniband reliable connected queue pair (RC-QP) between two RoCE ports
(MACs and GIDs) of a peer pair.
A link group can have 1..8 links for failover and load balancing.
This initial Linux implementation always has 1 link per link group.
Each link group on a peer can have 1..255 remote memory buffers (RMBs).
If more RMBs are needed, a peer can open another link group
(this initial Linux implementation) or fall back to TCP.
Each RMB has its own particular size and its own (R)DMA mapping and credentials
(rtoken consisting of rkey and RDMA "virtual address").
This initial Linux implementation uses physically contiguous memory for RMBs
but we are working towards scattered memory because of memory fragmentation.
Each RMB has 1..255 RMB elements (RMBEs) of equal size
to provide multiplexing of connections within an RMB.
An RMBE is the RDMA Write destination organized as wrapping ring buffer
for data transmit of a particular connection in one direction
(duplex by means of mirror symmetry as with TCP).
This initial Linux implementation always has 1 RMBE per RMB
and thus an individual RMB for each connection.

SMC-R connection establishment with subsequent data transfer:

   CLIENT                                                   SERVER

TCP three-way handshake:
                         regular TCP SYN
      -------------------------------------------------------->
                       regular TCP SYN ACK
      <--------------------------------------------------------
                         regular TCP ACK
      -------------------------------------------------------->

SMC Connection Layer Control (CLC) handshake
exchanges RDMA credentials between peers:
             via above TCP connection: SMC CLC Proposal
      -------------------------------------------------------->
              via above TCP connection: SMC CLC Accept
      <--------------------------------------------------------
             via above TCP connection: SMC CLC Confirm
      -------------------------------------------------------->

SMC Link Layer Control (LLC) (only once per link, i.e. 1st conn. of link group):
                 RoCE RC-QP: SMC LLC Confirm Link
      <========================================================
             RoCE RC-QP: SMC LLC Confirm Link response
      ========================================================>

SMC data transmission (incl. SMC Connection Data Control (CDC) message):
                       RoCE RC-QP: RDMA Write
      ========================================================>
             RoCE RC-QP: SMC CDC message (flow control)
      ========================================================>
                          ...

                       RoCE RC-QP: RDMA Write
      <========================================================
             RoCE RC-QP: SMC CDC message (flow control)
      <========================================================
                          ...

Data flow within an established connection:

+----------------------------------------------------------------------------
|            SENDER
| sendmsg()
|    |
|    | produces into sndbuf [sender's process context]
|    v
| +--------+
| | sndbuf | [ring buffer]
| +--------+
|    |
|    | consumes from sndbuf and produces into receiver's RMBE [any context]
|    | by sending RDMA Write followed by SMC CDC message over RoCE RC-QP
|    |
+----|-----------------------------------------------------------------------
     |
+----|-----------------------------------------------------------------------
|    v       RECEIVER
| +------+
| | RMBE | [ring buffer, can have size different from sender's sndbuf]
| |      | [RMBE represents rcvbuf, no further de-coupling as on sender side]
| +------+
|    |
|    | consumes from RMBE [receiver's process context]
|    v
| recvmsg()
+----------------------------------------------------------------------------

Flow control ("cursor" updates) by means of SMC CDC messages:

               SENDER                            RECEIVER

        sends updates via CDC-------------+   sends updates via CDC
        on consuming from sndbuf          |   on consuming from RMBE
        and producing into RMBE           |   by means of recvmsg()
                                          |            |
                                          |            |
      +-----------------------------------|------------+
      |                                   |
   +--v-------------------------+      +--v-----------------------+
   | receiver's consumer cursor |      | sender's producer cursor----+
   +----------------|-----------+      +--------------------------+  |
                    |                                                |
                    |                        receiver's RMBE         |
                    |                  +--------------------------+  |
                    |                  |                          |  |
                    +--------------------------------+            |  |
                                       |             |            |  |
                                       |             v            |  |
                                       |             +------------|  |
                                       |-------------+////////////|  |
                                       |//RDMA data written by////|  |
                                       |////sender that is////////|  |
                                       |/available to be consumed/|  |
                                       |///////// +---------------|  |
                                       |----------+^              |  |
                                       |           |              |  |
                                       |           +-----------------+
                                       |                          |
                                       +--------------------------+

Sending updates of the producer cursor is immediate for low latency;
something like Nagle's algorithm (absence of TCP_NODELAY) is optional and
currently not part of this initial Linux implementation.
Sending updates of the consumer cursor is conditional to avoid the
silly window syndrome.

Normal connection termination:

Normal connection termination starts transitioning from socket state
ACTIVE via either "Active Close" or "Passive Close".

shutdown rdwr               +-----------------+
or close,   +-------------->|  INIT / CLOSED  |<-------------+
send PeerCon|nClosed        +-----------------+              | PeerConnClosed
            |                       |                        | received
            |            connection | established            |
            |                       V                        |
    +----------------+     +-----------------+     +----------------+
    |AppFinCloseWait |     |     ACTIVE      |     |PeerFinCloseWait|
    +----------------+     +-----------------+     +----------------+
            |                   |         |                   |
            |     Active Close: |         |Passive Close:     |
            |     close or      |         |PeerConnClosed or  |
            |     shutdown wr or|         |PeerDoneWriting    |
            |     shutdown rdwr |         |received           |

    |                   V         V                   |
 PeerConnClo|sed    +--------------+   +-------------+        | close or
 received   +--<----|PeerCloseWait1|   |AppCloseWait1|--->----+ shutdown rdwr,
            |       +--------------+   +-------------+        | send
            |  PeerDoneWri|ting                | shutdown wr, | PeerConnClosed
            |  received   |            send Pee|rDoneWriting  |
            |             V                    V              |
            |       +--------------+   +-------------+        |
            +--<----|PeerCloseWait2|   |AppCloseWait2|--->----+
                    +--------------+   +-------------+

In state CLOSED, the socket can be destructed only, once the application has
issued a close().

Abnormal connection termination:

                            +-----------------+
            +-------------->|  INIT / CLOSED  |<-------------+
            |               +-----------------+              |
            |                                                |
            |           +-----------------------+            |
            |           |     Any state         |            |
 PeerConnAbo|rt         | (before setting       |            | send
 received   |           |  PeerConnClosed       |            | PeerConnAbort
            |           |  indicator in         |            |
            |           |  peer's RMBE)         |            |
            |           +-----------------------+            |
            |                   |         |                  |
            |     Active Abort: |         | Passive Abort:   |
            |     problem,      |         | PeerConnAbort    |
            |     send          |         | received,        |
            |     PeerConnAbort,|         | ECONNRESET       |
            |     ECONNABORTED  |         |                  |
            |                   V         V                  |
            |       +--------------+   +--------------+      |
            +-------|PeerAbortWait |   | ProcessAbort |------+
                    +--------------+   +--------------+

Implementation notes beyond RFC 7609:

A PNET table in sysfs provides the mapping between network device names and
RoCE Infiniband device names for the transparent switch of data communication.
A PNET table can contain an arbitrary number of PNETIDs.
Each PNETID contains exactly one (Ethernet) network device name
and one or more RoCE Infiniband device names.
Each device name can only exist in at most one PNETID (no overlapping).
This initial Linux implementation allows at most one RoCE Infiniband device
name per PNETID.
After a new TCP connection is established, the network device name
used for egress traffic with the TCP connection's local source IP address
is used as key to lookup the unique PNETID, and the RoCE Infiniband device
of this PNETID is used to switch data communication from TCP to RDMA
during SMC CLC handshake.

Problem determination:

A protocol dissector is available with upstream wireshark for formatting
SMC-R related RoCE LAN traffic.
[https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-smcr.c]

We are working on enhancing the Linux implementation to cover:

- Improve default socket closing asynchronicity
- Address corner cases with many parallel connections
- Tracing
- Integrated load balancing and fail-over within a link group
- Splice and sendpage support
- IPv6 addressing support
- Keepalive, Cork
- Namespaces support
- Urgent data
- More socket options
- Diagnostics
- Statistics support
- SNMP support

References:

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jan 9, 2017
2 parents c8584b3 + f16a7dd commit f3a3e24
Show file tree
Hide file tree
Showing 38 changed files with 7,127 additions and 9 deletions.
7 changes: 7 additions & 0 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -10850,6 +10850,13 @@ S: Maintained
F: drivers/staging/media/st-cec/
F: Documentation/devicetree/bindings/media/stih-cec.txt

SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS
M: Ursula Braun <ubraun@linux.vnet.ibm.com>
L: linux-s390@vger.kernel.org
W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
F: net/smc/

SYNOPSYS DESIGNWARE DMAC DRIVER
M: Viresh Kumar <vireshk@kernel.org>
M: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Expand Down
7 changes: 6 additions & 1 deletion include/linux/socket.h
Original file line number Diff line number Diff line change
Expand Up @@ -202,8 +202,12 @@ struct ucred {
#define AF_VSOCK 40 /* vSockets */
#define AF_KCM 41 /* Kernel Connection Multiplexor*/
#define AF_QIPCRTR 42 /* Qualcomm IPC Router */
#define AF_SMC 43 /* smc sockets: reserve number for
* PF_SMC protocol family that
* reuses AF_INET address family
*/

#define AF_MAX 43 /* For now.. */
#define AF_MAX 44 /* For now.. */

/* Protocol families, same as address families. */
#define PF_UNSPEC AF_UNSPEC
Expand Down Expand Up @@ -251,6 +255,7 @@ struct ucred {
#define PF_VSOCK AF_VSOCK
#define PF_KCM AF_KCM
#define PF_QIPCRTR AF_QIPCRTR
#define PF_SMC AF_SMC
#define PF_MAX AF_MAX

/* Maximum queue length specifiable by listen. */
Expand Down
20 changes: 20 additions & 0 deletions include/net/smc.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
/*
* Shared Memory Communications over RDMA (SMC-R) and RoCE
*
* Definitions for the SMC module (socket related)
*
* Copyright IBM Corp. 2016
*
* Author(s): Ursula Braun <ubraun@linux.vnet.ibm.com>
*/
#ifndef _SMC_H
#define _SMC_H

struct smc_hashinfo {
rwlock_t lock;
struct hlist_head ht;
};

int smc_hash_sk(struct sock *sk);
void smc_unhash_sk(struct sock *sk);
#endif /* _SMC_H */
4 changes: 4 additions & 0 deletions include/net/sock.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@
#include <net/checksum.h>
#include <net/tcp_states.h>
#include <linux/net_tstamp.h>
#include <net/smc.h>

/*
* This structure really needs to be cleaned up.
Expand Down Expand Up @@ -986,6 +987,7 @@ struct request_sock_ops;
struct timewait_sock_ops;
struct inet_hashinfo;
struct raw_hashinfo;
struct smc_hashinfo;
struct module;

/*
Expand Down Expand Up @@ -1024,6 +1026,7 @@ struct proto {
int (*getsockopt)(struct sock *sk, int level,
int optname, char __user *optval,
int __user *option);
void (*keepalive)(struct sock *sk, int valbool);
#ifdef CONFIG_COMPAT
int (*compat_setsockopt)(struct sock *sk,
int level,
Expand Down Expand Up @@ -1093,6 +1096,7 @@ struct proto {
struct inet_hashinfo *hashinfo;
struct udp_table *udp_table;
struct raw_hashinfo *raw_hash;
struct smc_hashinfo *smc_hash;
} h;

struct module *owner;
Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/netlink.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#define NETLINK_ECRYPTFS 19
#define NETLINK_RDMA 20
#define NETLINK_CRYPTO 21 /* Crypto layer */
#define NETLINK_SMC 22 /* SMC monitoring */

#define NETLINK_INET_DIAG NETLINK_SOCK_DIAG

Expand Down
35 changes: 35 additions & 0 deletions include/uapi/linux/smc.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
* Shared Memory Communications over RDMA (SMC-R) and RoCE
*
* Definitions for generic netlink based configuration of an SMC-R PNET table
*
* Copyright IBM Corp. 2016
*
* Author(s): Thomas Richter <tmricht@linux.vnet.ibm.com>
*/

#ifndef _UAPI_LINUX_SMC_H_
#define _UAPI_LINUX_SMC_H_

/* Netlink SMC_PNETID attributes */
enum {
SMC_PNETID_UNSPEC,
SMC_PNETID_NAME,
SMC_PNETID_ETHNAME,
SMC_PNETID_IBNAME,
SMC_PNETID_IBPORT,
__SMC_PNETID_MAX,
SMC_PNETID_MAX = __SMC_PNETID_MAX - 1
};

enum { /* SMC PNET Table commands */
SMC_PNETID_GET = 1,
SMC_PNETID_ADD,
SMC_PNETID_DEL,
SMC_PNETID_FLUSH
};

#define SMCR_GENL_FAMILY_NAME "SMC_PNETID"
#define SMCR_GENL_FAMILY_VERSION 1

#endif /* _UAPI_LINUX_SMC_H */
85 changes: 85 additions & 0 deletions include/uapi/linux/smc_diag.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#ifndef _UAPI_SMC_DIAG_H_
#define _UAPI_SMC_DIAG_H_

#include <linux/types.h>
#include <linux/inet_diag.h>
#include <rdma/ib_verbs.h>

/* Request structure */
struct smc_diag_req {
__u8 diag_family;
__u8 pad[2];
__u8 diag_ext; /* Query extended information */
struct inet_diag_sockid id;
};

/* Base info structure. It contains socket identity (addrs/ports/cookie) based
* on the internal clcsock, and more SMC-related socket data
*/
struct smc_diag_msg {
__u8 diag_family;
__u8 diag_state;
__u8 diag_fallback;
__u8 diag_shutdown;
struct inet_diag_sockid id;

__u32 diag_uid;
__u64 diag_inode;
};

/* Extensions */

enum {
SMC_DIAG_NONE,
SMC_DIAG_CONNINFO,
SMC_DIAG_LGRINFO,
SMC_DIAG_SHUTDOWN,
__SMC_DIAG_MAX,
};

#define SMC_DIAG_MAX (__SMC_DIAG_MAX - 1)

/* SMC_DIAG_CONNINFO */

struct smc_diag_cursor {
__u16 reserved;
__u16 wrap;
__u32 count;
};

struct smc_diag_conninfo {
__u32 token; /* unique connection id */
__u32 sndbuf_size; /* size of send buffer */
__u32 rmbe_size; /* size of RMB element */
__u32 peer_rmbe_size; /* size of peer RMB element */
/* local RMB element cursors */
struct smc_diag_cursor rx_prod; /* received producer cursor */
struct smc_diag_cursor rx_cons; /* received consumer cursor */
/* peer RMB element cursors */
struct smc_diag_cursor tx_prod; /* sent producer cursor */
struct smc_diag_cursor tx_cons; /* sent consumer cursor */
__u8 rx_prod_flags; /* received producer flags */
__u8 rx_conn_state_flags; /* recvd connection flags*/
__u8 tx_prod_flags; /* sent producer flags */
__u8 tx_conn_state_flags; /* sent connection flags*/
/* send buffer cursors */
struct smc_diag_cursor tx_prep; /* prepared to be sent cursor */
struct smc_diag_cursor tx_sent; /* sent cursor */
struct smc_diag_cursor tx_fin; /* confirmed sent cursor */
};

/* SMC_DIAG_LINKINFO */

struct smc_diag_linkinfo {
__u8 link_id; /* link identifier */
__u8 ibname[IB_DEVICE_NAME_MAX]; /* name of the RDMA device */
__u8 ibport; /* RDMA device port number */
__u8 gid[40]; /* local GID */
__u8 peer_gid[40]; /* peer GID */
};

struct smc_diag_lgrinfo {
struct smc_diag_linkinfo lnk[1];
__u8 role;
};
#endif /* _UAPI_SMC_DIAG_H_ */
1 change: 1 addition & 0 deletions net/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ source "net/packet/Kconfig"
source "net/unix/Kconfig"
source "net/xfrm/Kconfig"
source "net/iucv/Kconfig"
source "net/smc/Kconfig"

config INET
bool "TCP/IP networking"
Expand Down
1 change: 1 addition & 0 deletions net/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ obj-$(CONFIG_MAC80211) += mac80211/
obj-$(CONFIG_TIPC) += tipc/
obj-$(CONFIG_NETLABEL) += netlabel/
obj-$(CONFIG_IUCV) += iucv/
obj-$(CONFIG_SMC) += smc/
obj-$(CONFIG_RFKILL) += rfkill/
obj-$(CONFIG_NET_9P) += 9p/
obj-$(CONFIG_CAIF) += caif/
Expand Down
13 changes: 5 additions & 8 deletions net/core/sock.c
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
"sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" ,
"sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" ,
"sk_lock-AF_NFC" , "sk_lock-AF_VSOCK" , "sk_lock-AF_KCM" ,
"sk_lock-AF_MAX"
"sk_lock-AF_SMC" , "sk_lock-AF_MAX"
};
static const char *const af_family_slock_key_strings[AF_MAX+1] = {
"slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
Expand All @@ -239,7 +239,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
"slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" ,
"slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" ,
"slock-AF_NFC" , "slock-AF_VSOCK" ,"slock-AF_KCM" ,
"slock-AF_MAX"
"slock-AF_SMC" , "slock-AF_MAX"
};
static const char *const af_family_clock_key_strings[AF_MAX+1] = {
"clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
Expand All @@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
"clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" ,
"clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" ,
"clock-AF_NFC" , "clock-AF_VSOCK" , "clock-AF_KCM" ,
"clock-AF_MAX"
"closck-AF_smc" , "clock-AF_MAX"
};

/*
Expand Down Expand Up @@ -762,11 +762,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
goto set_rcvbuf;

case SO_KEEPALIVE:
#ifdef CONFIG_INET
if (sk->sk_protocol == IPPROTO_TCP &&
sk->sk_type == SOCK_STREAM)
tcp_set_keepalive(sk, valbool);
#endif
if (sk->sk_prot->keepalive)
sk->sk_prot->keepalive(sk, valbool);
sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool);
break;

Expand Down
1 change: 1 addition & 0 deletions net/ipv4/tcp_ipv4.c
Original file line number Diff line number Diff line change
Expand Up @@ -2376,6 +2376,7 @@ struct proto tcp_prot = {
.shutdown = tcp_shutdown,
.setsockopt = tcp_setsockopt,
.getsockopt = tcp_getsockopt,
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
.sendpage = tcp_sendpage,
Expand Down
1 change: 1 addition & 0 deletions net/ipv4/tcp_timer.c
Original file line number Diff line number Diff line change
Expand Up @@ -617,6 +617,7 @@ void tcp_set_keepalive(struct sock *sk, int val)
else if (!val)
inet_csk_delete_keepalive_timer(sk);
}
EXPORT_SYMBOL_GPL(tcp_set_keepalive);


static void tcp_keepalive_timer (unsigned long data)
Expand Down
1 change: 1 addition & 0 deletions net/ipv6/tcp_ipv6.c
Original file line number Diff line number Diff line change
Expand Up @@ -1889,6 +1889,7 @@ struct proto tcpv6_prot = {
.shutdown = tcp_shutdown,
.setsockopt = tcp_setsockopt,
.getsockopt = tcp_getsockopt,
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
.sendpage = tcp_sendpage,
Expand Down
20 changes: 20 additions & 0 deletions net/smc/Kconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
config SMC
tristate "SMC socket protocol family"
depends on INET && INFINIBAND
---help---
SMC-R provides a "sockets over RDMA" solution making use of
RDMA over Converged Ethernet (RoCE) technology to upgrade
AF_INET TCP connections transparently.
The Linux implementation of the SMC-R solution is designed as
a separate socket family SMC.

Select this option if you want to run SMC socket applications

config SMC_DIAG
tristate "SMC: socket monitoring interface"
depends on SMC
---help---
Support for SMC socket monitoring interface used by tools such as
smcss.

if unsure, say Y.
4 changes: 4 additions & 0 deletions net/smc/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
obj-$(CONFIG_SMC) += smc.o
obj-$(CONFIG_SMC_DIAG) += smc_diag.o
smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o
Loading

0 comments on commit f3a3e24

Please sign in to comment.