tcp/dccp: change source port selection at connect() time
In commit 1580ab6 ("tcp/dccp: better use of ephemeral ports in connect()")
we added a heuristic to select even ports for connect() and odd ports for bind().

This was nice because no application changes were needed.

But it added cost when all even ports are in use, which happens
when there are few listeners and many active connections.

Since then, IP_LOCAL_PORT_RANGE has been added to permit an application
to partition the ephemeral port range at will.
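
As a rough illustration of that partitioning, an application can set the option on a socket before calling connect(). This is a minimal sketch, assuming a kernel that supports IP_LOCAL_PORT_RANGE (Linux 6.3 or later). The helper names pack_port_range() and set_local_port_range() are hypothetical, and the fallback #define only mirrors the value from the kernel uapi header for toolchains whose libc headers predate the option.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Fallback for older libc headers; 51 is the value used in <linux/in.h>. */
#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51
#endif

/* Hypothetical helper: the option takes a 32-bit value with the lower
 * bound in the low 16 bits and the upper bound in the high 16 bits.
 */
uint32_t pack_port_range(uint16_t low, uint16_t high)
{
	return ((uint32_t)high << 16) | low;
}

/* Hypothetical helper: restrict the ephemeral ports this socket may use.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int set_local_port_range(int fd, uint16_t low, uint16_t high)
{
	uint32_t range = pack_port_range(low, high);

	return setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
			  &range, sizeof(range));
}
```

Calling set_local_port_range(fd, 40000, 49999) before connect() would confine the source port of that socket to 40000-49999, leaving the rest of the range to other users.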

This patch extends the idea so that if IP_LOCAL_PORT_RANGE is set on
a socket before connect(), port selection no longer favors even ports.

This means that connect() can find a suitable source port faster,
and applications can use a different split between connect() and bind()
users.
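
The effect on the scan can be modelled in userspace. The sketch below is a simplified model of the loop in __inet_hash_connect(), not the kernel code itself: pick_source_port() and its in_use[] array (one entry per port in [low, high]) are hypothetical stand-ins for the bind hash lookup, and reserved ports, locking and the table_perturb update are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: returns the chosen port or -1 if none is free.
 * @local_ports mirrors "IP_LOCAL_PORT_RANGE was set on the socket".
 * @probes, if non-NULL, receives the number of ports examined.
 */
int pick_source_port(int low, int high, unsigned int offset,
		     bool local_ports, const bool *in_use, int *probes)
{
	int step = local_ports ? 1 : 2;
	int remaining, port, i, n = 0;

	high++;				/* [low, high] -> [low, high[ */
	remaining = high - low;
	if (!local_ports && remaining > 1)
		remaining &= ~1;	/* round down to an even count */

	offset %= (unsigned int)remaining;
	if (!local_ports)
		offset &= ~1U;		/* first pass uses the parity of @low */
other_parity_scan:
	port = low + (int)offset;
	for (i = 0; i < remaining; i += step, port += step) {
		if (port >= high)
			port -= remaining;	/* wrap around */
		n++;
		if (!in_use[port - low]) {
			if (probes)
				*probes = n;
			return port;
		}
	}
	if (!local_ports) {
		offset++;
		if ((offset & 1) && remaining > 1)
			goto other_parity_scan;	/* second pass, other parity */
	}
	if (probes)
		*probes = n;
	return -1;
}
```

With ports 10-17 and every even port busy, the parity scan starting at offset 2 examines all four even ports before it reaches port 13, while the step-1 scan reaches it on the second probe.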

This should give more entropy to the Toeplitz hash used in RSS: using even
ports was wasting one bit from the 16-bit sport.
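
To put a number on the lost bit, the following sketch counts how many distinct source ports a scan can produce. The helper candidate_ports() is hypothetical, and [32768, 60999] is the range quoted in the code comment of this patch.

```c
#include <stdbool.h>

/* Hypothetical helper: number of distinct ports in [low, high] that a
 * scan can return, either any port or even ports only.
 */
int candidate_ports(int low, int high, bool even_only)
{
	int count = 0;

	for (int port = low; port <= high; port++)
		if (!even_only || (port & 1) == 0)
			count++;
	return count;
}
```

For [32768, 60999] this gives 28232 candidates when every port is eligible and 14116 when only even ports are, so restricting the scan to even ports halves the set of sport values fed to the hash.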

A similar change can be done in inet_csk_find_open_port() if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20231214192939.1962891-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet authored and Jakub Kicinski committed Dec 16, 2023
1 parent 41db762 commit 2071848
Showing 1 changed file with 16 additions and 11 deletions.
27 changes: 16 additions & 11 deletions net/ipv4/inet_hashtables.c
@@ -1012,7 +1012,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	bool tb_created = false;
 	u32 remaining, offset;
 	int ret, i, low, high;
-	int l3mdev;
+	bool local_ports;
+	int step, l3mdev;
 	u32 index;
 
 	if (port) {
@@ -1024,10 +1025,12 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 	l3mdev = inet_sk_bound_l3mdev(sk);
 
-	inet_sk_get_local_port_range(sk, &low, &high);
+	local_ports = inet_sk_get_local_port_range(sk, &low, &high);
+	step = local_ports ? 1 : 2;
+
 	high++; /* [32768, 60999] -> [32768, 61000[ */
 	remaining = high - low;
-	if (likely(remaining > 1))
+	if (!local_ports && remaining > 1)
 		remaining &= ~1U;
 
 	get_random_sleepable_once(table_perturb,
@@ -1040,10 +1043,11 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	/* In first pass we try ports of @low parity.
 	 * inet_csk_get_port() does the opposite choice.
 	 */
-	offset &= ~1U;
+	if (!local_ports)
+		offset &= ~1U;
 other_parity_scan:
 	port = low + offset;
-	for (i = 0; i < remaining; i += 2, port += 2) {
+	for (i = 0; i < remaining; i += step, port += step) {
 		if (unlikely(port >= high))
 			port -= remaining;
 		if (inet_is_local_reserved_port(net, port))
@@ -1083,10 +1087,11 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		cond_resched();
 	}
 
-	offset++;
-	if ((offset & 1) && remaining > 1)
-		goto other_parity_scan;
-
+	if (!local_ports) {
+		offset++;
+		if ((offset & 1) && remaining > 1)
+			goto other_parity_scan;
+	}
 	return -EADDRNOTAVAIL;
 
 ok:
@@ -1109,8 +1114,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	 * on low contention the randomness is maximal and on high contention
 	 * it may be inexistent.
 	 */
-	i = max_t(int, i, get_random_u32_below(8) * 2);
-	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
+	i = max_t(int, i, get_random_u32_below(8) * step);
+	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + step);
 
 	/* Head lock still held and bh's disabled */
 	inet_bind_hash(sk, tb, tb2, port);