Skip to content

Commit

Permalink
lib/crc: simplify the kconfig options for CRC implementations
Browse files Browse the repository at this point in the history
Make the following simplifications to the kconfig options for choosing
CRC implementations for CRC32 and CRC_T10DIF:

1. Make the option to disable the arch-optimized code be visible only
   when CONFIG_EXPERT=y.
2. Make a single option control the inclusion of the arch-optimized code
   for all enabled CRC variants.
3. Make CRC32_SARWATE (a.k.a. slice-by-1 or byte-by-byte) be the only
   generic CRC32 implementation.

The result is there is now just one option, CRC_OPTIMIZATIONS, which is
default y and can be disabled only when CONFIG_EXPERT=y.

Rationale:

1. Enabling the arch-optimized code is nearly always the right choice.
   However, people trying to build the tiniest kernel possible would
   find some use in disabling it.  Anything we add to CRC32 is de facto
   unconditional, given that CRC32 gets selected by something in nearly
   all kernels.  And unfortunately enabling the arch CRC code does not
   eliminate the need to build the generic CRC code into the kernel too,
   due to CPU feature dependencies.  The size of the arch CRC code will
   also increase slightly over time as more CRC variants get added and
   more implementations targeting different instruction set extensions
   get added.  Thus, it seems worthwhile to still provide an option to
   disable it, but it should be considered an expert-level tweak.

2. Considering the use case described in (1), there doesn't seem to be
   sufficient value in making the arch-optimized CRC code be
   independently configurable for different CRC variants.  Note also
   that multiple variants were already grouped together, e.g.
   CONFIG_CRC32 actually enables three different variants of CRC32.

3. The bit-by-bit implementation is uselessly slow, whereas slice-by-n
   for n=4 and n=8 use tables that are inconveniently large: 4096 bytes
   and 8192 bytes respectively, compared to 1024 bytes for n=1.  Higher
   n gives higher instruction-level parallelism, so higher n easily wins
   on traditional microbenchmarks on most CPUs.  However, the larger
   tables, which are accessed randomly, can be harmful in real-world
   situations where the dcache may be cold or useful data may need be
   evicted from the dcache.  Meanwhile, today most architectures have
   much faster CRC32 implementations using dedicated CRC32 instructions
   or carryless multiplication instructions anyway, which make the
   generic code obsolete in most cases especially on long messages.

   Another reason for going with n=1 is that this is already what is
   used by all the other CRC variants in the kernel.  CRC32 was unique
   in having support for larger tables.  But as per the above this can
   be considered an outdated optimization.

   The standardization on slice-by-1 a.k.a. CRC32_SARWATE makes much of
   the code in lib/crc32.c unused.  A later patch will clean that up.

Link: https://lore.kernel.org/r/20250123212904.118683-2-ebiggers@kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
  • Loading branch information
Eric Biggers committed Jan 29, 2025
1 parent d0d106a commit b0430f3
Showing 1 changed file with 14 additions and 102 deletions.
116 changes: 14 additions & 102 deletions lib/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -164,34 +164,9 @@ config CRC_T10DIF
config ARCH_HAS_CRC_T10DIF
bool

choice
prompt "CRC-T10DIF implementation"
depends on CRC_T10DIF
default CRC_T10DIF_IMPL_ARCH if ARCH_HAS_CRC_T10DIF
default CRC_T10DIF_IMPL_GENERIC if !ARCH_HAS_CRC_T10DIF
help
This option allows you to override the default choice of CRC-T10DIF
implementation.

config CRC_T10DIF_IMPL_ARCH
bool "Architecture-optimized" if ARCH_HAS_CRC_T10DIF
help
Use the optimized implementation of CRC-T10DIF for the selected
architecture. It is recommended to keep this enabled, as it can
greatly improve CRC-T10DIF performance.

config CRC_T10DIF_IMPL_GENERIC
bool "Generic implementation"
help
Use the generic table-based implementation of CRC-T10DIF. Selecting
this will reduce code size slightly but can greatly reduce CRC-T10DIF
performance.

endchoice

config CRC_T10DIF_ARCH
tristate
default CRC_T10DIF if CRC_T10DIF_IMPL_ARCH
default CRC_T10DIF if ARCH_HAS_CRC_T10DIF && CRC_OPTIMIZATIONS

config CRC64_ROCKSOFT
tristate "CRC calculation for the Rocksoft model CRC64"
Expand All @@ -214,6 +189,7 @@ config CRC32
tristate "CRC32/CRC32c functions"
default y
select BITREVERSE
select CRC32_SARWATE
help
This option is provided for the case where no in-kernel-tree
modules require CRC32/CRC32c functions, but a module built outside
Expand All @@ -223,87 +199,12 @@ config CRC32
config ARCH_HAS_CRC32
bool

choice
prompt "CRC32 implementation"
depends on CRC32
default CRC32_IMPL_ARCH_PLUS_SLICEBY8 if ARCH_HAS_CRC32
default CRC32_IMPL_SLICEBY8 if !ARCH_HAS_CRC32
help
This option allows you to override the default choice of CRC32
implementation. Choose the default unless you know that you need one
of the others.

config CRC32_IMPL_ARCH_PLUS_SLICEBY8
bool "Arch-optimized, with fallback to slice-by-8" if ARCH_HAS_CRC32
help
Use architecture-optimized implementation of CRC32. Fall back to
slice-by-8 in cases where the arch-optimized implementation cannot be
used, e.g. if the CPU lacks support for the needed instructions.

This is the default when an arch-optimized implementation exists.

config CRC32_IMPL_ARCH_PLUS_SLICEBY1
bool "Arch-optimized, with fallback to slice-by-1" if ARCH_HAS_CRC32
help
Use architecture-optimized implementation of CRC32, but fall back to
slice-by-1 instead of slice-by-8 in order to reduce the binary size.

config CRC32_IMPL_SLICEBY8
bool "Slice by 8 bytes"
help
Calculate checksum 8 bytes at a time with a clever slicing algorithm.
This is much slower than the architecture-optimized implementation of
CRC32 (if the selected arch has one), but it is portable and is the
fastest implementation when no arch-optimized implementation is
available. It uses an 8KiB lookup table. Most modern processors have
enough cache to hold this table without thrashing the cache.

config CRC32_IMPL_SLICEBY4
bool "Slice by 4 bytes"
help
Calculate checksum 4 bytes at a time with a clever slicing algorithm.
This is a bit slower than slice by 8, but has a smaller 4KiB lookup
table.

Only choose this option if you know what you are doing.

config CRC32_IMPL_SLICEBY1
bool "Slice by 1 byte (Sarwate's algorithm)"
help
Calculate checksum a byte at a time using Sarwate's algorithm. This
is not particularly fast, but has a small 1KiB lookup table.

Only choose this option if you know what you are doing.

config CRC32_IMPL_BIT
bool "Classic Algorithm (one bit at a time)"
help
Calculate checksum one bit at a time. This is VERY slow, but has
no lookup table. This is provided as a debugging option.

Only choose this option if you are debugging crc32.

endchoice

config CRC32_ARCH
tristate
default CRC32 if CRC32_IMPL_ARCH_PLUS_SLICEBY8 || CRC32_IMPL_ARCH_PLUS_SLICEBY1

config CRC32_SLICEBY8
bool
default y if CRC32_IMPL_SLICEBY8 || CRC32_IMPL_ARCH_PLUS_SLICEBY8

config CRC32_SLICEBY4
bool
default y if CRC32_IMPL_SLICEBY4
default CRC32 if ARCH_HAS_CRC32 && CRC_OPTIMIZATIONS

config CRC32_SARWATE
bool
default y if CRC32_IMPL_SLICEBY1 || CRC32_IMPL_ARCH_PLUS_SLICEBY1

config CRC32_BIT
bool
default y if CRC32_IMPL_BIT

config CRC64
tristate "CRC64 functions"
Expand Down Expand Up @@ -343,6 +244,17 @@ config CRC8
when they need to do cyclic redundancy check according CRC8
algorithm. Module will be called crc8.

config CRC_OPTIMIZATIONS
bool "Enable optimized CRC implementations" if EXPERT
default y
help
Disabling this option reduces code size slightly by disabling the
architecture-optimized implementations of any CRC variants that are
enabled. CRC checksumming performance may get much slower.

Keep this enabled unless you're really trying to minimize the size of
the kernel.

config XXHASH
tristate

Expand Down

0 comments on commit b0430f3

Please sign in to comment.