Merge tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - Fixes for integrity handling

 - NVMe pull request via Keith:
      - Secure concatenation for TCP transport (Hannes)
      - Multipath sysfs visibility (Nilay)
      - Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
      - Correct use of 64-bit BARs for pci-epf target (Niklas)
      - Socket fix for selinux when used in containers (Peijie)

 - MD pull request via Yu:
      - fix recovery can preempt resync (Li Nan)
      - fix md-bitmap IO limit (Su Yue)
      - fix raid10 discard with REQ_NOWAIT (Xiao Ni)
      - fix raid1 memory leak (Zheng Qixing)
      - fix mddev uaf (Yu Kuai)
      - fix raid1,raid10 IO flags (Yu Kuai)
      - some refactor and cleanup (Yu Kuai)

 - Series cleaning up and fixing bugs in the bad block handling code

 - Improve support for write failure simulation in null_blk

 - Various lock ordering fixes

 - Fixes for locking for debugfs attributes

 - Various ublk related fixes and improvements

 - Cleanups for blk-rq-qos wait handling

 - blk-throttle fixes

 - Fixes for loop dio and sync handling

 - Fixes and cleanups for the auto-PI code

 - Block side support for hardware encryption keys in blk-crypto

 - Various cleanups and fixes

* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
  nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
  nvme-tcp: fix selinux denied when calling sock_sendmsg
  nvmet: pci-epf: Always configure BAR0 as 64-bit
  nvmet: Remove duplicate uuid_copy
  nvme: zns: Simplify nvme_zone_parse_entry()
  nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
  nvmet-fc: Remove unused functions
  nvme-pci: remove stale comment
  nvme-fc: Utilise min3() to simplify queue count calculation
  nvme-multipath: Add visibility for queue-depth io-policy
  nvme-multipath: Add visibility for numa io-policy
  nvme-multipath: Add visibility for round-robin io-policy
  nvmet: add tls_concat and tls_key debugfs entries
  nvmet-tcp: support secure channel concatenation
  nvmet: Add 'sq' argument to alloc_ctrl_args
  nvme-fabrics: reset admin connection for secure concatenation
  nvme-tcp: request secure channel concatenation
  nvme-keyring: add nvme_tls_psk_refresh()
  nvme: add nvme_auth_derive_tls_psk()
  nvme: add nvme_auth_generate_digest()
  ...
Linus Torvalds committed Mar 27, 2025
2 parents 91928e0 + 3c9f0c9 commit 9b960d8
Showing 128 changed files with 4,059 additions and 1,561 deletions.
43 changes: 42 additions & 1 deletion Documentation/ABI/stable/sysfs-block
@@ -109,6 +109,10 @@ Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Indicates whether a storage device is capable of storing
integrity metadata. Set if the device is T10 PI-capable.
This flag is set to 1 if the storage media is formatted
with T10 Protection Information. If the storage media is
not formatted with T10 Protection Information, this flag
is set to 0.


What: /sys/block/<disk>/integrity/format
@@ -117,6 +121,13 @@ Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Metadata format for integrity capable block device.
E.g. T10-DIF-TYPE1-CRC.
This field describes the type of T10 Protection Information
that the block device can send and receive.
If the device can store application integrity metadata but
no T10 Protection Information profile is used, this field
contains "nop".
If the device does not support integrity metadata, this
field contains "none".
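The three cases described for `integrity/format` can be told apart in userspace with a small helper. The following is a minimal sketch, not kernel code: the function and enum names are hypothetical, and treating every value other than "none" and "nop" as a protection information profile is an assumption based on the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the integrity/format attribute. */
enum integrity_kind {
	INTEGRITY_NONE,        /* "none": no integrity metadata support */
	INTEGRITY_NOP,         /* "nop": metadata space, but no PI profile */
	INTEGRITY_PI_PROFILE,  /* anything else, e.g. "T10-DIF-TYPE1-CRC" */
};

/* Classify an already-trimmed attribute value. */
enum integrity_kind classify_integrity_format(const char *value)
{
	if (strcmp(value, "none") == 0)
		return INTEGRITY_NONE;
	if (strcmp(value, "nop") == 0)
		return INTEGRITY_NOP;
	/* Assumption: any other string names a PI profile. */
	return INTEGRITY_PI_PROFILE;
}

/* Read /sys/block/<disk>/integrity/format. Returns 0 on success, -1 on error. */
int read_integrity_format(const char *disk, enum integrity_kind *out)
{
	char path[256];
	char buf[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/integrity/format", disk);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline, if any */
	*out = classify_integrity_format(buf);
	return 0;
}
```

A caller would pass a disk name such as `sda` and treat a return value of -1 as "attribute not readable" rather than as "no integrity support".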


What: /sys/block/<disk>/integrity/protection_interval_bytes
@@ -142,7 +153,17 @@ Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Number of bytes of integrity tag space available per
-512 bytes of data.
+protection_interval_bytes, which is typically
+the device's logical block size.
+This field describes the size of the application tag
+if the storage device is formatted with T10 Protection
+Information and permits use of the application tag.
+The tag_size is reported in bytes and indicates the
+space available for adding an opaque tag to each block
+(protection_interval_bytes).
+If the device does not support T10 Protection Information
+(even if the device provides application integrity
+metadata space), this field is set to 0.


What: /sys/block/<disk>/integrity/write_generate
@@ -229,6 +250,17 @@ Description:
encryption, refer to Documentation/block/inline-encryption.rst.


What: /sys/block/<disk>/queue/crypto/hw_wrapped_keys
Date: February 2025
Contact: linux-block@vger.kernel.org
Description:
[RO] The presence of this file indicates that the device
supports hardware-wrapped inline encryption keys, i.e. key blobs
that can only be unwrapped and used by dedicated hardware. For
more information about hardware-wrapped inline encryption keys,
see Documentation/block/inline-encryption.rst.


What: /sys/block/<disk>/queue/crypto/max_dun_bits
Date: February 2022
Contact: linux-block@vger.kernel.org
@@ -267,6 +299,15 @@ Description:
use with inline encryption.


What: /sys/block/<disk>/queue/crypto/raw_keys
Date: February 2025
Contact: linux-block@vger.kernel.org
Description:
[RO] The presence of this file indicates that the device
supports raw inline encryption keys, i.e. keys that are managed
in raw, plaintext form in software.
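Because both `raw_keys` and `hw_wrapped_keys` signal support by their mere presence, a userspace probe only needs to test whether each file exists. The sketch below assumes that reading the files' contents is unnecessary; the helper names and the `sysfs_root` parameter (normally `/sys`) are hypothetical and exist only to make the example easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Which inline encryption key types a disk advertises via sysfs. */
struct crypto_key_types {
	bool raw;
	bool hw_wrapped;
};

/* True if <sysfs_root>/block/<disk>/queue/crypto/<attr> exists. */
static bool crypto_attr_present(const char *sysfs_root, const char *disk,
				const char *attr)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/block/%s/queue/crypto/%s",
			 sysfs_root, disk, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return false; /* path did not fit; treat as absent */
	return access(path, F_OK) == 0;
}

/* Probe both presence-only attributes for one disk. */
struct crypto_key_types probe_key_types(const char *sysfs_root,
					const char *disk)
{
	struct crypto_key_types t;

	t.raw = crypto_attr_present(sysfs_root, disk, "raw_keys");
	t.hw_wrapped = crypto_attr_present(sysfs_root, disk, "hw_wrapped_keys");
	return t;
}
```

For example, `probe_key_types("/sys", "sda")` would report both fields as false on a device without inline encryption support, since the `crypto` directory would then be absent.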


What: /sys/block/<disk>/queue/dax
Date: June 2016
Contact: linux-block@vger.kernel.org
255 changes: 251 additions & 4 deletions Documentation/block/inline-encryption.rst
@@ -77,10 +77,10 @@ Basic design
============

We introduce ``struct blk_crypto_key`` to represent an inline encryption key and
-how it will be used. This includes the actual bytes of the key; the size of the
-key; the algorithm and data unit size the key will be used with; and the number
-of bytes needed to represent the maximum data unit number the key will be used
-with.
+how it will be used. This includes the type of the key (raw or
+hardware-wrapped); the actual bytes of the key; the size of the key; the
+algorithm and data unit size the key will be used with; and the number of bytes
+needed to represent the maximum data unit number the key will be used with.

We introduce ``struct bio_crypt_ctx`` to represent an encryption context. It
contains a data unit number and a pointer to a blk_crypto_key. We add pointers
@@ -301,3 +301,250 @@ kernel will pretend that the device does not support hardware inline encryption
When the crypto API fallback is enabled, this means that all bios with an
encryption context will use the fallback, and IO will complete as usual. When
the fallback is disabled, a bio with an encryption context will be failed.

.. _hardware_wrapped_keys:

Hardware-wrapped keys
=====================

Motivation and threat model
---------------------------

Linux storage encryption (dm-crypt, fscrypt, eCryptfs, etc.) traditionally
relies on the raw encryption key(s) being present in kernel memory so that the
encryption can be performed. This traditionally isn't seen as a problem because
the key(s) won't be present during an offline attack, which is the main type of
attack that storage encryption is intended to protect from.

However, there is an increasing desire to also protect users' data from other
types of attacks (to the extent possible), including:

- Cold boot attacks, where an attacker with physical access to a system suddenly
powers it off, then immediately dumps the system memory to extract recently
in-use encryption keys, then uses these keys to decrypt user data on-disk.

- Online attacks where the attacker is able to read kernel memory without fully
compromising the system, followed by an offline attack where any extracted
keys can be used to decrypt user data on-disk. An example of such an online
attack would be if the attacker is able to run some code on the system that
exploits a Meltdown-like vulnerability but is unable to escalate privileges.

- Online attacks where the attacker fully compromises the system, but their data
exfiltration is significantly time-limited and/or bandwidth-limited, so in
order to completely exfiltrate the data they need to extract the encryption
keys to use in a later offline attack.

Hardware-wrapped keys are a feature of inline encryption hardware that is
designed to protect users' data from the above attacks (to the extent possible),
without introducing limitations such as a maximum number of keys.

Note that it is impossible to **fully** protect users' data from these attacks.
Even in the attacks where the attacker "just" gets read access to kernel memory,
they can still extract any user data that is present in memory, including
plaintext pagecache pages of encrypted files. The focus here is just on
protecting the encryption keys, as those instantly give access to **all** user
data in any following offline attack, rather than just some of it (where which
data is included in that "some" might not be controlled by the attacker).

Solution overview
-----------------

Inline encryption hardware typically has "keyslots" into which software can
program keys for the hardware to use; the contents of keyslots typically can't
be read back by software. As such, the above security goals could be achieved
if the kernel simply erased its copy of the key(s) after programming them into
keyslot(s) and thereafter only referred to them via keyslot number.

However, that naive approach runs into a couple of problems:

- It limits the number of unlocked keys to the number of keyslots, which
typically is a small number. In cases where there is only one encryption key
system-wide (e.g., a full-disk encryption key), that can be tolerable.
However, in general there can be many logged-in users with many different
keys, and/or many running applications with application-specific encrypted
storage areas. This is especially true if file-based encryption (e.g.
fscrypt) is being used.

- Inline crypto engines typically lose the contents of their keyslots if the
storage controller (usually UFS or eMMC) is reset. Resetting the storage
controller is a standard error recovery procedure that is executed if certain
types of storage errors occur, and such errors can occur at any time.
Therefore, when inline crypto is being used, the operating system must always
be ready to reprogram the keyslots without user intervention.

Thus, it is important for the kernel to still have a way to "remind" the
hardware about a key, without actually having the raw key itself.

Somewhat less importantly, it is also desirable that the raw keys are never
visible to software at all, even while being initially unlocked. This would
ensure that a read-only compromise of system memory will never allow a key to be
extracted to be used off-system, even if it occurs when a key is being unlocked.

To solve all these problems, some vendors of inline encryption hardware have
made their hardware support *hardware-wrapped keys*. Hardware-wrapped keys
are encrypted keys that can only be unwrapped (decrypted) and used by hardware
-- either by the inline encryption hardware itself, or by a dedicated hardware
block that can directly provision keys to the inline encryption hardware.

(We refer to them as "hardware-wrapped keys" rather than simply "wrapped keys"
to add some clarity in cases where there could be other types of wrapped keys,
such as in file-based encryption. Key wrapping is a commonly used technique.)

The key which wraps (encrypts) hardware-wrapped keys is a hardware-internal key
that is never exposed to software; it is either a persistent key (a "long-term
wrapping key") or a per-boot key (an "ephemeral wrapping key"). The long-term
wrapped form of the key is what is initially unlocked, but it is erased from
memory as soon as it is converted into an ephemerally-wrapped key. In-use
hardware-wrapped keys are always ephemerally-wrapped, not long-term wrapped.

As inline encryption hardware can only be used to encrypt/decrypt data on-disk,
the hardware also includes a level of indirection; it doesn't use the unwrapped
key directly for inline encryption, but rather derives both an inline encryption
key and a "software secret" from it. Software can use the "software secret" for
tasks that can't use the inline encryption hardware, such as filenames
encryption. The software secret is not protected from memory compromise.

Key hierarchy
-------------

Here is the key hierarchy for a hardware-wrapped key::

                            Hardware-wrapped key
                                    |
                                    |
                              <Hardware KDF>
                                    |
                      -----------------------------
                      |                           |
            Inline encryption key           Software secret

The components are:

- *Hardware-wrapped key*: a key for the hardware's KDF (Key Derivation
Function), in ephemerally-wrapped form. The key wrapping algorithm is a
hardware implementation detail that doesn't impact kernel operation, but a
strong authenticated encryption algorithm such as AES-256-GCM is recommended.

- *Hardware KDF*: a KDF (Key Derivation Function) which the hardware uses to
derive subkeys after unwrapping the wrapped key. The hardware's choice of KDF
doesn't impact kernel operation, but it does need to be known for testing
purposes, and it's also assumed to have at least a 256-bit security strength.
All known hardware uses the SP800-108 KDF in Counter Mode with AES-256-CMAC,
with a particular choice of labels and contexts; new hardware should use this
already-vetted KDF.

- *Inline encryption key*: a derived key which the hardware directly provisions
to a keyslot of the inline encryption hardware, without exposing it to
software. In all known hardware, this will always be an AES-256-XTS key.
However, in principle other encryption algorithms could be supported too.
Hardware must derive distinct subkeys for each supported encryption algorithm.

- *Software secret*: a derived key which the hardware returns to software so
that software can use it for cryptographic tasks that can't use inline
encryption. This value is cryptographically isolated from the inline
encryption key, i.e. knowing one doesn't reveal the other. (The KDF ensures
this.) Currently, the software secret is always 32 bytes and thus is suitable
for cryptographic applications that require up to a 256-bit security strength.
Some use cases (e.g. full-disk encryption) won't require the software secret.

Example: in the case of fscrypt, the fscrypt master key (the key that protects a
particular set of encrypted directories) is made hardware-wrapped. The inline
encryption key is used as the file contents encryption key, while the software
secret (rather than the master key directly) is used to key fscrypt's KDF
(HKDF-SHA512) to derive other subkeys such as filenames encryption keys.

Note that currently this design assumes a single inline encryption key per
hardware-wrapped key, without any further key derivation. Thus, in the case of
fscrypt, currently hardware-wrapped keys are only compatible with the "inline
encryption optimized" settings, which use one file contents encryption key per
encryption policy rather than one per file. This design could be extended to
make the hardware derive per-file keys using per-file nonces passed down the
storage stack, and in fact some hardware already supports this; future work is
planned to remove this limitation by adding the corresponding kernel support.

Kernel support
--------------

The inline encryption support of the kernel's block layer ("blk-crypto") has
been extended to support hardware-wrapped keys as an alternative to raw keys,
when hardware support is available. This works in the following way:

- A ``key_types_supported`` field is added to the crypto capabilities in
``struct blk_crypto_profile``. This allows device drivers to declare that
they support raw keys, hardware-wrapped keys, or both.

- ``struct blk_crypto_key`` can now contain a hardware-wrapped key as an
alternative to a raw key; a ``key_type`` field is added to
``struct blk_crypto_config`` to distinguish between the different key types.
This allows users of blk-crypto to en/decrypt data using a hardware-wrapped
key in a way very similar to using a raw key.

- A new method ``blk_crypto_ll_ops::derive_sw_secret`` is added. Device drivers
that support hardware-wrapped keys must implement this method. Users of
blk-crypto can call ``blk_crypto_derive_sw_secret()`` to access this method.

- The programming and eviction of hardware-wrapped keys happens via
``blk_crypto_ll_ops::keyslot_program`` and
``blk_crypto_ll_ops::keyslot_evict``, just like it does for raw keys. If a
driver supports hardware-wrapped keys, then it must handle hardware-wrapped
keys being passed to these methods.

blk-crypto-fallback doesn't support hardware-wrapped keys. Therefore,
hardware-wrapped keys can only be used with actual inline encryption hardware.
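The interaction between ``key_types_supported``, the key type of a request, and the fallback can be pictured with a small standalone model. This is only a userspace sketch under stated assumptions: every ``demo_`` name is hypothetical, the flag values are illustrative rather than the kernel's, and the decision logic is a simplification of the behaviour described above, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit flags; the kernel's real enum values may differ. */
enum demo_key_type {
	DEMO_KEY_TYPE_RAW        = 1 << 0,
	DEMO_KEY_TYPE_HW_WRAPPED = 1 << 1,
};

/* Stand-in for the crypto capabilities a driver would declare. */
struct demo_crypto_profile {
	unsigned int key_types_supported; /* bitmask of demo_key_type */
};

enum demo_path {
	DEMO_PATH_HARDWARE, /* inline encryption hardware handles the bio */
	DEMO_PATH_FALLBACK, /* crypto API fallback handles the bio */
	DEMO_PATH_FAIL,     /* the bio is failed */
};

/*
 * Decide how a bio with an encryption context would be handled.
 * profile may be NULL to model a device without inline encryption.
 */
enum demo_path demo_choose_path(const struct demo_crypto_profile *profile,
				enum demo_key_type key_type,
				bool fallback_enabled)
{
	/* Hardware is usable only if the driver declared this key type. */
	if (profile != NULL && (profile->key_types_supported & key_type))
		return DEMO_PATH_HARDWARE;

	/* The fallback needs the raw key, so wrapped keys cannot use it. */
	if (key_type == DEMO_KEY_TYPE_RAW && fallback_enabled)
		return DEMO_PATH_FALLBACK;

	return DEMO_PATH_FAIL;
}
```

In this model a hardware-wrapped key either reaches real hardware or fails, which mirrors the statement that blk-crypto-fallback cannot be used with hardware-wrapped keys.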

All the above deals with hardware-wrapped keys in ephemerally-wrapped form only.
To get such keys in the first place, new block device ioctls have been added to
provide a generic interface to creating and preparing such keys:

- ``BLKCRYPTOIMPORTKEY`` converts a raw key to long-term wrapped form. It takes
in a pointer to a ``struct blk_crypto_import_key_arg``. The caller must set
``raw_key_ptr`` and ``raw_key_size`` to the pointer and size (in bytes) of the
raw key to import. On success, ``BLKCRYPTOIMPORTKEY`` returns 0 and writes
the resulting long-term wrapped key blob to the buffer pointed to by
``lt_key_ptr``, which is of maximum size ``lt_key_size``. It also updates
``lt_key_size`` to be the actual size of the key. On failure, it returns -1
and sets errno. An errno of ``EOPNOTSUPP`` indicates that the block device
does not support hardware-wrapped keys. An errno of ``EOVERFLOW`` indicates
that the output buffer did not have enough space for the key blob.

- ``BLKCRYPTOGENERATEKEY`` is like ``BLKCRYPTOIMPORTKEY``, but it has the
hardware generate the key instead of importing one. It takes in a pointer to
a ``struct blk_crypto_generate_key_arg``.

- ``BLKCRYPTOPREPAREKEY`` converts a key from long-term wrapped form to
ephemerally-wrapped form. It takes in a pointer to a ``struct
blk_crypto_prepare_key_arg``. The caller must set ``lt_key_ptr`` and
``lt_key_size`` to the pointer and size (in bytes) of the long-term wrapped
key blob to convert. On success, ``BLKCRYPTOPREPAREKEY`` returns 0 and writes
the resulting ephemerally-wrapped key blob to the buffer pointed to by
``eph_key_ptr``, which is of maximum size ``eph_key_size``. It also updates
``eph_key_size`` to be the actual size of the key. On failure, it returns -1
and sets errno. Errno values of ``EOPNOTSUPP`` and ``EOVERFLOW`` mean the
same as they do for ``BLKCRYPTOIMPORTKEY``. An errno of ``EBADMSG`` indicates
that the long-term wrapped key is invalid.

Userspace needs to use either ``BLKCRYPTOIMPORTKEY`` or ``BLKCRYPTOGENERATEKEY``
once to create a key, and then ``BLKCRYPTOPREPAREKEY`` each time the key is
unlocked and added to the kernel. Note that these ioctls have no relevance for
raw keys; they are only for hardware-wrapped keys.
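As a rough illustration of the "prepare" step, the sketch below wraps ``BLKCRYPTOPREPAREKEY`` and maps the errno values listed above to messages. It makes several assumptions: the UAPI header is ``linux/blk-crypto.h`` (as the ioctl-number.rst change suggests), the pointer and size fields of ``struct blk_crypto_prepare_key_arg`` are integer types that accept a ``uintptr_t``, and unused fields should be zeroed. The function names are hypothetical. When the header is not installed, the wrapper simply reports ``EOPNOTSUPP``.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Use the UAPI header only if the installed kernel headers provide it. */
#if defined(__has_include)
#  if __has_include(<linux/blk-crypto.h>)
#    include <linux/blk-crypto.h>
#    define HAVE_BLK_CRYPTO_UAPI 1
#  endif
#endif

/* Map the errno values documented for the key ioctls to short messages. */
const char *describe_key_ioctl_errno(int err)
{
	switch (err) {
	case 0:
		return "success";
	case EOPNOTSUPP:
		return "block device does not support hardware-wrapped keys";
	case EOVERFLOW:
		return "output buffer too small for the key blob";
	case EBADMSG:
		return "long-term wrapped key is invalid";
	default:
		return "other error";
	}
}

/*
 * Convert a long-term wrapped key blob to ephemerally-wrapped form.
 * fd is an open block device. On entry *eph_size is the capacity of
 * eph_buf; on success it is updated to the actual blob size.
 * Returns 0 on success or an errno value on failure.
 */
int prepare_wrapped_key(int fd, const void *lt_key, size_t lt_size,
			void *eph_buf, size_t *eph_size)
{
#ifdef HAVE_BLK_CRYPTO_UAPI
	struct blk_crypto_prepare_key_arg arg;

	memset(&arg, 0, sizeof(arg)); /* assumption: unused fields must be 0 */
	arg.lt_key_ptr = (uintptr_t)lt_key;
	arg.lt_key_size = lt_size;
	arg.eph_key_ptr = (uintptr_t)eph_buf;
	arg.eph_key_size = *eph_size;

	if (ioctl(fd, BLKCRYPTOPREPAREKEY, &arg) != 0)
		return errno;

	*eph_size = (size_t)arg.eph_key_size;
	return 0;
#else
	(void)fd; (void)lt_key; (void)lt_size; (void)eph_buf; (void)eph_size;
	return EOPNOTSUPP; /* header unavailable: treat as unsupported */
#endif
}
```

A caller would run ``BLKCRYPTOIMPORTKEY`` or ``BLKCRYPTOGENERATEKEY`` once, store the long-term blob, and then call a wrapper like this on each unlock, retrying with a larger buffer if it returns ``EOVERFLOW``.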

Testability
-----------

Both the hardware KDF and the inline encryption itself are well-defined
algorithms that don't depend on any secrets other than the unwrapped key.
Therefore, if the unwrapped key is known to software, these algorithms can be
reproduced in software in order to verify the ciphertext that is written to disk
by the inline encryption hardware.

However, the unwrapped key will only be known to software for testing if the
"import" functionality is used. Proper testing is not possible in the
"generate" case where the hardware generates the key itself. The correct
operation of the "generate" mode thus relies on the security and correctness of
the hardware RNG and its use to generate the key, as well as the testing of the
"import" mode as that should cover all parts other than the key generation.

For an example of a test that verifies the ciphertext written to disk in the
"import" mode, see the fscrypt hardware-wrapped key tests in xfstests, or
`Android's vts_kernel_encryption_test
<https://android.googlesource.com/platform/test/vts-testcase/kernel/+/refs/heads/main/encryption/>`_.
2 changes: 2 additions & 0 deletions Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -85,6 +85,8 @@ Code Seq# Include File Comments
0x10 20-2F arch/s390/include/uapi/asm/hypfs.h
0x12 all linux/fs.h BLK* ioctls
linux/blkpg.h
linux/blkzoned.h
linux/blk-crypto.h
0x15 all linux/fs.h FS_IOC_* ioctls
0x1b all InfiniBand Subsystem
<http://infiniband.sourceforge.net/>
3 changes: 2 additions & 1 deletion block/Makefile
@@ -26,7 +26,8 @@ obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
obj-$(CONFIG_IOSCHED_BFQ) += bfq.o

-obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o \
+				   bio-integrity-auto.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o