udf: Fix leak of UTF-16 surrogates into encoded strings
The OSTA UDF specification does not say whether the CS0 charset, when
encoded with two bytes per character, should be treated as UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way, but on systems such as Windows, which work in UTF-16
internally, filenames are effectively treated as UTF-16.
On Linux it is more difficult to handle characters outside the Basic
Multilingual Plane (beyond 0xffff) because the NLS framework works with
2-byte characters only. For now, just make sure we don't leak UTF-16
surrogates into the resulting string when loading names from the filesystem.

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara committed Apr 18, 2018
1 parent 0685693 commit 44f06ba
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions fs/udf/unicode.c
@@ -28,6 +28,9 @@

#include "udf_sb.h"

#define SURROGATE_MASK 0xfffff800
#define SURROGATE_PAIR 0x0000d800

static int udf_uni2char_utf8(wchar_t uni,
			     unsigned char *out,
			     int boundlen)
@@ -37,6 +40,9 @@ static int udf_uni2char_utf8(wchar_t uni,
	if (boundlen <= 0)
		return -ENAMETOOLONG;

	if ((uni & SURROGATE_MASK) == SURROGATE_PAIR)
		return -EINVAL;

	if (uni < 0x80) {
		out[u_len++] = (unsigned char)uni;
	} else if (uni < 0x800) {
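To illustrate why the mask test in the hunk above catches exactly the UTF-16 surrogate range, here is a minimal userspace sketch. It reuses the two constants from the patch; the helper name `is_surrogate` and the use of `uint32_t` in place of the kernel's `wchar_t` are assumptions made for illustration and are not part of the commit.

```c
#include <stdint.h>

/* Same values as the constants added by the patch. */
#define SURROGATE_MASK 0xfffff800
#define SURROGATE_PAIR 0x0000d800

/*
 * Hypothetical helper (not in the kernel source): returns nonzero when
 * uni lies in U+D800..U+DFFF. Clearing the low 11 bits maps every value
 * in that 2048-entry block to 0xd800; the high bits of the mask ensure
 * that larger values such as 0x1d800 do not match.
 */
static int is_surrogate(uint32_t uni)
{
	return (uni & SURROGATE_MASK) == SURROGATE_PAIR;
}
```

Under these assumptions, both the high surrogates (0xd800 to 0xdbff) and the low surrogates (0xdc00 to 0xdfff) are rejected by the single comparison, while the neighbouring code points 0xd7ff and 0xe000 pass through to the normal UTF-8 encoding path.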
