
Unicode Normalisation Vulnerability

The same character can often be represented in more than one way in Unicode. This can lead to security vulnerabilities if a system does not normalize input before processing it.

Normalization ensures that two strings which represent the same text, but use different binary representations for their characters, have the same binary value after normalization.
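
As a minimal Python sketch of that idea, the snippet below compares a precomposed and a decomposed spelling of the same word. The helper name `same_text` is hypothetical; `unicodedata.normalize` comes from the standard library.

```python
import unicodedata

precomposed = "ma\u00f1ana"   # ñ stored as the single code point U+00F1
decomposed = "man\u0303ana"   # n followed by U+0303 COMBINING TILDE


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)           # raw code points differ
print(same_text(precomposed, decomposed))  # canonical equivalents compare equal
```

Any check that compares, deduplicates, or looks up user-supplied strings would normally run on the normalized form rather than on the raw input.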

Case Study

Spotify account hijacking:

  1. Target user account is bigbird
  2. Create new account ᴮᴵᴳᴮᴵᴿᴰ, Python string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'
  3. Send a request for a password reset for your new account
  4. A password reset link is sent to the email you registered for your new account. Use it to change the password
  5. Now, instead of logging in to account with username ᴮᴵᴳᴮᴵᴿᴰ, try logging in to account with username bigbird with the new password
  6. Success! Mission accomplished

Spotify followed two simple rules for usernames:

  • Do not allow whitespace in usernames
  • Treat BigBird and bigbird as the same username

Creative usernames and Spotify account hijacking
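
The attack works when the function that maps a username to its canonical form is not idempotent, so applying it a second time gives a different result. The sketch below reproduces that effect in Python. `naive_canonical` and `stable_canonical` are hypothetical stand-ins, not Spotify's actual code, and the ordering inside `naive_canonical` is an assumption chosen to show the problem.

```python
import unicodedata

attacker = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ


def naive_canonical(name: str) -> str:
    """Hypothetical canonicalizer: lower-case first, then NFKC.

    The modifier letters have no lowercase mapping, so lower() leaves them
    alone, and NFKC then turns them into plain ASCII capitals.
    """
    return unicodedata.normalize("NFKC", name.lower())


def stable_canonical(name: str) -> str:
    """Hypothetical fix: normalize, case-fold, then normalize again."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


once = naive_canonical(attacker)   # what gets stored at sign-up
twice = naive_canonical(once)      # what a later code path computes
print(once, twice)                 # the two results differ
print(stable_canonical(attacker))  # collides with the victim at sign-up
```

With a stable canonical form the collision with bigbird is visible at registration time, so the sign-up can be rejected. Another option is to refuse any name for which canonicalizing twice differs from canonicalizing once.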

JavaScript

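ECMAScript 5 has problems with characters in the astral planes (code points above U+FFFF), because strings and regular expressions operate on UTF-16 code units rather than on whole code points.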

javascript
'mañana' == 'mañana'          // false

'ma\xF1ana' == 'man\u0303ana' // false

'ma\xF1ana'.length            // 6

'man\u0303ana'.length         // 7


// match foo + one character + bar
/foo.bar/.test('foo💩bar')    // false
/foo.bar/.test('foogbar')     // true


// doesn't match line breaks, either
/^.$/.test('💩')              // false

// matches line breaks, but
// still doesn't match whole astral symbols
/^[\s\S]$/.test('💩')         // false


// group the alternatives so that ^ and $ apply to all of them
/^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/.test('💩')
// true
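
The failures above come from counting UTF-16 code units instead of code points. The following Python sketch makes the two counts visible for the same symbol. The helper names `utf16_units` and `surrogate_pair` are hypothetical.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, which is what an ES5 string length reports."""
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(ch: str) -> tuple[int, int]:
    """Split one astral code point into its high and low surrogate."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("code point fits in a single UTF-16 code unit")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


poo = "\U0001F4A9"  # 💩
print(len(poo))                                  # code points
print(utf16_units(poo))                          # UTF-16 code units
print([hex(unit) for unit in surrogate_pair(poo)])
```

A pattern such as `.` in ES5 consumes one code unit, so it sees only half of the symbol. That is why the last pattern in the block above has to spell out the surrogate ranges by hand.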

MySQL vs. UTF-8

sql
CREATE TABLE `table_name` (
    `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    `column_name` VARCHAR(255) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • Try storing a character that needs four bytes in UTF-8
sql
UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
-- Query OK, 1 row affected, 1 warning

SHOW WARNINGS;
-- | Level   | Code | Message                                                                      |
-- |---------|------|------------------------------------------------------------------------------|
-- | Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' for column 'column_name' at row 1 |

SELECT column_name FROM table_name WHERE id = 9001;
-- foo   (everything from the four-byte character onward was dropped)
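
The bytes quoted in the warning are the UTF-8 encoding of 💩 (U+1F4A9). A small Python sketch can flag such input before it reaches the database. Both helpers are hypothetical, and `truncate_at_first_astral` only imitates the truncation seen in the example above. It is not MySQL's own logic.

```python
def four_byte_chars(text: str) -> list[str]:
    """Characters above U+FFFF, which need four bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def truncate_at_first_astral(text: str) -> str:
    """Imitate the stored value from the example: cut at the first such character."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:index]
    return text


value = "foo\U0001F4A9bar"
print(four_byte_chars(value)[0].encode("utf-8"))  # same bytes as in the warning
print(truncate_at_first_astral(value))            # what the SELECT returned
```

Silent truncation matters for security when the dropped suffix carried meaning, for example when a value is validated in full by the application and then stored in shortened form.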

Mitigation

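  • Use utf8mb4, MySQL's full UTF-8 implementation; the older utf8 charset (an alias for utf8mb3) stores at most three bytes per character
  • mb4: multi-byte, up to four bytes per character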
sql
ALTER TABLE `table_name`
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

C-Sharp

Case mapping:

cs
// For case-insensitive comparison, use these
Regex.IsMatch(input, regex, RegexOptions.IgnoreCase);
Regex.IsMatch("hac\u212A", "HACK", RegexOptions.IgnoreCase) == true

string.Equals(input1, input2, StringComparison.CurrentCultureIgnoreCase);

new CultureInfo(..).CompareInfo.Compare(input1, input2, CompareOptions.IgnoreNonSpace)

// instead of comparing the results of
input.ToLower();
input.ToUpper();

Normalization:

cs
input.Normalize(NormalizationForm.FormC);

input.Normalize(NormalizationForm.FormKC);

Others:

cs
new IdnMapping().GetAscii(input)

// Use one of these for URLs
new Uri(input).Host
new Uri(input).IdnHost
new Uri(input).DnsSafeHost

// Example
new Uri("http://faceboo\u212A.com").Host == "facebook.com"

Python

Case mapping:

python
input.lower()
input.upper()
re.compile(regex, re.IGNORECASE).match(input)
input.casefold()
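
These calls do not all behave the same way on non-ASCII input. The sketch below shows two cases that matter for comparisons. The helper name `caseless_equal` is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare with casefold(), which applies full Unicode case folding."""
    return a.casefold() == b.casefold()


kelvin = "hac\u212a"       # ends with U+212A KELVIN SIGN, not an ASCII K
sharp_s = "Stra\u00dfe"    # contains ß

print(kelvin.lower() == "hack")                 # the Kelvin sign lower-cases to ASCII k
print(sharp_s.lower() == "strasse")             # lower() keeps ß
print(caseless_equal(sharp_s, "STRASSE"))       # casefold() expands ß to ss
```

The first line is the interesting one from a security point of view: a string that is not ASCII before case mapping can be plain ASCII afterwards, so allow-list or block-list checks should run on the mapped value.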

Normalization:

python
unicodedata.normalize('NFC', input)

unicodedata.normalize('NFKC', input)
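
The two forms differ in what they treat as equivalent. NFC only merges canonical equivalents, while NFKC also replaces compatibility characters such as fullwidth letters and ligatures. A short sketch, with illustrative variable names:

```python
import unicodedata

fullwidth_admin = "\uff41\uff44\uff4d\uff49\uff4e"  # ａｄｍｉｎ in fullwidth letters
ligature = "\ufb01le"                                # "ﬁle" written with the ﬁ ligature

for text in (fullwidth_admin, ligature):
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    # NFC leaves compatibility characters alone; NFKC maps them to ASCII here
    print(repr(nfc), repr(nfkc))
```

Which form is appropriate depends on the use: NFKC suits identifiers such as usernames, where look-alikes should collapse, but it discards distinctions that may matter in free text.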

Others:

python
urllib.parse.urlparse(input).hostname

# Example: U+212A KELVIN SIGN lower-cases to an ASCII "k"
import urllib.parse

urllib.parse.urlparse("http://i\u212Aea.com").hostname == \
urllib.parse.urlparse("http://ikea.com").hostname

Domain Names

  • Internationalized Domain Names (IDNs)
  • 2009
  • Stored as ASCII strings using Punycode transcription
  • No changes to the DNS system needed
  • Check out notes about URL here: URL Notes Link
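
The Punycode form mentioned above can be inspected with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The wrapper names `to_ascii` and `to_unicode` are hypothetical, and the domain is only an example.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to the ASCII form stored in DNS."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii("b\u00fccher.com"))       # label gets the xn-- prefix
print(to_unicode("xn--bcher-kva.com"))   # round-trips to the Unicode name
```

Comparing or logging hostnames in their ASCII form makes look-alike domains easier to spot, since the `xn--` prefix is visible.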

RIGHT-TO-LEFT OVERRIDE

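Unicode Character 'RIGHT-TO-LEFT OVERRIDE' (U+202E) forces the text that follows it to be displayed right to left. A file named resume + U+202E + tpp.exe is therefore shown as resumeexe.ppt, although it is still an .exe file.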

ruby
ruby -e 'File.rename("backdoor_ppt.exe", "resume\xe2\x80\xaetpp.exe")'
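
One possible defence is to reject or flag file names that contain bidirectional control characters. The sketch below uses the bidirectional class reported by `unicodedata`. The set of classes and the helper name `find_bidi_controls` are assumptions made for illustration.

```python
import unicodedata

# Bidi classes of the embedding, override, and isolate control characters
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, character name) for every bidi control in the string."""
    return [
        (index, unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(name)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


print(find_bidi_controls("resume\u202etpp.exe"))  # reports the override at index 6
print(find_bidi_controls("resume.ppt"))           # empty list for a clean name
```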
