Unicode Normalisation Vulnerability
Characters can often be represented in multiple ways in Unicode. This can lead to security vulnerabilities if a system does not normalize input before processing it.
Normalization ensures that two strings whose characters use different binary representations have the same binary value after normalization.
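A minimal sketch of the idea, using Python's standard `unicodedata` module. The sample word and the helper name `same_text` are illustrative only: the precomposed and decomposed spellings of "mañana" compare unequal until both are normalized to the same form.

```python
import unicodedata

composed = "ma\u00f1ana"     # ñ as a single code point (U+00F1)
decomposed = "man\u0303ana"  # n followed by COMBINING TILDE (U+0303)

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

raw_equal = composed == decomposed
normalized_equal = same_text(composed, decomposed)
print(raw_equal, normalized_equal)
```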
Case Study
Spotify account hijacking:
- Target user account is bigbird
- Create new account ᴮᴵᴳᴮᴵᴿᴰ, Python string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'
- Send a request for a password reset for your new account
- A password reset link is sent to the email you registered for your new account. Use it to change the password
- Now, instead of logging in to account with username ᴮᴵᴳᴮᴵᴿᴰ, try logging in to account with username bigbird with the new password
- Success! Mission accomplished
Spotify followed simple rules for usernames:
- Do not allow whitespace in usernames
- Treat BigBird and bigbird as the same username
Creative usernames and Spotify account hijacking
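The attack works when username canonicalization is not idempotent, that is, when applying it twice gives a different result than applying it once. The sketch below is a simplified stand-in and not Spotify's actual code: `naive_canonical` and `stable_canonical` are hypothetical helpers, and the fold-case-then-NFKC order is chosen only to reproduce the effect.

```python
import unicodedata

def naive_canonical(name: str) -> str:
    # Hypothetical flawed rule: fold case first, then apply NFKC.
    # NFKC can introduce upper-case letters that were never folded.
    return unicodedata.normalize("NFKC", name.casefold())

def stable_canonical(name: str) -> str:
    # Hypothetical fix: repeat until the result stops changing (a fixed point).
    previous = None
    while name != previous:
        previous, name = name, naive_canonical(name)
    return name

fancy = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ

once = naive_canonical(fancy)   # form stored at sign-up
twice = naive_canonical(once)   # form computed again during password reset
print(once, twice, stable_canonical(fancy))
```

Comparing `once` and `twice` shows the gap: the account is registered under one canonical form, but a second pass maps it onto the victim's name.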
JavaScript
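ECMAScript 5 has problems with characters in the astral planes (code points above U+FFFF): strings are sequences of UTF-16 code units, and each astral character takes two of them (a surrogate pair).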
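The root cause is that JavaScript strings are indexed by UTF-16 code unit rather than by code point. A small Python sketch (Python is used only because its strings are indexed by code point, which makes the difference visible; the helper names are hypothetical) counts both:

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, which is what JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2

def is_astral(ch: str) -> bool:
    """True for code points above U+FFFF, which need a surrogate pair in UTF-16."""
    return ord(ch) > 0xFFFF

poo = "\U0001F4A9"  # 💩
print(len(poo), utf16_units(poo), is_astral(poo))

# The surrogate pair that an ES5 regex sees as two separate "characters"
units = poo.encode("utf-16-be")
high, low = units[:2].hex(), units[2:].hex()
print(high, low)
```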
javascript
'mañana' == 'mañana' // false
'ma\xF1ana' == 'man\u0303ana' // false
'ma\xF1ana'.length // 6
'man\u0303ana'.length // 7
// match foo + one character + bar
/foo.bar/.test('foo💩bar') // false
/foo.bar/.test('foogbar') // true
// doesn't match line breaks, either
/^.$/.test('💩') // false
// matches line breaks, but
// still doesn't match whole astral symbols
/^[\s\S]$/.test('💩') // false
/^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/.test('💩')
// true
MySQL vs. UTF-8
sql
CREATE TABLE `table_name` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`column_name` VARCHAR(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- Try adding a Unicode character
sql
UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
-- Query OK, 1 row affected, 1 warning
SHOW WARNINGS;
--
-- | Level   | Code | Message                                                                      |
-- |---------|------|------------------------------------------------------------------------------|
-- | Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' for column 'column_name' at row 1 |
SELECT column_name FROM table_name WHERE id = 9001;
-- foo
Mitigation
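- Use MySQL's full UTF-8 implementation, utf8mb4 (mb4: multi-byte, up to 4 bytes per character). The older utf8 charset stores at most 3 bytes per character, so it cannot hold astral symbols such as 💩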
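To see which values a 3-byte utf8 column would truncate, one option is to look for characters whose UTF-8 encoding needs four bytes. This is a minimal sketch; `needs_utf8mb4` is a hypothetical helper name and not part of any MySQL driver.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)

value = "foo\U0001F4A9bar"  # foo💩bar
encoded = value.encode("utf-8")
print(encoded)
print(needs_utf8mb4(value), needs_utf8mb4("foobar"))
```

The four bytes in the middle of `encoded` are the same ones MySQL reports in warning 1366.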
sql
ALTER TABLE `table_name`
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
C-Sharp
Case mapping:
cs
// For case mapping, use these
Regex.IsMatch(input, regex, RegexOptions.IgnoreCase);
Regex.IsMatch("hac\u212A", "HACK", RegexOptions.IgnoreCase) == true
string.Equals(input1, input2, StringComparison.CurrentCultureIgnoreCase);
new CultureInfo(..).CompareInfo.Compare(input1, input2, CompareOptions.IgnoreNonSpace)
// instead of
input.ToLower();
input.ToUpper();
Normalization:
cs
input.Normalize(NormalizationForm.FormC);
input.Normalize(NormalizationForm.FormKC);
Others:
cs
new IdnMapping().GetAscii(input)
// Use these for URLs
new Uri(input).Host
new Uri(input).IdnHost
new Uri(input).DnsSafeHost
// Example
new Uri("http://faceboo\u212A.com").Host == "facebook.com"
Python
Case mapping:
python
input.lower()
input.upper()
re.compile(regex, re.IGNORECASE).match(input)
input.casefold()
Normalization:
python
unicodedata.normalize('NFC', input)
unicodedata.normalize('NFKC', input)
Others:
python
urllib.parse.urlparse(input).hostname
# Example
import urllib.parse
(urllib.parse.urlparse("http://i\u006bea.com").hostname ==
 urllib.parse.urlparse("http://ikea.com").hostname)
Domain Names
- IDN: Internationalized Domain Names
- 2009
- Stored as ASCII strings using Punycode transcription
- No changes to the DNS system needed
- Check out notes about URL here: URL Notes Link
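A short sketch of the Punycode transcription using Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules, which is an assumption worth keeping in mind because newer IDNA 2008 processing can give different results for some names. The helper names and domain names are illustrative only.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")

ascii_name = to_ascii("m\u00fcnchen.de")  # münchen.de
print(ascii_name, to_unicode(ascii_name))

# Mapping happens before encoding: here the KELVIN SIGN (U+212A) is folded away
kelvin_name = to_ascii("faceboo\u212a.com")
print(kelvin_name)
```

Because the ASCII form is what DNS actually stores, two visually different inputs that map to the same ASCII name refer to the same domain.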
RIGHT-TO-LEFT OVERRIDE
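Unicode Character 'RIGHT-TO-LEFT OVERRIDE' (U+202E) forces the characters after it to be displayed right to left, so a file name can appear to end in a harmless extension. The file renamed below is displayed as resumeexe.ppt although its real extension is still .exe.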
ruby
ruby -e 'File.rename("backdoor_ppt.exe", "resume\xe2\x80\xaetpp.exe")'
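One defensive option is to reject or flag file names that contain bidirectional control characters. The sketch below is only an assumption about how such a check could look; `BIDI_CONTROLS` and `has_bidi_control` are hypothetical names.

```python
# Explicit bidirectional formatting characters: embeddings, overrides, isolates
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def has_bidi_control(name: str) -> bool:
    """True if the name contains a character that can reorder displayed text."""
    return any(ch in BIDI_CONTROLS for ch in name)

disguised = "resume\u202etpp.exe"  # same name as in the rename command above
print(has_bidi_control(disguised), has_bidi_control("resume.ppt"))
print(disguised.endswith(".exe"))  # checks the stored name, not the displayed one
```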