Channel: Active questions tagged utf-8 - Stack Overflow

Convert universal character name to UTF-8 in C

I need to convert universal character name (UCN) data from a database to UTF-8. It seems trivial, but I have spent hours reading about Unicode, UTF-8, and wide strings without any result.

As an example, the string D\u00c3\u00bcsseldorf needs to be converted to Düsseldorf.

What I tried:

char str[] = "\u00c3\u00bc"; // corresponds to ü
size_t str_len = strlen(str);
for (i = 0; i < str_len; i++)
    printf("%02hhx ", str[i]);
printf("- %zu - %s\n", str_len, str); // prints "c3 83 c2 bc - 4 - ü"

c3 is correct, but the next 3 bytes are unexpected.
The compiler only considers the first part of the UCN (\u00c3).

wchar_t wcs[] = L"\u00c3\u00bc";
size_t wcs_len = wcslen(wcs);
for (i = 0; i < wcs_len; i++)
    printf("%02hhx ", wcs[i]);
printf("- %zu - %ls\n", wcs_len, wcs); // prints "c3 bc - 2 - ü"

Looks better.
The entire UCN is considered (c3 bc), but still no ü.

char str[] = "\xc3\xbc";
size_t str_len = strlen(str);
for (i = 0; i < str_len; i++)
    printf("%02hhx ", str[i]);
printf("- %zu %s\n", str_len, str); // prints "c3 bc - 2 ü"

This prints the ü, but only because I changed str from UCNs to hex escapes.

What am I missing to get from \u00c3\u00bc to ü?
