If I copy and paste some UTF-8 text (e.g. “Wands!”) into a TMemo, it displays as expected.
If I generate a string containing the 3 bytes (as characters) for '“' (i.e. 0xE2, 0x80, 0x9C) and use Memo1.Lines.Add(x), it displays as 'â' (0xE2 in extended ASCII), which it has stored as 0xC3, 0xA2 (UTF-8). The other two bytes of the string are stored as 0xC2, 0x80 and 0xC2, 0x9C.
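To make the byte pattern concrete, here is a minimal sketch in Python (used only to show the arithmetic, not the Delphi code itself). It assumes that each byte is widened to the code point with the same value, which is what a Latin-1 decode does, and that the result is then encoded as UTF-8. Under that assumption it reproduces exactly the six stored bytes described above.

```python
# The three UTF-8 bytes for U+201C (left double quotation mark).
raw = bytes([0xE2, 0x80, 0x9C])

# Assumption: each byte is treated as a separate character with the same
# numeric value (equivalent to decoding the bytes as Latin-1).
widened = raw.decode("latin-1")   # three code points: U+00E2, U+0080, U+009C

# Encoding those three code points as UTF-8 yields two bytes per character.
stored = widened.encode("utf-8")

# Decoding the original bytes as UTF-8 instead yields the intended character.
intended = raw.decode("utf-8")

print(stored.hex(" "))            # c3 a2 c2 80 c2 9c
print(intended == "\u201c")       # True
```

If this is what is happening, the fix would be to decode the byte sequence as UTF-8 before handing the string to the memo, rather than building the string one byte per character.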
How can I indicate that the string that I am adding already has UTF-8 multi-byte characters? And why is the string pasted into the Memo not treated the same way?
I am trying to process text extracted from ePub files. Originally the idea was to generate sortable versions of text containing characters with diacritics by replacing them with their unaccented equivalents, but I ran into this problem of inconsistent display.
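For context, the kind of sort-key step I have in mind could look roughly like the following Python sketch. The function name `strip_diacritics` is hypothetical, and the approach (Unicode NFD decomposition followed by dropping combining marks) is just one possible way to do it; it assumes the text has already been decoded correctly.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed character such as U+00E9 into a base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Caf\u00e9 na\u00efve"))  # Cafe naive
```

Letters that have no canonical decomposition (for example 'ø' or 'ß') pass through unchanged with this approach, so they would need separate handling.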