Channel: Active questions tagged utf-8 - Stack Overflow

Unicode string not being read correctly

I'm reading an HTML file using Java and am having some trouble with a Unicode character. The problematic statement is:

<span class="xml-lang" lang="cmn-Hant" xml:lang="cmn-Hant">𦮼</span>

The character is 𦮼 (UTF-8 bytes f0 a6 ae bc).

Whereas what I read in is ম¼ (UTF-8 bytes e0 a6 ae c2 bc).
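
To double-check values like these, a small sketch along the following lines can dump both the UTF-8 bytes and the code points of a string. The class and method names (`Utf8Dump`, `utf8Hex`, `codePoints`) are made up for illustration, and the two strings are built from their code points rather than read from the actual HTML file.

```java
import java.nio.charset.StandardCharsets;

class Utf8Dump {
    // Hex dump of the UTF-8 encoding of a string, bytes separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Code points of a string in U+XXXX notation (surrogate pairs are combined).
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: these literals stand in for the strings seen in the file.
        String expected = new String(Character.toChars(0x26BBC)); // the character in the file
        String actual = "\u09AE\u00BC";                            // what was read in
        System.out.println(utf8Hex(expected) + "  " + codePoints(expected));
        System.out.println(utf8Hex(actual) + "  " + codePoints(actual));
    }
}
```

Seen this way, the expected string is the single supplementary code point U+26BBC, while the string that was read in is two separate BMP code points, U+09AE followed by U+00BC.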

It's close but obviously wrong.

The file I'm reading is marked utf-8 (and I'm reading it in as utf-8) and has LOADS of other CJK strings that get read in perfectly.

I'm hoping someone can simply look at these byte sequences and understand how f0 became e0 and where the extra c2 came from.
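
One guess that fits these exact bytes, offered only as a hypothesis: somewhere in the pipeline a decoder that predates 4-byte UTF-8 treats the lead byte f0 as if it started a 3-byte sequence, then passes the leftover byte bc through as a Latin-1 character. The sketch below simulates such a decoder. `NaiveUtf8` is hypothetical and is not taken from any real library; it only shows that this kind of bug would reproduce the observed output.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NaiveUtf8 {
    // Hypothetical buggy decoder: knows 1-, 2- and 3-byte sequences only and
    // treats every lead byte >= 0xE0 (including 0xF0) as a 3-byte lead.
    // Bytes it cannot place are passed through unchanged, as in Latin-1.
    static int[] decode(int[] bytes) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i];
            if (b >= 0xE0 && i + 2 < bytes.length) {
                // Masking with 0x0F discards the extra lead bit of 0xF0.
                out.add(((b & 0x0F) << 12)
                        | ((bytes[i + 1] & 0x3F) << 6)
                        | (bytes[i + 2] & 0x3F));
                i += 3;
            } else if (b >= 0xC0 && b < 0xE0 && i + 1 < bytes.length) {
                out.add(((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F));
                i += 2;
            } else {
                out.add(b);
                i += 1;
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] cps = decode(new int[] {0xF0, 0xA6, 0xAE, 0xBC});
        for (int cp : cps) {
            System.out.printf("U+%04X%n", cp);
        }
        // Re-encode the misdecoded text as UTF-8 to compare with what was read in.
        for (byte b : new String(cps, 0, cps.length).getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

Under this assumption, f0 a6 ae decodes to U+09AE (whose correct UTF-8 form is e0 a6 ae, which would explain f0 turning into e0), and the stray bc becomes U+00BC, which encodes as c2 bc. It would also be consistent with the other CJK strings reading fine, provided they all sit in the BMP and need at most three bytes each.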

Any ideas?

