How should this subtle ambiguity of the UTF-8 BOM be handled?

UTF-8 allows a byte sequence either to begin with a BOM or not. This seems to create a subtle ambiguity, because the BOM itself is the Unicode character U+FEFF.

For example, what character string does the following UTF-8 byte sequence (in hex) represent?

EF, BB, BF, 42, 43, 44

It can represent the character string "BCD" (containing 3 characters), with the first 3 bytes (EF, BB, BF) regarded as the BOM sequence. This seems to be the usual interpretation.

However, it can also represent the character string "[U+FEFF]BCD" (containing 4 characters), with the first 3 bytes (EF, BB, BF) not regarded as the BOM sequence but regarded as an ordinary UTF-8 encoding sequence of the Unicode character U+FEFF.
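Both readings can be reproduced with Python's standard codecs. This is a minimal sketch; Python is only an illustration here, and its codec names describe that library's conventions, not a rule of the UTF-8 specification itself. The plain `utf-8` codec keeps EF BB BF as the character U+FEFF, while `utf-8-sig` treats a leading EF BB BF as a signature and drops it:

```python
# The byte sequence from the question: EF BB BF 42 43 44
data = bytes([0xEF, 0xBB, 0xBF, 0x42, 0x43, 0x44])

# Plain "utf-8" decodes EF BB BF as an ordinary character, U+FEFF.
as_char = data.decode("utf-8")

# "utf-8-sig" treats a leading EF BB BF as a BOM and discards it.
as_bom = data.decode("utf-8-sig")

print(len(as_char), repr(as_char))  # 4 '\ufeffBCD'
print(len(as_bom), repr(as_bom))    # 3 'BCD'
```

So the same six bytes yield a 4-character or a 3-character string depending solely on which convention the reader chooses.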

So how should this ambiguity be handled? Does UTF-8 have a rule that, if the byte sequence EF, BB, BF appears at the very beginning of the data, it must be interpreted as a BOM rather than as the encoding of the character U+FEFF? If so, then UTF-8 cannot encode certain Unicode strings, namely any string that starts with U+FEFF.
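One way such a rule could still encode a string that starts with U+FEFF is for the writer to always emit a BOM, so a leading U+FEFF is written as a second EF BB BF. The sketch below shows that Python's `utf-8-sig` codec behaves this way (its encoder always prepends a BOM and its decoder strips only one). It also shows the failure mode the question describes: if the string is written with plain `utf-8` (no BOM) but read back with `utf-8-sig`, the leading U+FEFF is silently lost. This illustrates one library's convention under these assumptions, not a statement of what the UTF-8 standard requires.

```python
text = "\ufeffBCD"  # a string that genuinely starts with U+FEFF

# The "utf-8-sig" encoder always prepends a BOM, so this string
# ends up with two EF BB BF sequences in a row.
encoded = text.encode("utf-8-sig")
print(encoded.hex(" "))  # ef bb bf ef bb bf 42 43 44

# The decoder strips only the first EF BB BF, so the round trip is lossless.
decoded = encoded.decode("utf-8-sig")
print(decoded == text)  # True

# Writing without a BOM but reading with BOM-stripping loses the U+FEFF.
lossy = text.encode("utf-8").decode("utf-8-sig")
print(repr(lossy))  # 'BCD'
```

The round trip only works when writer and reader agree on the convention; the bytes alone do not say which one was used.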

Other Unicode encodings, such as UTF-16, may have similar problems.
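The same split appears with UTF-16 in Python: the generic `utf-16` codec consumes a leading BOM (using it to pick the byte order), whereas the explicit big-endian `utf-16-be` codec keeps FE FF as the character U+FEFF. A minimal sketch, again using Python only as an illustration:

```python
data = b"\xfe\xff\x00\x42"  # FE FF 00 42

# "utf-16" reads FE FF as a byte-order mark and drops it.
with_bom = data.decode("utf-16")
# "utf-16-be" has a fixed byte order, so FE FF is just U+FEFF.
without_bom = data.decode("utf-16-be")

print(repr(with_bom))     # 'B'
print(repr(without_bom))  # '\ufeffB'
```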
