The TLDR here is simple: What's a sequence of chars that would make either UTF8's Encoding or Encoder return 6 (or even 5) bytes for a single char, as GetMaxByteCount implies it might?
The non-TLDR:
Despite what the docs led me to expect, there is no sign that the UTF8 Encoding either considers potential leftover surrogates from a previous encoder operation, or includes the worst case for the currently selected EncoderFallback. Note that the UTF8 Encoder does support cached bytes, but the UTF8 Encoding apparently does not.
And while the Encoder will return as many as 4 bytes from the submission of a single character, I've never been able to get more than 3 from the Encoding. And yet GetMaxByteCount is telling me there can sometimes be 6?
Is there some trick here? Maybe some case where a malformed set of characters might return longer-than-expected sequences? I'm looking for some specific examples.
Here's some code you can use to experiment:
```csharp
using System;
using System.Text;

string smilelyface = "😄"; // <--- 2 chars, encodes to 4 UTF8 bytes
Encoding enc = Encoding.UTF8;

int mbc = enc.GetMaxByteCount(1);
Console.WriteLine("mbc: {0}", mbc); // <---- 6
byte[] sixbytes = new byte[mbc];

int gb = enc.GetBytes(smilelyface.AsSpan(0, 1), sixbytes); // Encode 1st char
Console.WriteLine("retval: {0}", gb); // <----- 3
gb = enc.GetBytes(smilelyface.AsSpan(1, 1), sixbytes); // Encode 2nd char
Console.WriteLine("retval: {0}", gb); // <----- 3

bool b = enc.TryGetBytes(smilelyface.AsSpan(0, 1), sixbytes, out int outbyteswritten);
Console.WriteLine("outbyteswritten: {0}", outbyteswritten); // <----- 3
b = enc.TryGetBytes(smilelyface.AsSpan(1, 1), sixbytes, out outbyteswritten);
Console.WriteLine("outbyteswritten: {0}", outbyteswritten); // <----- 3

Encoder encr = enc.GetEncoder();
encr.Convert(smilelyface.AsSpan(0, 1), sixbytes, false, out int charsused, out int bytesused, out bool completed);
Console.WriteLine("BytesUsed: {0}", bytesused); // <----- 0
encr.Convert(smilelyface.AsSpan(1, 1), sixbytes, false, out charsused, out bytesused, out completed);
Console.WriteLine("BytesUsed: {0}", bytesused); // <----- 4
```
You'll note that the Encoding never returns more than 3 bytes, and in this case those 3 bytes are the UTF-8 encoding of the Unicode replacement character (U+FFFD), suggesting the UTF8 Encoding has no intention of caching anything for use by subsequent encoding calls. It's hard to see how you could get more than 3 bytes that way.
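You can check the byte values directly. Here's a minimal sketch of my own (a lone high surrogate stands in for the smiley's first char):

```csharp
using System;
using System.Text;

// GetBytes on a lone high surrogate: no caching, just an immediate fallback.
byte[] buf = new byte[8];
int n = Encoding.UTF8.GetBytes("\uD83D".AsSpan(), buf);
Console.WriteLine(BitConverter.ToString(buf, 0, n)); // EF-BF-BD, i.e. U+FFFD in UTF-8
```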
And while the Encoder does cache, I still can't see how to get it to output 6 bytes; the most I can coax out of it is 4.
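The closest I've come, besides completing the surrogate pair, is leaving a high surrogate cached and then submitting a character that can't complete it (another sketch of mine; 'A' is an arbitrary non-surrogate):

```csharp
using System;
using System.Text;

byte[] buf = new byte[8];
Encoder encr = Encoding.UTF8.GetEncoder();

// Cache a lone high surrogate (flush: false): nothing is emitted yet.
encr.Convert("\uD83D".AsSpan(), buf, false, out _, out int written, out _);
Console.WriteLine(written); // 0

// A char that can't pair with it: the cached surrogate falls back to
// U+FFFD (3 bytes) and 'A' adds 1 more, so still only 4 bytes for 1 char.
encr.Convert("A".AsSpan(), buf, false, out _, out written, out _);
Console.WriteLine(BitConverter.ToString(buf, 0, written)); // EF-BF-BD-41
```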
I get that GetMaxByteCount is supposed to be 'worst case,' but AFAICT the worst case here is either 3 or 4, depending on whether we're talking about the Encoding or the Encoder (the docs are unclear about which one they mean. Both?).
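For what it's worth, GetMaxByteCount does seem to fold the fallback into its arithmetic, even though I can't get GetBytes to actually emit that many bytes. A quick check, using an arbitrary 3-char replacement string:

```csharp
using System;
using System.Text;

// UTF-8 with a 3-char replacement fallback instead of the default single U+FFFD.
Encoding enc3 = Encoding.GetEncoding(
    "utf-8",
    new EncoderReplacementFallback("???"),
    DecoderFallback.ReplacementFallback);

Console.WriteLine(enc3.GetMaxByteCount(1)); // larger than 6: the fallback's MaxCharCount multiplies in
```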
Can you really get 6 from encoding 1 char? How?