Channel: Active questions tagged utf-8 - Stack Overflow

Casting each char from span vs MemoryMarshal.Cast


When debugging an operation on UTF-8 strings, I sometimes want to see the string representation of a given ReadOnlySpan<byte>, so I wrote a static helper function to do it. However, one of the code paths doesn't work as expected, and I wonder why the resulting string is incomprehensible.

//#define FORCE_NOT_UTF8
using MemoryMarshal = System.Runtime.InteropServices.MemoryMarshal;
using Unsafe = System.Runtime.CompilerServices.Unsafe;
using Encoding = System.Text.Encoding;

static string ForgeString(ReadOnlySpan<byte> utf8Runes)
{
    Span<char> buffer = utf8Runes.Length > 1024
        ? new char[utf8Runes.Length]
        : stackalloc char[1024]
    ;
#if FORCE_NOT_UTF8
    Encoding.UTF8.GetChars(utf8Runes, buffer);
#else
    if (Encoding.Default.BodyName != Encoding.UTF8.BodyName)
    {
        Encoding.UTF8.GetChars(utf8Runes, buffer);
    }
    else if(buffer.Length is <= 1024)
    {
        MemoryMarshal.Cast<byte, char>(utf8Runes).CopyTo(buffer);
    }
    else
    {
        ref readonly var elmnt0 = ref utf8Runes[0];
        ref var ptrSrc = ref Unsafe.AsRef(in elmnt0);
        ref var ptrDst = ref buffer[0];
        for(int i = 0; ptrSrc is not default(byte) && i < utf8Runes.Length; i++)
        {
            ptrDst = (char) ptrSrc;
            ptrSrc = ref Unsafe.Add(ref ptrSrc, 1);
            ptrDst = ref Unsafe.Add(ref ptrDst, 1);
        }
    }
#endif
    Index end = buffer.IndexOf(default(char)) is int index and not -1 ? new(index) : Index.End;
    return new(buffer[..end]);
}

string result1 = default!;
string result2 = default!;
result1 = ForgeString("foobar"u8);
result2 = ForgeString("james james james (...repeating 166 times)"u8);
Console.WriteLine(result1);
Console.WriteLine(result2);
//in order to get string result3 its necessary to recompile with compiler symbol FORCE_NOT_UTF8

The for loop path prints normally ('james' a bunch of times), but with the MemoryMarshal cast, 'foobar' produces '潦扯牡'. What is happening behind Cast<TFrom, TTo> to create this unexpected sequence? I thought the idea was literally to apply a (T) cast to each element of the given span.
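To isolate the behaviour from the rest of ForgeString, here is a minimal sketch that puts the two approaches side by side. It assumes C# 11 or later (for the u8 literal) and a little-endian machine; the variable names are illustrative and not part of the code above. It only shows that Cast yields half as many elements, each built from two adjacent input bytes, rather than one char per byte.

```csharp
using System;
using System.Runtime.InteropServices;

ReadOnlySpan<byte> bytes = "foobar"u8; // 66 6F 6F 62 61 72

// MemoryMarshal.Cast views the same memory as another element type.
// Six bytes become three 2-byte chars; no per-element conversion happens.
ReadOnlySpan<char> reinterpreted = MemoryMarshal.Cast<byte, char>(bytes);
Console.WriteLine(reinterpreted.Length); // 3

foreach (char c in reinterpreted)
{
    // On a little-endian machine: U+6F66, U+626F, U+7261
    Console.WriteLine($"U+{(int)c:X4}");
}

// Per-element conversion, equivalent to what the for loop path does:
// each byte is widened to its own char.
Span<char> widened = stackalloc char[bytes.Length];
for (int i = 0; i < bytes.Length; i++)
{
    widened[i] = (char)bytes[i];
}
Console.WriteLine(widened.ToString()); // foobar
```

The three code units printed by the first loop are the characters '潦', '扯' and '牡', which matches the output seen from ForgeString. Note that the per-byte widening only round-trips ASCII input; multi-byte UTF-8 sequences still need Encoding.UTF8.GetChars or Encoding.UTF8.GetString.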

