UTF-8 string has too many bytes using SBCL and babel on Windows 64 bits

The UTF-8 string in the example below seems to be encoded with too many bytes.

The input string: "👉TEST📍TEST"

“👉” (U+1F449): A hand pointing right
“T”, “E”, “S”, “T”: Basic Latin letters
“📍” (U+1F4CD): A round pushpin
“T”, “E”, “S”, “T”: Basic Latin letters

This string is stored in a UTF-8 encoded file. When I open the file in a hexadecimal editor, I see the 16 bytes below, as expected. When I paste the string into online tools, I get the same 16 bytes.

    f0 9f 91 89 54 45 53 54 f0 9f 93 8d 54 45 53 54
    \_________/ \_________/ \_________/ \_________/
      U+1F449    T  E  S  T   U+1F4CD    T  E  S  T
       “👉”                    “📍”
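For reference, the four bytes expected for each emoji follow directly from the 4-byte UTF-8 layout (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). The sketch below is only an illustration of that layout; `utf-8-encode-4` is a hypothetical helper name, not part of babel or SBCL, and it assumes its argument is a code point in the range U+10000 to U+10FFFF.

```lisp
;; Illustrative helper (hypothetical name, not a babel function):
;; encode one code point in the range U+10000..U+10FFFF as 4 UTF-8 bytes.
(defun utf-8-encode-4 (code-point)
  (list (logior #xF0 (ldb (byte 3 18) code-point))   ; 11110xxx
        (logior #x80 (ldb (byte 6 12) code-point))   ; 10xxxxxx
        (logior #x80 (ldb (byte 6 6) code-point))    ; 10xxxxxx
        (logior #x80 (ldb (byte 6 0) code-point))))  ; 10xxxxxx

;; Print the bytes for the two emoji in the test string.
(format t "~{~2,'0x~^ ~}~%" (utf-8-encode-4 #x1F449))
(format t "~{~2,'0x~^ ~}~%" (utf-8-encode-4 #x1F4CD))
```

The two lists are F0 9F 91 89 and F0 9F 93 8D, which match the bytes seen in the hexadecimal editor.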

However, the function babel:string-to-octets returns something different: I get 20 bytes.

    (defun print-hex (octets)
      (dotimes (offset (length octets))
        (let ((byte (aref octets offset)))
          (format t "~2,'0x " byte)))
      (format t "(~A bytes)~%" (length octets)))

    (let ((string "👉TEST📍TEST"))
      (format t "TEST STRING [~A]~%" string)
      (print-hex (babel:string-to-octets string))
      (print-hex (babel:string-to-octets string :encoding :UTF-8)))

Output:

    TEST STRING [👉TEST📍TEST]
    ED A0 BD ED B1 89 54 45 53 54 ED A0 BD ED B3 8D 54 45 53 54 (20 bytes)
    ED A0 BD ED B1 89 54 45 53 54 ED A0 BD ED B3 8D 54 45 53 54 (20 bytes)

If we analyze this further:

    ED A0 BD ED B1 89 54 45 53 54 ED A0 BD ED B3 8D 54 45 53 54
    \_______________/ \_________/ \_______________/ \_________/
           ???         T  E  S  T        ???         T  E  S  T
           ^^^                           ^^^
    UTF-16 surrogate pair?        UTF-16 surrogate pair?
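The surrogate-pair guess can be checked by hand. The sketch below treats each 6-byte group as two 3-byte UTF-8 sequences, decodes each one to a 16-bit value, and then combines the two values as a UTF-16 high/low surrogate pair. `decode-3-byte` and `combine-surrogates` are hypothetical helper names introduced for this check; they are not babel functions.

```lisp
;; Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
;; into the 16-bit value it carries.
(defun decode-3-byte (b0 b1 b2)
  (logior (ash (logand b0 #x0F) 12)
          (ash (logand b1 #x3F) 6)
          (logand b2 #x3F)))

;; Combine a UTF-16 high/low surrogate pair into a single code point.
(defun combine-surrogates (high low)
  (+ #x10000
     (ash (- high #xD800) 10)
     (- low #xDC00)))

;; First 6-byte group from the babel output: ED A0 BD ED B1 89
(let ((high (decode-3-byte #xED #xA0 #xBD))
      (low  (decode-3-byte #xED #xB1 #x89)))
  (format t "high=~4,'0X low=~4,'0X code point=U+~X~%"
          high low (combine-surrogates high low)))
```

The first group decodes to the surrogates D83D and DC49, which combine to U+1F449. The second group (ED A0 BD ED B3 8D) gives D83D and DCCD, which combine to U+1F4CD. So each emoji appears to have been encoded as two separately UTF-8-encoded surrogates (6 bytes) instead of one 4-byte sequence, which accounts for the 4 extra bytes.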

How do I get the 16 bytes from the input string?

Another interesting behavior that highlights the same issue: converting the string to octets and then back to a string signals an encoding error on the first character.

    (let ((string "👉TEST📍TEST"))
      (babel:octets-to-string (babel:string-to-octets string)))

    debugger invoked on a BABEL-ENCODINGS:CHARACTER-OUT-OF-RANGE in thread
    #<THREAD "main thread" RUNNING {100F080003}>:
      Illegal :UTF-8 character starting at position 0.

    Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
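One possible workaround, sketched under an assumption the post does not confirm: that the Lisp string itself already contains each emoji as two separate surrogate characters (in which case `(length string)` would be 12 rather than 10). If so, the pairs can be merged into single characters before encoding. `merge-surrogates` is a hypothetical helper, not part of babel, and it assumes an implementation whose `char-code-limit` is above #x10FFFF, as in a Unicode-enabled SBCL.

```lisp
;; Hypothetical helper: return a copy of STRING in which every
;; high/low surrogate pair is replaced by the single character it encodes.
;; All other characters, including unpaired surrogates, are copied unchanged.
(defun merge-surrogates (string)
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((c (char-code (char string i))))
                 (if (and (<= #xD800 c #xDBFF)
                          (< (1+ i) n)
                          (<= #xDC00 (char-code (char string (1+ i))) #xDFFF))
                     ;; High surrogate followed by low surrogate: merge them.
                     (let ((low (char-code (char string (1+ i)))))
                       (write-char (code-char (+ #x10000
                                                 (ash (- c #xD800) 10)
                                                 (- low #xDC00)))
                                   out)
                       (incf i 2))
                     ;; Anything else: copy as-is.
                     (progn
                       (write-char (char string i) out)
                       (incf i))))))))
```

If the assumption holds, `(babel:string-to-octets (merge-surrogates string) :encoding :utf-8)` should then produce the expected 16 bytes. This only treats the symptom; where the surrogates come from (for example, the external format used when the source is read) would still need to be checked.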
