An upstream service reads a stream of UTF-8 bytes, assumes they are ISO-8859-1, transcodes them from ISO-8859-1 to UTF-8, and sends the result to my service, labeled as UTF-8.
The upstream service is out of my control. It may get fixed, or it may never be fixed.
I know that I can repair the text by encoding it from UTF-8 to ISO-8859-1 and then labeling the resulting bytes as UTF-8. But what happens if my upstream fixes their issue?
Is there any way to detect this issue and fix the encoding only when I find a bad encoding?
I'm also not certain that the encoding the upstream assumes is ISO-8859-1. I think the upstream is written in Perl, so that encoding makes sense, and every sample I've tried decodes correctly when I apply the ISO-8859-1 encoding.
When the source sends e2 9c 94 (✔) to my upstream, my upstream sends me c3 a2 c2 9c c2 94 (â).
- utf-8 string ✔ as bytes: e2 9c 94
- bytes e2 9c 94 as latin1 string: â
- utf-8 string â as bytes: c3 a2 c2 9c c2 94
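For reference, the byte sequences above can be reproduced with a short Ruby sketch. It assumes the upstream really does a plain ISO-8859-1 to UTF-8 transcode on bytes that were already UTF-8; the helper names mangle and hex are made up for illustration.

```ruby
# Hypothetical helper: simulate the assumed upstream bug by relabeling
# UTF-8 bytes as ISO-8859-1 and then transcoding them to UTF-8.
def mangle(str)
  str.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Hypothetical helper: show a string's bytes as space-separated hex pairs.
def hex(str)
  str.unpack1('H*').scan(/../).join(' ')
end

original = "\u2714" # the check mark from the example

puts hex(original)         # e2 9c 94
puts hex(mangle(original)) # c3 a2 c2 9c c2 94
```

The two trailing characters in the mangled string are the C1 control characters U+009C and U+0094, which is why only â is visible when it is printed.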
I can fix it by applying upstream.encode('ISO-8859-1').force_encoding('UTF-8'), but that will break as soon as the upstream issue is fixed.
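One idea I am considering is to apply the reverse transformation only when it yields valid UTF-8. This is a heuristic sketch, not a proven fix: the method name repair_double_encoding is made up, it assumes the incoming string is tagged as UTF-8, and it would wrongly "repair" genuine text that happens to look double-encoded (for example a literal "Ã©").

```ruby
# Hypothetical helper: undo the suspected double encoding only when the
# result is valid UTF-8; otherwise return the input unchanged.
def repair_double_encoding(str)
  # Pure ASCII is identical in both encodings, so there is nothing to repair.
  return str if str.ascii_only?

  # Raises if any character is above U+00FF, which means the text cannot
  # have come from an ISO-8859-1 to UTF-8 transcode.
  candidate = str.encode('ISO-8859-1').force_encoding('UTF-8')

  # Ordinary Latin-1 text such as "café" becomes invalid UTF-8 here, so it
  # is left alone.
  candidate.valid_encoding? ? candidate : str
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  str
end

puts repair_double_encoding("\u00e2\u009c\u0094") == "\u2714" # mangled input is repaired
puts repair_double_encoding("\u2714") == "\u2714"             # correct input is untouched
```

If the upstream is fixed, correctly encoded strings should mostly pass through unchanged, because they either contain characters above U+00FF or do not form valid UTF-8 once converted to ISO-8859-1.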