I have a messy CSV file in which one of the city columns contains the text of a Python bytes literal: UTF-8 escape sequences mixed with plain characters, prefixed with b' and wrapped in double quotes.
Example Row: column1,"b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'", column3
Since the value is already a string, I cannot call .decode('utf-8') on it directly, and encoding it to bytes first only double-encodes it. By itself, the bytes literal:
b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'.decode('utf-8')
works in Jupyter notebook to get the correct result:
'Łódź, Poland'
When trying with:
column3.encode('utf-8').decode('utf-8')
the result is:
"b'Å\x81ódź, Poland'"
How can I correctly decode this half-bytes / half-UTF-8 string? Splitting on or replacing the b' and the " doesn't seem to work.
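For reference, here is a minimal sketch of one approach that might apply here. It assumes the cell really holds the text of a Python bytes literal (a literal backslash followed by x and two hex digits, as in the example row), not already-decoded characters. The helper name decode_bytes_repr and the inline sample row are hypothetical and only for illustration; ast.literal_eval and the csv module are from the standard library.

```python
import ast
import csv
import io


def decode_bytes_repr(cell):
    """Turn the text "b'...'" into a proper str; return other cells unchanged.

    Hypothetical helper: assumes the cell is the repr of a bytes object
    whose contents are UTF-8 encoded.
    """
    text = cell.strip()
    if text.startswith(("b'", 'b"')):
        # literal_eval safely parses the literal text into a real bytes object
        value = ast.literal_eval(text)
        if isinstance(value, bytes):
            return value.decode("utf-8")
    return cell


# Raw string so the backslashes stay literal, as they would in the file.
sample = r'''column1,"b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'", column3'''

row = next(csv.reader(io.StringIO(sample)))
cleaned = [decode_bytes_repr(cell) for cell in row]
print(cleaned[1])  # Łódź, Poland
```

If the string in memory already contains decoded characters rather than backslash escapes (as the output above may suggest), this sketch would not apply as written, and how the file is being read would need to be checked first.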