Channel: Active questions tagged utf-8 - Stack Overflow

Decode Mixed String of Bytes + UTF-8 in Python


I have a messy CSV file where the values in the city column look like Python bytes literals: partly escaped bytes, partly plain text, wrapped in double quotes with a b'...' prefix.

Example Row:  column1,"b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'", column3

Since the value is already a str, I can't call .decode('utf-8') on it directly; Python makes me encode it to bytes first, which creates a double encoding. By itself, though:

b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'.decode('utf-8') 

works in Jupyter notebook to get the correct result:

'Łódź, Poland'

When trying with:

column3.encode('utf-8').decode('utf-8') 

the result is:

"b'Å\x81ódź, Poland'"
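
A small sketch of why an encode/decode round trip cannot help here. It assumes the CSV cell holds the text of a bytes literal (real backslash characters followed by "xc5" and so on), not the bytes themselves; the variable name cell is made up for illustration.

```python
# Assumption: the cell contains literal backslashes, so a raw string
# reproduces what the CSV reader would hand back.
cell = r"b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'"

# Encoding a str to UTF-8 and decoding it again returns the same str,
# so the b'...' wrapper and the escape sequences survive untouched.
round_trip = cell.encode("utf-8").decode("utf-8")

print(round_trip == cell)  # True
print(cell[2:6])           # \xc5 (four separate characters, not one byte)
```

The escapes have to be interpreted as a bytes literal before .decode('utf-8') has anything meaningful to work on.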

How can I correctly decode this half-bytes, half-UTF-8 string? Splitting the value and replacing the b' and the " doesn't seem to work.
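
One possible approach, sketched under the assumption that each affected cell is exactly the repr of a bytes object: let ast.literal_eval parse the text back into bytes, then decode those bytes as UTF-8. The helper name decode_bytes_repr and the in-memory sample row are hypothetical; only ast, csv, and io from the standard library are used.

```python
import ast
import csv
import io


def decode_bytes_repr(cell: str, encoding: str = "utf-8") -> str:
    """Decode a cell that holds the text of a bytes literal; pass others through."""
    text = cell.strip()
    if not text.startswith(("b'", 'b"')):
        return cell  # not a bytes literal, leave the cell unchanged
    try:
        value = ast.literal_eval(text)  # safely parses literals only
    except (ValueError, SyntaxError):
        return cell  # malformed literal, leave the cell unchanged
    if isinstance(value, bytes):
        return value.decode(encoding)
    return cell


# The example row from the question, kept raw so the backslashes stay literal.
sample = r'''column1,"b'\xc5\x81\xc3\xb3d\xc5\xba, Poland'", column3'''

row = next(csv.reader(io.StringIO(sample)))
print(decode_bytes_repr(row[1]))  # Łódź, Poland
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for untrusted CSV content. Cells that are not bytes literals are returned as they are.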

