Channel: Active questions tagged utf-8 - Stack Overflow

How to solve UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte in python

I crawled the data and had to save the dataframe as UTF-16 (Unicode), since the Latin/Spanish words were displayed incorrectly when it was saved as UTF-8. I used the following code to save the dataframe:

 df.to_csv("blogdata.csv", encoding = "utf-16", sep = "\t", index = False)

When I try to read the file back to clean the data, using the following code:

 blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')

it shows the following error:

 ---------------------------------------------------------------------------
 UnicodeDecodeError                        Traceback (most recent call last)
 <ipython-input-2-15ec18f92889> in <module>()
 ----> 1 blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')
 ...
 pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
 pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
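
The byte 0xff at position 0 is consistent with a UTF-16 byte order mark (BOM): Python's "utf-16" codec writes a BOM at the start of its output, and that byte can never begin a valid UTF-8 sequence. The following is only a minimal sketch using the standard library; the `sample` string is a made-up stand-in for the real data, not something taken from the file above.

```python
import codecs

# Hypothetical sample text standing in for the scraped blog data.
sample = "título\tespañol\n"

# Python's "utf-16" codec prepends a byte order mark (BOM) when encoding.
raw = sample.encode("utf-16")
starts_with_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print(raw[:2], starts_with_bom)

# Decoding the same bytes as UTF-8 fails on the very first byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc
print(utf8_error)

# Decoding with the codec that wrote the bytes restores the text.
round_trip = raw.decode("utf-16")
print(round_trip == sample)
```

So the error most likely means the file is being decoded with a different codec (UTF-8, the default for `pd.read_csv`) than the one that wrote it.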

I don't know how to save the original data without losing those Latin/Spanish words within the English sentences, nor how to read a Unicode data file. Can anybody please help me solve this issue?
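
One approach that may work is to pass `pd.read_csv` the same `encoding` and `sep` that were given to `to_csv`, since `read_csv` otherwise assumes UTF-8 and a comma separator. The sketch below is a round trip under stated assumptions: the dataframe contents and the temporary path are hypothetical placeholders, not the real blog data or the real Downloads folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the scraped dataframe.
df = pd.DataFrame({"title": ["Mi día en Bogotá"], "text": ["A niño said ¡hola!"]})

# A temporary path is used here instead of the Downloads folder.
path = os.path.join(tempfile.mkdtemp(), "blogdata.csv")

# Same call as in the question: UTF-16, tab-separated, no index column.
df.to_csv(path, encoding="utf-16", sep="\t", index=False)

# Read it back with the same encoding and separator that wrote it.
blogdata = pd.read_csv(path, encoding="utf-16", sep="\t")
print(blogdata)
```

If the accented characters only looked wrong when the UTF-8 file was opened in a spreadsheet program (a guess, since the question does not say where they were viewed), writing with `encoding="utf-8-sig"` and reading with the same value might be an alternative worth trying.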

