I'm trying to read a dataframe from a pipe-delimited file and write it back out. Some of the characters are non-ASCII (´, ç, ñ, etc.), and the write fails when I try to encode them as ASCII.
    df = pd.read_csv('filename.txt',sep='|', encoding='utf-8')
    <do stuff>
    newdf.to_csv('output.txt', sep='|', index=False, encoding='ascii')

    -------
      File "<ipython-input-63-ae528ab37b8f>", line 21, in <module>
        newdf.to_csv(filename,sep='|',index=False, encoding='ascii')
      File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1344, in to_csv
        formatter.save()
      File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1551, in save
        self._save()
      File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1652, in _save
        self._save_chunk(start_i, end_i)
      File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1678, in _save_chunk
        lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
      File "pandas\lib.pyx", line 1075, in pandas.lib.write_csv_rows (pandas\lib.c:19767)
    UnicodeEncodeError: 'ascii' codec can't encode character '\xb4' in position 7: ordinal not in range(128)
If I change to_csv to use utf-8 encoding instead, the write succeeds, but then I can't read the file back in:
    newdf.to_csv('output.txt',sep='|',index=False,encoding='utf-8')
    pd.read_csv('output.txt', sep='|')
    > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 2: invalid start byte
My goal is to have a pipe-delimited file that retains the accents and special characters.
Also, is there an easy way to figure out which line read_csv is breaking on? Right now I don't know how to get it to show me the bad character(s).
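For that last question, one approach I could imagine is to bypass read_csv and scan the raw bytes line by line, trying to decode each one. This is only a rough sketch: `find_undecodable_lines` is a hypothetical helper name (it is not part of pandas), and it assumes the file is meant to be UTF-8 with `\n` line endings.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes, error) for every line that fails to decode.

    Hypothetical helper, not a pandas function. Splitting on b"\n" is safe
    for UTF-8 and single-byte encodings, but not for UTF-16/UTF-32.
    """
    bad = []
    # Open in binary mode so Python does not try to decode anything for us.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the byte offset of the offending byte within the line.
                bad.append((lineno, raw, exc))
    return bad
```

Calling `find_undecodable_lines('output.txt')` and printing each `lineno`, `exc.start`, and `raw` should show which lines contain bytes that are not valid UTF-8, such as a lone `0xb4`.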