I am trying to read and print the file txt.tsv from the following archive: https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2017q3_notes.zip
According to the SEC, the data set is provided in a single encoding, as follows:
Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.
My current code:
import csv

with open('txt.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)
All attempts ended with the following error message:
'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
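As far as I can tell, 0xa0 can only appear in UTF-8 as a continuation byte, never at the start of a character, while in Latin-1 and cp1252 it is a non-breaking space. That makes me suspect the file is not purely UTF-8, but I have not confirmed it. Below is a minimal diagnostic sketch for locating the offending byte in the raw file; the helper names (first_invalid_utf8, context_around, report) are my own, and the position in the error message may be relative to the buffer being decoded rather than to the whole file.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte) for the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte in `data`.
        return exc.start, data[exc.start]
    return None


def context_around(data: bytes, offset: int, width: int = 20) -> bytes:
    """Return the raw bytes surrounding `offset`, for manual inspection."""
    return data[max(0, offset - width): offset + width]


def report(path: str) -> None:
    """Print the first invalid UTF-8 byte in the file at `path`, with context."""
    # Binary mode: read the bytes as-is, without any decoding.
    with open(path, "rb") as f:
        raw = f.read()
    bad = first_invalid_utf8(raw)
    if bad is None:
        print("file decodes cleanly as UTF-8")
    else:
        offset, byte = bad
        print(f"first invalid byte 0x{byte:02x} at offset {offset}")
        print(context_around(raw, offset))
```

Calling report('txt.tsv') should show the bytes around the first problem, which would at least tell me whether this is a stray Latin-1 character or something else.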
I am a bit lost. Can anyone help me?