Channel: Active questions tagged utf-8 - Stack Overflow

Convert UTF-16 to UTF-8 and remove BOM?


We have a data entry person who saves files as UTF-16 on Windows. We would like to convert them to UTF-8 and remove the BOM. The UTF-8 conversion works, but the BOM is still there. How would I remove it? This is what I currently have:

```python
import codecs
import os

batch_3 = {'src': '/Users/jt/src', 'dest': '/Users/jt/dest/'}
batches = [batch_3]

for b in batches:
    s_files = os.listdir(b['src'])
    for file_name in s_files:
        ff_name = os.path.join(b['src'], file_name)
        # Only convert regular .json files
        if os.path.isfile(ff_name) and ff_name.endswith('.json'):
            print(ff_name)
            target_file_name = os.path.join(b['dest'], file_name)
            BLOCKSIZE = 1048576  # chunk size passed to each read()
            # Decode the source as UTF-16 little-endian ...
            with codecs.open(ff_name, "r", "utf-16-le") as source_file:
                # ... and write the decoded text back out as UTF-8
                with codecs.open(target_file_name, "w+", "utf-8") as target_file:
                    while True:
                        contents = source_file.read(BLOCKSIZE)
                        if not contents:
                            break
                        target_file.write(contents)
```
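A minimal Python 3 sketch of why the BOM survives, using a two-byte in-memory sample instead of the real .json files (the sample is an assumption for illustration). The explicit `utf-16-le` codec decodes the leading bytes `FF FE` as the character U+FEFF instead of consuming them as a byte-order mark, while the generic `utf-16` codec consumes them.

```python
import codecs

# First bytes of a UTF-16 LE file with a BOM: FF FE, then "{" as 7B 00.
raw = codecs.BOM_UTF16_LE + "{".encode("utf-16-le")

# The explicit little-endian codec treats FF FE as ordinary data, so the
# BOM survives as the character U+FEFF ...
as_le = raw.decode("utf-16-le")
print(repr(as_le))             # '\ufeff{'

# ... and encoding that string as UTF-8 yields the EF BB BF prefix.
print(as_le.encode("utf-8"))   # b'\xef\xbb\xbf{'

# The generic "utf-16" codec uses the BOM to choose the byte order and
# does not include it in the decoded text.
as_generic = raw.decode("utf-16")
print(repr(as_generic))        # '{'
```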

If I run `hexdump -C` on the resulting file, I see:

```
Wed Jan 11$ hexdump -C svy-m-317.json
00000000  ef bb bf 7b 0d 0a 20 20  20 20 22 6e 61 6d 65 22  |...{..    "name"|
00000010  3a 22 53 61 76 6f 72 79  20 4d 61 6c 69 62 75 2d  |:"Savory Malibu-|
```

The first three bytes, `ef bb bf`, are the UTF-8 encoding of the BOM (U+FEFF). How do I remove it?
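One possible approach, sketched for Python 3 and assuming the inputs are UTF-16 LE as in the code above: keep the explicit `utf-16-le` codec, but strip a single leading U+FEFF from the first chunk before writing. Decoding with the generic `utf-16` codec, which consumes the BOM, is an alternative. The function name `utf16le_to_utf8` and the sample file contents are hypothetical, not from the original post. `newline=""` is used on both files so CRLF line endings pass through unchanged, as they do with `codecs.open`, which always opens the underlying file in binary mode.

```python
import os
import tempfile

BLOCKSIZE = 1048576  # characters per read(), same value as in the question


def utf16le_to_utf8(src_path, dest_path):
    """Hypothetical helper: copy a UTF-16 LE file to UTF-8 without a BOM."""
    # newline="" disables newline translation on both read and write.
    with open(src_path, "r", encoding="utf-16-le", newline="") as source_file:
        with open(dest_path, "w", encoding="utf-8", newline="") as target_file:
            first = True
            while True:
                contents = source_file.read(BLOCKSIZE)
                if not contents:
                    break
                if first:
                    # The -le codec decodes the BOM as U+FEFF; drop one
                    # leading copy so it is not re-encoded as EF BB BF.
                    if contents.startswith("\ufeff"):
                        contents = contents[1:]
                    first = False
                target_file.write(contents)


# Usage: round-trip a tiny BOM-prefixed UTF-16 LE file in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.json")
    dst = os.path.join(tmp, "out.json")
    with open(src, "wb") as f:
        f.write("\ufeff{\r\n}".encode("utf-16-le"))
    utf16le_to_utf8(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)  # b'{\r\n}'
```

In the loop from the question, the two nested `with` blocks could be replaced by a call such as `utf16le_to_utf8(ff_name, target_file_name)`.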

thx

