I have the following CSV file:

    Hèllo,"ab - привет"

It can be recreated with:

    echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > bug.csv
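A byte dump shows exactly what the starting file contains. This sketch assumes GNU coreutils (base64, od), as on the Ubuntu system listed at the end. It shows that bug.csv is plain UTF-8 with CRLF line endings and no byte order mark. A UTF-8 BOM would appear as ef bb bf at the very start.

```shell
# Recreate the sample file from the base64 string in the question.
echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > bug.csv

# Dump the raw bytes. The file starts with 48 c3 a8 ("H" followed by the
# two-byte UTF-8 encoding of "è") and ends with 0d 0a (CRLF).
# There is no ef bb bf (UTF-8 BOM) at the beginning.
od -An -tx1 bug.csv
```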
Opening it on Windows in Excel via double-click shows the following:

    Hèllo ab - привет

(at least the values are correctly placed in two columns).
If I run

    iconv -f utf-8 -t utf-16le bug.csv > bug_utf16le.csv

and double-click the result, it shows:

    Hèllo ab - ?@825B

i.e. it decoded Hèllo correctly, but not the rest. Output of base64 bug_utf16le.csv:

    SADoAGwAbABvACwAIgBhAGIAIAAtACAAPwRABDgEMgQ1BEIEIgANAAoA
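The ?@825B output is consistent with Excel reading the UTF-16LE bytes with a single-byte code page and ignoring the control bytes. That is a hypothesis, and Windows-1252 is only an assumed code page, since the real one depends on the Windows locale. Every Cyrillic letter here is U+04xx, so its low byte falls in the ASCII range: п U+043F gives 3F "?", р U+0440 gives 40 "@", и U+0438 gives "8", в U+0432 gives "2", е U+0435 gives "5", and т U+0442 gives "B". Meanwhile è is U+00E8, and its low byte E8 happens to be è in Windows-1252 as well, which is why Hèllo survives. The sketch below reproduces the effect with glibc iconv and GNU tr.

```shell
# Recreate the sample file from the base64 string in the question.
echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > bug.csv

# Encode as UTF-16LE without a BOM (what bug_utf16le.csv contains), then
# deliberately misread those bytes as Windows-1252 (an assumed code page).
# The last step deletes the NUL (\000), 0x04 (\004) and CR control bytes
# that the misreading produces.
iconv -f utf-8 -t utf-16le bug.csv \
  | iconv -f cp1252 -t utf-8 \
  | tr -d '\000\004\r'
# prints: Hèllo,"ab - ?@825B"
```

Parsed as CSV, that line gives exactly the two columns Hèllo and ab - ?@825B reported above.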
If I run

    iconv -f utf-8 -t utf-16 bug.csv > bug_utf16.csv

Excel correctly shows

    Hèllo,"ab - привет"

but it does not recognize that it should be two columns. Output of base64 bug_utf16.csv:

    //5IAOgAbABsAG8ALAAiAGEAYgAgAC0AIAA/BEAEOAQyBDUEQgQiAA0ACgA=

bug_utf16.csv is identical to bug_utf16le.csv except for the BOM (ff fe) in its first two bytes.
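The BOM claim can be checked directly. This sketch assumes glibc iconv on a little-endian (x86) machine. In that setup, -t utf-16 writes a BOM followed by little-endian data, which matches the //5 (ff fe) prefix of the base64 output above. On a big-endian machine the byte order would differ.

```shell
# Recreate the sample file and both UTF-16 variants from the question.
echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > bug.csv
iconv -f utf-8 -t utf-16le bug.csv > bug_utf16le.csv
iconv -f utf-8 -t utf-16 bug.csv > bug_utf16.csv

# The first two bytes of the "utf-16" output are the BOM (ff fe = little-endian).
head -c 2 bug_utf16.csv | od -An -tx1

# Everything after the BOM is byte-for-byte the UTF-16LE file.
tail -c +3 bug_utf16.csv | cmp - bug_utf16le.csv && echo identical
```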
Is there a way to transcode a UTF-8 CSV file so that Excel opens it, recognizes the columns correctly (keeping , as the separator) and shows all the French and Cyrillic text correctly?
I found a way that works:

    sed s@","@"\t"@ bug.csv | iconv -f utf-8 -t utf-16 > bug_utf16_tab.csv

but I'd much prefer not to mess with replacing the separator, as it's brittle and may break around quoting and escaping.
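The brittleness concern is easy to demonstrate by running the same sed expression on two made-up inputs, which are for illustration only and are not from the file above. This assumes GNU sed, which expands \t in the replacement to a tab. The shell strips the double quotes, so sed receives s@,@\t@. That expression replaces only the first comma on each line, wherever that comma is.

```shell
# A quoted field containing a comma: the comma *inside* the quotes is the
# first one on the line, so it becomes the tab and the field is split wrongly.
printf '"a, b",c\n' | sed s@","@"\t"@

# Without the g flag, only the first comma per line is replaced, so any
# further columns stay comma-separated.
printf 'x,y,z\n' | sed s@","@"\t"@
```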
Thanks!
    $ iconv --version
    iconv (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
    Copyright (C) 2022 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Ulrich Drepper.
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 22.04.3 LTS
    Release:        22.04
    Codename:       jammy