Channel: Active questions tagged utf-8 - Stack Overflow

Converting all text files with multiple encodings in a directory into UTF-8 encoded text files

I am new to Python and to coding in general, so any help is greatly appreciated.

I have more than 3000 text files in a single directory, in a mix of encodings, and I need to convert them all to a single encoding (e.g. UTF-8) for further NLP work. When I checked the types of these files from the shell, I identified the following encodings:

Algol 68 source text, ISO-8859 text, with very long lines
Algol 68 source text, Little-endian UTF-16 Unicode text, with very long lines
Algol 68 source text, Non-ISO extended-ASCII text, with very long lines
Algol 68 source text, Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
ASCII text
ASCII text, with very long lines
data
diff output text, ASCII text
ISO-8859 text, with very long lines
ISO-8859 text, with very long lines, with LF, NEL line terminators
Little-endian UTF-16 Unicode text, with very long lines
Non-ISO extended-ASCII text
Non-ISO extended-ASCII text, with very long lines
Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
UTF-8 Unicode (with BOM) text, with CRLF line terminators
UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators
UTF-8 Unicode text, with very long lines, with CRLF line terminators

Any ideas on how to convert text files with the above-mentioned encodings into UTF-8 encoded text files?
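
For illustration, here is a minimal sketch of one possible approach using only the Python standard library. It is not a definitive solution. The function names `decode_bytes` and `convert_directory` and the `FALLBACK_ENCODINGS` order are hypothetical. The sketch assumes that the UTF-16 files start with a byte order mark, and that the "ISO-8859" and "Non-ISO extended-ASCII" files are Western European text (cp1252 or Latin-1), which may not hold for this corpus.

```python
import codecs
from pathlib import Path

# Assumed fallback order (hypothetical): strict UTF-8 first, then cp1252,
# then latin-1, which maps every byte and therefore never fails.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_bytes(raw: bytes) -> str:
    """Decode raw bytes, using a BOM if present, else trying fallbacks in order."""
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the BOM while decoding.
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not expected to be reached, because latin-1 accepts any byte sequence.
    raise ValueError("could not decode input")


def convert_directory(src_dir, dst_dir) -> int:
    """Write a UTF-8 copy of every file in src_dir into dst_dir; return the count."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = 0
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        text = decode_bytes(path.read_bytes())
        # Encoding by hand and writing bytes leaves line endings untouched.
        (dst / path.name).write_bytes(text.encode("utf-8"))
        converted += 1
    return converted
```

Calling `convert_directory("raw_texts", "utf8_texts")` (both directory names are placeholders) would leave the originals untouched and write the converted copies to a separate directory. Trial decoding is a guess rather than true detection: a file in some other single-byte encoding will still decode as cp1252 or Latin-1 without an error, only with the wrong characters, so spot-checking the output is advisable. Files reported as plain `data` may not be text at all and probably need manual inspection.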

