I am a new starter in Python and in coding in general, so any help is greatly appreciated.
I have more than 3000 text files in a single directory, with multiple encodings, and I need to convert them to a single encoding (e.g. UTF-8) for further NLP work. When I checked the types of these files in the shell with the file(1) command, I identified the following encodings:
    Algol 68 source text, ISO-8859 text, with very long lines
    Algol 68 source text, Little-endian UTF-16 Unicode text, with very long lines
    Algol 68 source text, Non-ISO extended-ASCII text, with very long lines
    Algol 68 source text, Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
    ASCII text
    ASCII text, with very long lines
    data
    diff output text, ASCII text
    ISO-8859 text, with very long lines
    ISO-8859 text, with very long lines, with LF, NEL line terminators
    Little-endian UTF-16 Unicode text, with very long lines
    Non-ISO extended-ASCII text
    Non-ISO extended-ASCII text, with very long lines
    Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
    UTF-8 Unicode (with BOM) text, with CRLF line terminators
    UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators
    UTF-8 Unicode text, with very long lines, with CRLF line terminators
Any ideas on how to convert text files with the above-mentioned encodings into UTF-8?
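For context, here is a minimal sketch of the direction I was considering, assuming the third-party chardet package can guess each file's encoding (the directory names texts and texts_utf8 are just placeholders). I am not sure it handles every case in the list above:

    from pathlib import Path

    import chardet  # third-party: pip install chardet

    src_dir = Path("texts")       # placeholder for my directory of ~3000 files
    out_dir = Path("texts_utf8")  # converted copies go here
    out_dir.mkdir(exist_ok=True)

    for path in src_dir.iterdir():
        if not path.is_file():
            continue
        raw = path.read_bytes()
        # Heuristic guess of the encoding from the raw bytes; chardet may
        # return None (or a wrong guess) for short or unusual files.
        guess = chardet.detect(raw)["encoding"] or "utf-8"
        text = raw.decode(guess, errors="replace")
        (out_dir / path.name).write_text(text, encoding="utf-8")

Would an approach like this be reliable enough for the "Non-ISO extended-ASCII" files, or is there a better-suited tool or library?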