Channel: Active questions tagged utf-8 - Stack Overflow

iconv: illegal input sequence at position

I have a bash script that downloads some files from a URL and stores them in a folder named "data1". Since the files are downloaded as .zip archives, the next step is to unzip them. After that, the script determines two variables for every file: extension, the file type (txt, csv, docx), and encoding, the file's character encoding (ISO, UTF-8). Because the files this script downloads are not in UTF-8, I have to convert them. This is the line that performs the conversion:
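A minimal sketch of the per-file metadata step, assuming bash and a `file` implementation that supports `-b` and `--mime-encoding` (the one shipped with common Linux distributions does). The helper names `file_extension`, `file_basename` and `file_charset` are hypothetical. Bash parameter expansion splits on the last dot, so a name such as `a.b.csv` yields the extension `csv` rather than `b`, which the `sed`/`awk` pipeline in the question would return.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "extension" and "encoding" variables.

# Everything after the last dot, e.g. 200715COVID19MEXICO.csv -> csv
# (a name without any dot is returned unchanged).
file_extension() {
    local name=$1
    printf '%s\n' "${name##*.}"
}

# Everything before the last dot, e.g. 200715COVID19MEXICO.csv -> 200715COVID19MEXICO
file_basename() {
    local name=$1
    printf '%s\n' "${name%.*}"
}

# Charset as guessed by file(1), e.g. "utf-8", "us-ascii" or "iso-8859-1".
# -b drops the "name:" prefix; --mime-encoding prints only the charset.
file_charset() {
    file -b --mime-encoding -- "$1"
}
```

Inside the loop this would be used as `extension=$(file_extension "$name")` and `encoding=$(file_charset "$name")`.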

iconv -f "$encoding" -t UTF-8//TRANSLIT "$name2.$extension" -o "conversion_$name2.$extension"

As you can see, I pass two file names: the input file to convert to UTF-8 and the output file, which is named conversion_(original name).(original extension). However, I'm getting the following error:
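A minimal sketch of the conversion step as a bash function, assuming an `iconv` such as glibc's on the PATH. `convert_to_utf8` is a hypothetical name. It differs from the question's line in three ways: variables are quoted so file names with spaces survive word splitting, output is redirected instead of using the non-POSIX `-o` option, and the `//TRANSLIT` suffix is dropped. `//TRANSLIT` only affects characters the target charset cannot represent; UTF-8 can represent every character, so the suffix would not prevent an "illegal input sequence" error, which is about invalid bytes in the input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert_to_utf8 SRC_CHARSET FILE
# Writes conversion_FILE in the current directory (FILE is assumed to be a
# bare file name, as in the question's script) and returns iconv's status.
convert_to_utf8() {
    local from=$1 src=$2
    local out="conversion_${src}"
    local tmp="${out}.part"

    # Convert into a temporary file first, so a failed conversion does not
    # leave a truncated conversion_* file that looks complete.
    if iconv -f "$from" -t UTF-8 "$src" > "$tmp"; then
        mv -- "$tmp" "$out"
    else
        local status=$?
        rm -f -- "$tmp"
        echo "iconv failed for $src (from $from)" >&2
        return "$status"
    fi
}
```

Usage inside the loop: `convert_to_utf8 "$encoding" "$name"`. A non-zero status means the guessed charset did not match the file's bytes.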

iconv: illegal input sequence at position 1234704

The error comes from the file extracted from datos_abiertos_covid19.zip, which after unzipping is named 200715COVID19MEXICO.csv (the name changes depending on the day the script runs). Does anyone know how I can avoid this error? I need all of the downloaded files to be in UTF-8. I would really appreciate your help.
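One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `file -i` guesses the charset from only the beginning of a file, so a large CSV can be labelled with a charset (for example `us-ascii` or `utf-8`) that bytes further in do not match. The sketch below, with hypothetical helper names, prints the bytes at the offset iconv reports (assuming, as with glibc's iconv, that the position is a zero-based byte offset into the input), checks whether a whole file is valid UTF-8, and otherwise falls back to ISO-8859-1. ISO-8859-1 assigns a character to every byte value, so that conversion cannot stop on an illegal input sequence; whether ISO-8859-1 (or, say, Windows-1252) is the correct source encoding for this data is an assumption to check against the converted text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show_bytes_at FILE OFFSET [COUNT]
# Prints COUNT bytes (default 16) as hex, starting at the zero-based OFFSET.
show_bytes_at() {
    local file=$1 offset=$2 count=${3:-16}
    # tail -c +N starts at the N-th byte (1-based), hence offset + 1.
    tail -c +"$((offset + 1))" "$file" | head -c "$count" | od -An -tx1
}

# Hypothetical helper: is_valid_utf8 FILE
# Succeeds only if iconv can decode the whole file as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: to_utf8_with_fallback SRC OUT
# Copies SRC if it is already valid UTF-8; otherwise assumes ISO-8859-1.
to_utf8_with_fallback() {
    local src=$1 out=$2
    if is_valid_utf8 "$src"; then
        cp -- "$src" "$out"
    else
        # Every byte 0x00-0xFF is a valid ISO-8859-1 character, so this
        # step cannot stop on an illegal input sequence.
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$out"
    fi
}
```

For the reported error, `show_bytes_at 200715COVID19MEXICO.csv 1234704` would show the bytes iconv rejected.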

Here is the script I'm using:

#!/usr/bin/bash
# creating folders
mkdir -p data1
cd data1 || exit 1
# downloading data
wget http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip
wget http://187.191.75.115/gobmx/salud/datos_abiertos/diccionario_datos_covid19.zip
# unzipping data
for i in *.zip; do unzip "$i"; done
# iterate over every regular file in data1, skipping the .zip archives
for name in *; do
    [ -f "$name" ] || continue
    case "$name" in *zip*) continue ;; esac
    # getting extension of current file
    extension=$(echo "$name" | sed 's/\./ /g' | awk '{print $2}')
    # getting encoding format of current file
    encoding=$(file -i "$name" | sed 's/=/ /g' | awk '{print $4}')
    # file name without its extension
    name2=$(echo "$name" | sed -e "s/\.$extension//g")
    # converting current file to UTF-8
    iconv -f "$encoding" -t UTF-8//TRANSLIT "$name2.$extension" -o "conversion_$name2.$extension"
done
mkdir old
mv $(ls | grep -v "conversion_" | grep -v "old") old

Since this script is meant to run automatically every 24 hours, I need the previous day's data to be stored somewhere else. That's why, at the end of the script, a folder named "old" is created and all the original (unconverted) files are moved into it.
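A minimal sketch of the archiving step, assuming GNU or BusyBox coreutils. The helper `archive_originals` and the dated `old/YYYY-MM-DD` layout are hypothetical choices, not part of the original script. A per-day subfolder keeps each day's originals instead of overwriting the previous day's same-named files inside a single `old` folder, and iterating over a glob rather than parsing `ls` output keeps file names with spaces intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive_originals DIR
# Moves every regular file in DIR whose name does not start with
# "conversion_" into DIR/old/YYYY-MM-DD.
archive_originals() {
    local dir=$1
    local dest
    dest="$dir/old/$(date +%F)"
    mkdir -p -- "$dest"
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip directories such as old/
        case "${f##*/}" in
            conversion_*) continue ;;    # keep converted files in place
        esac
        mv -- "$f" "$dest"/
    done
}
```

Usage from the directory that contains the data folder: `archive_originals data1`.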
