I have a bash script that downloads some files from a URL and stores them in a folder named "data1". Since the files are downloaded as .zip archives, the next step is to unzip them. After that, two variables, extension and encoding, are obtained for every file: extension is the file type (txt, csv, docx) and encoding is the encoding of each file (ISO, utf-8). Since the files that this script downloads are not in UTF-8, I have to convert them. This is the line that performs the conversion:
iconv -f $encoding -t UTF-8//TRANSLIT $name2.$extension -o conversion_$name2.$extension;
As you can see, I have to pass two parameters, the file to be encoded to utf-8 format and the name of the output file which will be: conversion_(name of the original file).(extension of the original file). However, I'm getting the following error:
iconv: illegal input sequence at position 1234704
This error affects the datos_abiertos_covid19.zip file, which after unzipping is named 200715COVID19MEXICO.csv (the name changes depending on the day the script is run). Does anyone know how I can avoid this error? I specifically need all of the downloaded files to be in UTF-8. I would really appreciate your help.
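To narrow the error down, it may help to look at the raw bytes around the reported position. The following is a minimal sketch, not part of the script: show_bytes is a hypothetical helper name, and it assumes that the position printed by iconv is a zero-based byte offset into the input file. One possibility worth checking (an assumption, not something confirmed here) is that the charset reported by file does not match the bytes found at that position.

```shell
# show_bytes FILE OFFSET [COUNT]
# Hex-dump COUNT bytes (default 16) of FILE, starting at byte OFFSET.
# Hypothetical helper; the name is not from the original script.
show_bytes() {
    # -An: no address column, -tx1: one hex byte per unit,
    # -j: skip OFFSET bytes, -N: dump at most COUNT bytes
    od -An -tx1 -j "$2" -N "${3:-16}" "$1"
}
```

For the file in question this would be called as `show_bytes 200715COVID19MEXICO.csv 1234704`.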
Here is the script I'm using:
#!/usr/bin/bash
# creating folders
mkdir data1
cd data1
# downloading data
wget http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip
wget http://187.191.75.115/gobmx/salud/datos_abiertos/diccionario_datos_covid19.zip
# unzipping data
for i in `ls | grep .zip`; do unzip $i; done
# this for will iterate over all the files contained in the data1 folder
for name in `ls -F -1 | grep -v / | grep -v zip`; do
    # getting extension of current file
    extension=`echo $name | sed 's/\./ /g' | awk '{print $2}'`;
    # getting encoding format of current file
    encoding=`file -i $name | sed 's/=/ /g' |awk '{print $4}'`;
    # echo $encoding
    query="s/\.$extension//g"
    # echo $query
    name2=`echo $name | sed -e $query`;
    # echo $name2
    # echo $name" "$extension" "$encoding" "$name2
    # encoding current file
    iconv -f $encoding -t UTF-8//TRANSLIT $name2.$extension -o conversion_$name2.$extension;
done
mkdir old
mv `ls | grep -v "conversion_" | grep -v "old"` old
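For readers who want to reproduce the per-file steps in isolation, here is a minimal sketch of the detection and conversion logic written as functions. The names get_extension, get_charset and convert_to_utf8 are hypothetical and do not appear in the script. The sketch takes the text after the last dot as the extension, whereas the script takes the second dot-separated field, and it feeds iconv through redirections instead of the -o option. It assumes a file command that supports --mime-encoding and an iconv that accepts the //TRANSLIT suffix.

```shell
# Hypothetical helpers; names are not from the original script.

# Print the text after the last dot of a file name (empty if there is no dot).
get_extension() {
    case $1 in
        *.*) printf '%s\n' "${1##*.}" ;;
        *)   printf '\n' ;;
    esac
}

# Print the charset that file(1) guesses, e.g. "us-ascii" or "iso-8859-1".
get_charset() {
    file -b --mime-encoding "$1"
}

# convert_to_utf8 FILE CHARSET
# Write a UTF-8 copy of FILE next to it, prefixed with "conversion_".
# The body runs in a subshell so its variables stay local.
convert_to_utf8() (
    src=$1
    charset=$2
    out="$(dirname "$src")/conversion_$(basename "$src")"
    # The function's exit status is that of iconv.
    iconv -f "$charset" -t UTF-8//TRANSLIT < "$src" > "$out"
)
```

A call such as `convert_to_utf8 "$name" "$(get_charset "$name")"` would then stand in for the iconv line inside the loop, and its exit status can be checked to see which file failed.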
Since this script is intended to run automatically every 24 hours, I need the old data (from the day before) to be stored somewhere else. That's why, at the end of the script, a new folder is created and all the "old files" are moved to the folder named "old".
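The archiving step described above can be sketched without parsing the output of ls. This is only an illustration under stated assumptions: archive_originals is a hypothetical name, and instead of filtering names with grep -v "old" it skips anything that is not a regular file, so a data file whose name happens to contain "old" would still be moved.

```shell
# archive_originals [DIR]
# Move every regular file in DIR (default: current directory) whose name
# does not start with "conversion_" into DIR/old.
# Hypothetical helper; the body runs in a subshell so variables stay local.
archive_originals() (
    dir=${1:-.}
    mkdir -p "$dir/old"
    for path in "$dir"/*; do
        [ -f "$path" ] || continue        # skip directories such as old/
        case ${path##*/} in
            conversion_*) ;;              # converted files stay in place
            *) mv -- "$path" "$dir/old/" ;;
        esac
    done
)
```

Because mkdir -p is used, running it again the next day does not fail when the old folder already exists.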