My setup: gcc-4.9.2, UTF-8 environment.
The following C program works with ASCII text, but not with UTF-8 text.
Create input file:
echo -n 'приветмир' > /tmp/вход
This is test.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 10

int main(void)
{
    char buf[SIZE+1];
    char *pat = "приветмир";
    char str[SIZE+2];
    FILE *f1;
    FILE *f2;

    f1 = fopen("/tmp/вход","r");
    f2 = fopen("/tmp/выход","w");

    if (fread(buf, 1, SIZE, f1) > 0) {
        buf[SIZE] = 0;
        if (strncmp(buf, pat, SIZE) == 0) {
            sprintf(str, "% 11s\n", buf);
            fwrite(str, 1, SIZE+2, f2);
        }
    }

    fclose(f1);
    fclose(f2);
    exit(0);
}
Check the result:
./test; grep -q 'приветмир' /tmp/выход && echo OK
What should be done to make UTF-8 code work as if it were ASCII code, that is, without having to care how many bytes a character takes? In other words: what has to change in the example so that every UTF-8 character is treated as a single unit (this includes argv, stdin, stdout, stderr, file input and output, and the program source itself)?