Channel: Active questions tagged utf-8 - Stack Overflow

How to get character position in a text file encoded in UTF-8 in C?

The C Standard specifies that, for a stream opened in binary mode, ftell() returns the number of characters (bytes) from the beginning of the file:

... obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

If the text file contains a multibyte character, like ñ (two bytes in UTF-8), then the position reported for any char after ñ is greater than its column in the text file. To be specific, by position I mean the column a character would have if the text file were read as a linear sequence of symbols.
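To make the gap between bytes and columns concrete, here is a minimal sketch, assuming the text is valid UTF-8. In UTF-8 every byte of the form 10xxxxxx is a continuation byte, so counting only the other bytes counts characters (code points). The function name utf8_length is hypothetical, not part of the C library.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated string,
 * assuming it holds valid UTF-8. Continuation bytes (bit pattern 10xxxxxx)
 * never start a character, so they are skipped. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the two bytes 0xC3 0xB1 that encode ñ, strlen() reports 2 while utf8_length() reports 1. Note that this counts code points, which can still differ from visible columns when the text uses combining marks.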

For example, the string " ñ ñññ a ñ a" has 12 characters, but printing ftell() inside this loop:

#include <stdio.h>

void printPosition(FILE *file){
    int c;
    long i;
    while((c = fgetc(file)) != EOF){
        i = ftell(file);
        printf("%c %ld\n", c, i);   /* %ld because ftell() returns long */
    }
}

gives the output:

  1
├ 2
▒ 3
  4
├ 5
▒ 6
├ 7
▒ 8
├ 9
▒ 10
  11
a 12
  13
├ 14
▒ 15
  16
a 17

I tried opening in text/binary read mode and got the same result for both.
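One possible way to get the column rather than the byte offset is to keep a separate counter while reading, instead of relying on ftell(). The sketch below assumes the file is valid UTF-8 and counts a new character only when the byte read is not a continuation byte (10xxxxxx). The name print_char_positions and the output format are hypothetical, chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read the stream byte by byte and print, for each
 * byte, its byte offset and the position of the character it belongs to.
 * Assumes valid UTF-8 input. Returns the total number of characters read. */
long print_char_positions(FILE *file)
{
    int c;
    long bytes = 0;   /* bytes read so far, what ftell() reports in binary mode */
    long chars = 0;   /* characters (code points) seen so far */

    while ((c = fgetc(file)) != EOF) {
        ++bytes;
        if ((c & 0xC0) != 0x80)   /* not a continuation byte: a new character starts */
            ++chars;
        printf("byte %ld -> character %ld\n", bytes, chars);
    }
    return chars;
}
```

With this approach both bytes of each ñ map to the same character position, so the 17 bytes from the output above would yield 12 characters.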

