Channel: Active questions tagged utf-8 - Stack Overflow
Unicode character ſ is matched as itself and as 's.'

I just tried to clean up an old German text containing the character 'ſ' (U+017F, the long s). I wanted to replace it with 's', but when I ran :%s/ſ/s/g, Vim replaced not only that character but also every occurrence of 's' followed by an arbitrary character, as if I had additionally run :%s/s./s/g.

As an example, the text:

Die Gleichheit **) fordert das Nachdenken heraus durch Fragen, die ſich daran knüpfen und nicht ganz leicht zu beantworten ſind.

is turned by my command into:

Die Gleichheit **) fordert dasNachdenken herausdurch Fragen, die sich daran knüpfen und nicht ganz leicht zu beantworten sind.
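The difference between the intended and the observed result can be modeled outside Vim. The Python sketch below is only an illustration and says nothing about how Vim's regex engine works internally: it assumes the effective pattern behaves like `ſ|s.`, and the names `intended` and `observed_model` are hypothetical.

```python
import re

text = ("Die Gleichheit **) fordert das Nachdenken heraus durch Fragen, "
        "die ſich daran knüpfen und nicht ganz leicht zu beantworten ſind.")

# Intended: replace only the long s (U+017F) with a plain 's'.
intended = text.replace("\u017f", "s")

# Assumed model of the observed behavior:
# either 'ſ' alone, or 's' followed by any single character.
observed_model = re.sub("\u017f|s.", "s", text)

print(intended)
print(observed_model)
```

Under that assumption, `observed_model` swallows the space after "das" and "heraus", which is the same damage as in the output quoted above.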

I assume it might have something to do with the fact that 'ſ' is represented in UTF-8 as a sequence of two bytes (0xC5 0xBF). Isn't that a bug? If not, is there a way to just replace 'ſ' and not also 's'?
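For reference, the byte sequence can be checked with a few lines of Python, which also gives a way to do the replacement outside Vim. This is a minimal sketch, not a fix for Vim itself; the helper name `replace_long_s` is hypothetical. It also prints the Unicode case mappings of 'ſ', since the character is related to 's' and 'S' by case; whether that relationship plays any role in the Vim behavior is an untested assumption.

```python
long_s = "\u017f"  # LATIN SMALL LETTER LONG S

# UTF-8 encoding: two bytes, as stated in the question.
encoded = long_s.encode("utf-8")
print(encoded.hex())  # c5bf

# Unicode case mappings of the long s (upper-case form and case-folded form).
print(long_s.upper(), long_s.casefold())


def replace_long_s(text: str) -> str:
    """Replace every long s with a plain 's'; all other characters are untouched."""
    return text.replace(long_s, "s")


print(replace_long_s("die ſich daran knüpfen, das ſind"))
```

Because `str.replace` works on whole code points rather than bytes or patterns, a plain 's' followed by another character is never affected here.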

I am using fileencoding=utf-8 and:

> vim --version
VIM - Vi IMproved 9.1 (2024 Jan 02)
Included patches: 1-151
> echo $LANG
de_DE.UTF-8

Here is a screenshot with :set hlsearch:

[Screenshot: search highlighting reveals the matches]


UPDATE: I got hold of an installation of Vim 8.0 (patches up to 586) on Windows 10. It exhibits the same behavior, both with my original command and with the variant that uses \%u to specify the character by its code point.

