Say we have a UTF-8 string $s and we need to shorten it so it can be stored in N bytes. Blindly truncating it to N bytes could cut a multi-byte character in half and leave invalid UTF-8. But decoding it to find the character boundaries is a drag. Is there a tidy way?
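
To make the problem concrete, here is a minimal sketch of the hand-rolled approach: cut at N bytes, then back up while the first dropped byte is a UTF-8 continuation byte (10xxxxxx). The function name utf8_truncate_bytes is made up for illustration, and the sketch assumes $s is already valid UTF-8.

```php
<?php
// Hypothetical helper (not a PHP built-in): shorten valid UTF-8 to at most
// $n bytes without splitting a multi-byte character.
function utf8_truncate_bytes(string $s, int $n): string
{
    if ($n <= 0) {
        return '';
    }
    if (strlen($s) <= $n) {
        return $s; // already fits
    }
    // $s[$n] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), the cut falls inside a character,
    // so move the cut back until it sits on a lead byte.
    while ($n > 0 && (ord($s[$n]) & 0xC0) === 0x80) {
        $n--;
    }
    return substr($s, 0, $n);
}
```

This only respects code point boundaries; it knows nothing about combining marks or other multi-code-point sequences.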
[Edit 20100414] In addition to S.Mark’s answer, mb_strcut(), I recently found another function that does the job: grapheme_extract($s, $n, GRAPHEME_EXTR_MAXBYTES) from the intl extension. Since intl is an ICU wrapper, I have a lot of confidence in it.
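
A rough side-by-side of the two calls, for anyone who wants to try them. The wrapper names cut_mb and cut_grapheme are just illustrative, not part of either extension; the first needs mbstring, the second needs intl.

```php
<?php
// Hypothetical wrappers around the two built-in functions mentioned above.

// mbstring: cut by bytes, backing off so no character is split.
function cut_mb(string $s, int $n): string
{
    return mb_strcut($s, 0, $n, 'UTF-8');
}

// intl: take as many whole grapheme clusters as fit in $n bytes.
function cut_grapheme(string $s, int $n): string
{
    $out = grapheme_extract($s, $n, GRAPHEME_EXTR_MAXBYTES);
    return $out === false ? '' : $out; // false signals an error
}
```

Calling cut_mb("a\u{e9}", 2) gives "a", because the two-byte é does not fit. The two functions can differ on combining sequences: mb_strcut() stops on a code point boundary, while grapheme_extract() stops on a grapheme cluster boundary, so it should not separate a base letter from a following combining accent.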