Channel: Active questions tagged utf-8 - Stack Overflow

PHP multibyte regex not working with UTF-8 [duplicate]

I have a UTF-8 string in which I want to find all occurrences of img_(\d+). I first tried plain preg_match_all:

$pattern = '/img_(\d+)/u';
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

but it gives me the wrong offsets for the matches.

I have also tried:

mb_internal_encoding('UTF-8');
$pattern = 'img_(\d+)';
mb_ereg_search_init($content, $pattern);
$matches = [];
while ($result = mb_ereg_search_regs()) {
    $matches[] = [
        'match' => $result[0],
        'offset' => mb_ereg_search_getpos() - mb_strlen($result[0]),
    ];
}

but it gives me the same result as preg_match_all.

However, when I search manually with this:

$pos = mb_strpos($content, "img_1", 0);

I get the correct offset.

Example code:

$str = "příliš žluťoučký img_1 kůn úpěl ďábelské ódy";
$pattern = '/img_(\d+)/u';
preg_match_all($pattern, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches); // gives 24 (wrong)
echo mb_strpos($str, "img_1", 0); // gives 17 (correct)

How to fix this?
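
One possible direction, offered as a sketch rather than a definitive fix: PREG_OFFSET_CAPTURE reports offsets in bytes even when the /u modifier is used, whereas mb_strpos counts characters, which would explain the 24 versus 17 above. A byte offset can be converted to a character offset by counting the characters in the byte prefix that precedes the match. The helper name preg_match_all_char_offsets is hypothetical and not part of PHP; the sample string assumes the spaces shown in the example code.

```php
<?php
// Hypothetical helper (not a PHP built-in): runs preg_match_all and
// converts the byte offsets it reports into character offsets.
function preg_match_all_char_offsets(string $pattern, string $subject): array
{
    $found = [];
    // preg_match_all returns false on error and 0 when nothing matches.
    if (!preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        return $found;
    }
    foreach ($matches[0] as [$match, $byteOffset]) {
        // substr() cuts by bytes; mb_strlen() then counts the characters
        // in that prefix, which is the character offset of the match.
        $found[] = [
            'match'  => $match,
            'offset' => mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8'),
        ];
    }
    return $found;
}

$str = "příliš žluťoučký img_1 kůn úpěl ďábelské ódy";
$result = preg_match_all_char_offsets('/img_(\d+)/u', $str);
print_r($result); // offset is 17 here, matching mb_strpos
```

The byte offsets are still the right ones to pass to byte-based functions such as substr; the conversion is only needed when the result is fed to mb_* functions that count characters.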

