Channel: Active questions tagged utf-8 - Stack Overflow

Can a multibyte character be used as a regex pattern delimiter in a PCRE environment like PHP?


For a long time, whenever I've needed a regular expression, I've standardized on the copyright symbol © as the delimiter. It isn't on the keyboard, so I could be sure I would never use it inside a pattern, unlike ! @ # \ or / (all of which are sometimes in use within a regex).

Code:

$result=preg_match('©<.*?>©', '<something string>');
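A minimal sketch of the same "pick a delimiter that isn't in the pattern" idea done automatically. `wrap_regex()` is a hypothetical helper, not part of PHP; it tries a few single-byte ASCII delimiters (which are the same byte in cp1252 and UTF-8) and uses the first one that does not occur in the pattern body:

```php
<?php
// Hypothetical helper (not a PHP built-in): wrap a regex body in the first
// candidate ASCII delimiter that does not appear anywhere in the body, so the
// body never needs its delimiter escaped.
// Assumes the body does not end in a lone backslash (that would escape the
// closing delimiter).
function wrap_regex(string $body, string $modifiers = ''): string
{
    foreach (str_split('~#%@!;|`') as $delimiter) {
        if (strpos($body, $delimiter) === false) {
            return $delimiter . $body . $delimiter . $modifiers;
        }
    }
    throw new InvalidArgumentException('No unused delimiter available');
}

$pattern = wrap_regex('<.*?>');
echo $pattern, "\n";                                  // ~<.*?>~
var_dump(preg_match($pattern, '<something string>')); // int(1)
```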

However, today I needed to use a regular expression with accented characters which included this:

Code:

[a-zA-ZàáâäãåąćęèéêëìíîïłńòóôöõøùúûüÿýżźñçčšžÀÁÂÄÃÅĄĆĘÈÉÊËÌÍÎÏŁŃÒÓÔÖÕØÙÚÛÜŸÝŻŹÑßÇŒÆČŠŽ∂ð \,\.\'-]+

After including this new regex in the PHP file in my IDE (Eclipse PDT), I was prompted to save the PHP file as UTF-8 instead of the default cp1252.
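For reference, a quick check of what that re-save does to the © byte(s). This is a minimal sketch; the `"\u{...}"` escape assumes PHP 7.0 or later:

```php
<?php
// The copyright sign is U+00A9. In a UTF-8 source file it is stored as two
// bytes; in a Windows-1252 (cp1252) file it is the single byte 0xA9.
$utf8   = "\u{00A9}";   // what '©' becomes after saving the file as UTF-8
$cp1252 = "\xA9";       // what '©' was in the original cp1252 file

echo strlen($utf8), ' byte(s): ', bin2hex($utf8), "\n";     // 2 byte(s): c2a9
echo strlen($cp1252), ' byte(s): ', bin2hex($cp1252), "\n"; // 1 byte(s): a9
```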

After saving and running the PHP file, every time I used a regex in a preg_match() or preg_replace() function call, it generated a generic PHP warning (Warning: preg_match in file.php on line x) and the regex was not processed.
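A minimal reproduction, assuming PHP 8 and the UTF-8 bytes of the original pattern. PHP's preg_* functions read the delimiter as a single byte, so the two-byte © (0xC2 0xA9) gets split: 0xC2 becomes the delimiter and the trailing 0xA9 is left over after the closing delimiter, where PHP tries to read it as a modifier:

```php
<?php
// '©<.*?>©' exactly as it is stored in a UTF-8 file: "\xC2\xA9<.*?>\xC2\xA9".
$pattern = "\u{00A9}<.*?>\u{00A9}";

// Opening delimiter = 0xC2, body = "\xA9<.*?>", closing delimiter = 0xC2,
// then the final 0xA9 is parsed as a pattern modifier. PHP rejects it with a
// warning and preg_match() returns false instead of 0 or 1.
$result = @preg_match($pattern, '<something string>');
var_dump($result); // bool(false)

// The suppressed warning is still recorded; it is expected to mention an
// unknown modifier.
$warning = error_get_last();
echo $warning['message'] ?? 'no warning recorded', "\n";
```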

So, two questions:

1) Is there another symbol, not one of the usual keyboard symbols (`~!@#$%^&*()+=[]{};\':",./<>?|\), that would make a good delimiter, so I can standardize on it and not have to check each and every regex to see whether that symbol is actually used somewhere in the expression?
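One option worth comparing, shown as a sketch: PHP also accepts bracket-style delimiter pairs ((), [], {} and <>). Brackets of the same kind inside the pattern do not need escaping as long as they are balanced, and other punctuation such as / @ # ! can then appear unescaped. The sample subjects below are made up for illustration:

```php
<?php
// Bracket-style delimiters: the pattern starts with '{' and ends at the '}'
// that balances it, so inner {n} quantifiers are fine, and '/', '@', '#', '!'
// never need escaping because they are not the delimiter.
$checks = [
    ['{<.*?>}',               '<something string>'],   // question's pattern
    ['{^\d{3}-\d{4}$}',       '555-1234'],             // inner {3}/{4} are balanced
    ['{^https?://[^/@#!]+/}', 'https://example.com/'], // '/', '@', '#', '!' unescaped
];

foreach ($checks as [$pattern, $subject]) {
    printf("%-24s => %d\n", $pattern, preg_match($pattern, $subject));
}
```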

2) Or, is there a way I can use the copyright symbol as the standard delimiter when the file is saved as UTF-8?
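Whichever delimiter ends up being used, the accented character class above is itself UTF-8 once the file is saved that way. A minimal sketch, assuming the class is meant to validate whole names: `~` (a plain ASCII byte) is the delimiter, and the `u` modifier tells PCRE to treat the pattern and subject as UTF-8, so each accented letter counts as one character rather than as separate bytes. The sample names are made up:

```php
<?php
// Nowdoc (<<<'RE') keeps the backslashes exactly as written in the class.
// '^...$' anchors the class to the whole string; 'u' enables UTF-8 mode.
$namePattern = <<<'RE'
~^[a-zA-ZàáâäãåąćęèéêëìíîïłńòóôöõøùúûüÿýżźñçčšžÀÁÂÄÃÅĄĆĘÈÉÊËÌÍÎÏŁŃÒÓÔÖÕØÙÚÛÜŸÝŻŹÑßÇŒÆČŠŽ∂ð \,\.\'-]+$~u
RE;

var_dump(preg_match($namePattern, "Zoë O'Brien-Łukasz")); // int(1)
var_dump(preg_match($namePattern, 'Robert; DROP'));        // int(0), ';' is not in the class
```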

