Channel: Active questions tagged utf-8 - Stack Overflow

How can I escape all unicode characters in a QString so that only ASCII is used and the result is as short as possible?


I'm searching for a safe method to escape all non-ASCII characters in a QString (and of course to un-escape them later) that will result in pure ASCII but yield the shortest possible string.

What I currently do is:

QByteArray excludes = QStringLiteral(" !\"#$&'()*+,-./:;<=>?@[]_{|}~").toUtf8();
auto escaped = QString::fromUtf8(someString.toUtf8().toPercentEncoding(excludes));
auto unEscaped = QString::fromUtf8(QByteArray::fromPercentEncoding(escaped.toUtf8()));

This is reliable and works perfectly in both directions. The problem is that the result is quite long: every byte of a character's UTF-8 encoding becomes three chars, so an escaped non-ASCII character takes at least 6 chars.

E.g. ê is encoded as %C3%AA, and 😊 becomes %F0%9F%98%8A.
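To make the cost concrete, here is a minimal sketch in plain C++ (no Qt) of what percent encoding does to the UTF-8 bytes. The helper name percentEncodeNonAscii is made up for illustration and is not Qt's toPercentEncoding; it only escapes bytes >= 0x80 and '%' itself, which is an assumption chosen to keep the example short.

```cpp
#include <string>

// Hypothetical helper, not part of Qt: percent-encodes every byte >= 0x80
// of a UTF-8 string, plus '%' itself so the result stays reversible.
std::string percentEncodeNonAscii(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte >= 0x80 || byte == '%') {
            out += '%';
            out += hex[byte >> 4];    // high nibble
            out += hex[byte & 0x0F];  // low nibble
        } else {
            out += static_cast<char>(byte);  // plain ASCII passes through
        }
    }
    return out;
}
```

Every escaped byte becomes three chars, so a 2-byte character like ê costs 6 chars and a 4-byte character like 😊 costs 12.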

I tried to find a way to make this shorter. E.g. the shortest Base64 representation of C3AA would be w6o, and for F09F988A it would be 8J+Yig, which is half the length of the percent-encoded version. But when un-escaping, I wouldn't know how long an escaped sequence is, so I would need to add a start and end character, like &w6o; or similar. For ê that would save only a single char.
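The delimited Base64 idea can be sketched in plain C++ (no Qt) to check the lengths. The names base64NoPadding and delimitedBase64 and the choice of '&' and ';' as delimiters are assumptions for illustration only. A real scheme would also have to escape a literal '&' in the input, which this sketch does not do.

```cpp
#include <string>

// Standard Base64 alphabet, without '=' padding.
std::string base64NoPadding(const std::string &bytes)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned buffer = 0;  // holds bits not yet emitted (low bits are valid)
    int bits = 0;         // number of valid bits in buffer
    for (unsigned char byte : bytes) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out += table[(buffer >> bits) & 0x3F];
        }
    }
    if (bits > 0)  // left-align the remaining bits in a final 6-bit group
        out += table[(buffer << (6 - bits)) & 0x3F];
    return out;
}

// Hypothetical escape format: '&' + Base64 of the character's UTF-8 bytes + ';'.
std::string delimitedBase64(const std::string &utf8Char)
{
    return "&" + base64NoPadding(utf8Char) + ";";
}
```

With these helpers, ê (C3 AA) becomes &w6o; (5 chars instead of 6) and 😊 (F0 9F 98 8A) becomes &8J+Yig; (8 chars instead of 12), so the saving only becomes noticeable for characters with longer UTF-8 encodings.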

So: Is there a better (meaning shorter) way than percent encoding to reliably escape and un-escape all non-ASCII characters in a QString?

