I'm looking for a reliable way to escape all non-ASCII characters in a `QString` (and, of course, to unescape them later) so that the result is pure ASCII but as short as possible.
What I currently do is:
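```cpp
// Printable ASCII punctuation that may stay as-is (letters, digits and -._~
// are never encoded anyway). '%' is deliberately absent so it always gets escaped.
QByteArray excludes = QStringLiteral(" !\"#$&'()*+,-./:;<=>?@[]_{|}~").toUtf8();
// Escape: every remaining byte of the UTF-8 form becomes %XX.
auto escaped = QString::fromUtf8(someString.toUtf8().toPercentEncoding(excludes));
// Unescape: decode the %XX triplets back to UTF-8 bytes.
auto unEscaped = QString::fromUtf8(QByteArray::fromPercentEncoding(escaped.toUtf8()));
```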
This is reliable and works perfectly in both directions. The problem is that the result is quite long: an escaped non-ASCII character takes at least 6 characters, because every UTF-8 byte becomes a `%XX` triplet. For example, `ê` is encoded as `%C3%AA`, and `😊` becomes `%F0%9F%98%8A`.
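In numbers: percent encoding spends three ASCII characters per UTF-8 byte, and a non-ASCII code point occupies $n$ = 2 to 4 UTF-8 bytes, so the escaped length of one character is

$$
\ell_{\%} = 3n, \qquad n \in \{2, 3, 4\} \;\Rightarrow\; \ell_{\%} \in \{6, 9, 12\}
$$

which matches `ê` ($n = 2$, 6 characters) and `😊` ($n = 4$, 12 characters).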
I tried to find a way to make this shorter. For example, the shortest Base64 representation (without `=` padding) of the bytes `C3 AA` would be `w6o`, and for `F0 9F 98 8A` it would be `8J+Yig`: half the length of the percent-encoded version. But since I don't know in advance how long an escaped sequence is, I would need to add start and end markers, like `&w6o;` or similar, and then, for a two-byte character like `ê`, only a single character would be saved.
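A minimal sketch of the delimited-Base64 idea, written in plain standard C++ on UTF-8 bytes so it does not depend on Qt. Several details are assumptions, not part of the question: the names `escapeUtf8`/`unescapeUtf8` and the `&...;` format are hypothetical; each maximal run of consecutive bytes that need escaping shares one `&...;` pair, so the two delimiter characters are paid once per run rather than once per character; and a literal `&` is escaped the same way so the delimiter stays unambiguous. It uses the standard Base64 alphabet with padding dropped; neither `&` nor `;` occurs in that alphabet.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Sketch only: the "&<base64>;" format and these function names are
// hypothetical, not a Qt or standard API. Strings hold UTF-8 bytes.
namespace {

const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

unsigned byteAt(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Standard Base64 with the trailing '=' padding dropped.
std::string base64NoPad(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 3) {
        std::size_t n = std::min<std::size_t>(3, in.size() - i);  // bytes in this group
        unsigned v = byteAt(in, i) << 16;
        if (n > 1) v |= byteAt(in, i + 1) << 8;
        if (n > 2) v |= byteAt(in, i + 2);
        // Without padding, a group of n bytes yields n + 1 characters.
        for (std::size_t k = 0; k <= n; ++k)
            out += kAlphabet[(v >> (18 - 6 * k)) & 63u];
    }
    return out;
}

int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Inverse of base64NoPad; characters outside the alphabet are skipped.
std::string base64NoPadDecode(const std::string& in) {
    std::string out;
    unsigned buffer = 0;
    int bits = 0;
    for (char c : in) {
        int v = b64Value(c);
        if (v < 0) continue;
        buffer = (buffer << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFFu);
            buffer &= (1u << bits) - 1u;  // keep only the unused low bits
        }
    }
    return out;  // leftover (zero) bits from the last group are dropped
}

// Bytes that must not appear literally: non-ASCII, plus '&' (the delimiter).
bool needsEscape(unsigned char b) { return b >= 0x80 || b == '&'; }

}  // namespace

// Each maximal run of bytes needing escape becomes "&" + base64 + ";".
std::string escapeUtf8(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        if (!needsEscape(static_cast<unsigned char>(utf8[i]))) {
            out += utf8[i++];
            continue;
        }
        std::size_t start = i;
        while (i < utf8.size() && needsEscape(static_cast<unsigned char>(utf8[i]))) ++i;
        out += '&' + base64NoPad(utf8.substr(start, i - start)) + ';';
    }
    return out;
}

std::string unescapeUtf8(const std::string& ascii) {
    std::string out;
    std::size_t i = 0;
    while (i < ascii.size()) {
        if (ascii[i] != '&') {
            out += ascii[i++];
            continue;
        }
        std::size_t end = ascii.find(';', i + 1);
        if (end == std::string::npos) {  // malformed tail: keep it verbatim
            out += ascii.substr(i);
            break;
        }
        out += base64NoPadDecode(ascii.substr(i + 1, end - i - 1));
        i = end + 1;
    }
    return out;
}
```

From Qt, `QString::fromStdString(unescapeUtf8(escapeUtf8(someString.toStdString())))` should round-trip, since both of those `QString` conversions use UTF-8 in Qt 5. With run grouping, `êê` becomes `&w6rDqg;` (8 characters) instead of `%C3%AA%C3%AA` (12), so the saving grows with longer non-ASCII runs.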
So: is there a better (meaning shorter) way than percent encoding to reliably escape and unescape all non-ASCII characters in a `QString`?