Compressing UTF-8 (or other 8-bit encoding) to 7 or fewer bits
I wish to take a file encoded in UTF-8 that doesn't use more than 128 different characters, then move it to a 7-bit encoding to save 1/8 of the space. For example, if I have a 16 MB text file that only...
How to easily detect utf8 encoding in the string?
I have a string that is filled with data from another program, and this data may or may not be UTF-8 encoded. If it is not, I can encode it to UTF-8, but what is the best way to detect UTF-8 in C++? I saw this...
Determine NLS_LANG on linux [closed]
How do I determine the NLS_LANG setting for my Oracle Client on linux? I haven't set the NLS_LANG explicitly. Is it necessary to set and export the variable NLS_LANG=AMERICAN_AMERICA.AL32UTF8 for...
Convert default emojis to custom png
I have a chat application that currently uses the default browser emojis. Now I want to replace them with custom PNG emojis. How can I convert them? Should I parse each and every chat message and swap...
Why do I receive an InvalidOperationError "file encoding is not UTF-8" when...
The code in the accepted answer of this question is as follows: import polars as pl df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]}) df2 = pl.DataFrame({"a": [5, 6], "b": [7, 8]}) with open("out.csv",...
IntelliJ IDEA incorrect encoding in console output
It seems really crazy, but I can't do anything about the broken encoding in the console of my IntelliJ IDEA. Things I tried to overcome this: Set -Dfile.encoding=UTF-8 and -Dfile.encoding=UTF-8 in both...
How do I export an Excel file with Chinese characters to a CSV?
I have an Excel document with a data table containing Chinese characters. I am trying to export this Excel spreadsheet to a CSV file for importing into a MySQL database. However, when I save the Excel...
Lua unicode, using string.sub() with two-byted chars
As an example: I want to remove the first 2 letters from the strings "ПРИВЕТ" and "HELLO." One of these contains only two-byte Unicode symbols. Trying to use string.sub("ПРИВЕТ") and...
How to uppercase/lowercase UTF-8 characters in C++?
Let's imagine I have a UTF-8 encoded std::string containing the following: óó and I'd like to convert it to the following: ÓÓ. Ideally I want the uppercase/lowercase approach I'm using to be generic across...
GitLab CI Allure Report Shows Garbled Characters in Console Output
I'm experiencing intermittent issues with Allure reports in GitLab CI where the console log occasionally displays garbled/corrupted characters instead of readable output. This doesn't happen...
BCP export in UTF-8
I'm trying to export data from SQL Server using the BCP utility on cmd into a .txt file. My requirement is that the exported .txt file needs to be in UTF-8 encoding, but whatever I do it always comes out...
Is content-type "text/xml; charset=utf-8" wrong?
I am making an HttpRequest and specifying Content-Type as follows, but my code keeps getting rejected in review by senior developers. val request = RequestBuilder.post .setUri(metaData("serviceUri"))...
Pickle encoding utf-8 issue
I'm trying to pickle a pandas dataframe to my local directory so I can work on it in another Jupyter notebook. The write appears to succeed at first, but when trying to read it in a new Jupyter...
C++ Unicode Problems/Questions
I wrote two versions of a little program in C++ with MSVC on Windows 11: First one: #include <iostream> #include <Windows.h> int main(){ SetConsoleOutputCP(CP_UTF8); std::cout << u8"äüö...
How to fix ANSII character in SQL Server table to UTF-8
I have a data import process that imports data from a CSV file into a table in SQL Server. I have noticed that some columns contain accented characters. For example, I have noticed the following text in...
Anything wrong with using windows-1252 instead of UTF-8
I have a test site that has been using windows-1252 all along. They do need/use some symbols like the square root symbol, and they have no need to display any language other than English. I was...
Force all character columns in a list of data frames to UTF-8 before...
I have a list of two tables in R. Each data frame contains several character and numeric columns. One of the columns is a company name column (for example, Company_Name). The target database only...
How to decode a single UTF-8 character and step onto the next using only the...
Does Rust provide a way to decode a single character (unicode-scalar-value to be exact) from a &[u8], which may be multiple bytes, returning a single USV? Something like GLib's g_utf8_get_char&...
In DuckDB, can there be proper UTF-8 output in duckbox mode to Windows console?
I cannot get non-ASCII characters to be properly displayed in DuckDB console, even if the console application supports UTF-8. I have a sample CSV file encoded in UTF-8 containing a few test...
Are there any dangers to working internally in UTF-8 and then converting to...
Visual Studio tries to insist on using tchars, which, when compiled with the UNICODE option, basically ends up using the wide versions of the Windows and other APIs. Is there then any danger to using...
Writing accentuated characters doesn't work
I'm having problems figuring out how to make mpdf work with accents. The following code always breaks, even with a single accented character. $mpdf = new Mpdf(); $mpdf->WriteHTML(mb_convert_encoding('î',...
What's the point of UTF-16?
I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still...
Is it possible to decode bytes to UTF-8, converting errors to escape...
In Rust it's possible to get UTF-8 from bytes by doing this: if let Ok(s) = str::from_utf8(some_u8_slice) { println!("example {}", s); } This either works or it doesn't, but Python has the ability to...
Why is the /utf-8 flag in MSVC not allowing my program to display Unicode...
I recently discovered that on Windows 10/11 there is a beta testing option under region settings (system locale) to "Use Unicode UTF-8 for worldwide language support". When this is enabled, all the...
Conversion between NVARCHAR to VARCHAR
I've got an Oracle DB with ALL the character columns defined as NVARCHAR or NCHAR or NCLOB, using charset UTF-16. Now I want to migrate to a new DB that has charset UTF-8. Since it can store Unicode...
Set charset different from UTF-8 in JSON Response
I have this GET request in my controller in an ASP.NET Core project: [HttpGet] [Route("api/controller/getlastresult/{id}")] public IActionResult GetLatestResultForController(string id){ Response.ContentType =...
Getting the actual length of a UTF-8 encoded std::string?
My std::string is UTF-8 encoded so obviously, str.length() returns the wrong result. I found this information but I'm not sure how I can use it to do this: The following byte sequences are used to...
Ultimate way to use UTF-8 in mysql
I have read many articles, discussions and tutorials about using the utf-8 charset in mysql. Several approaches are introduced, apparently for different cases (e.g. transferring to utf-8). What are the...
Character encoding of GET request parameter
Hello fellow Stackoverflowers. I have an issue that I need some help with: We're making an HTTP GET web service call from a smartphone app to a Java/Spring MVC application. We're on a Tomcat application...
C programming: How can I program for Unicode?
What prerequisites are needed to do strict Unicode programming? Does this imply that my code should not use char types anywhere and that functions need to be used that can deal with wint_t and...