Easy way to read UTF-8 characters from a binary file?
Here is my problem: I have to read "binary" files, that is, files which have varying "record" sizes, and which may contain binary data, as well as UTF-8-encoded text fields. Reading a given number of...
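One way to approach this, sketched in Python under stated assumptions: read the lead byte of a character from the binary stream, derive the sequence length from its high bits, then read and decode the remaining bytes. The helper name `read_utf8_char` and the two-byte header in the demo record are hypothetical and are not taken from the question.

```python
import io


def read_utf8_char(stream):
    """Read one UTF-8 encoded character from a binary stream.

    Returns the decoded character, or "" at end of stream.
    """
    first = stream.read(1)
    if not first:
        return ""
    lead = first[0]
    # The high bits of the lead byte give the total length of the sequence.
    if lead < 0x80:
        length = 1
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    elif lead >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")
    rest = stream.read(length - 1)
    # decode() validates the continuation bytes and raises on malformed input.
    return (first + rest).decode("utf-8")


# Hypothetical record: a 2-byte binary header followed by UTF-8 text.
buf = io.BytesIO(b"\x00\x02" + "é€".encode("utf-8"))
header = buf.read(2)
chars = [read_utf8_char(buf), read_utf8_char(buf)]
```

Because the stream stays in binary mode, fixed-size binary fields and text fields can be read from the same file object without switching readers.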
Encoding all the special characters in Javascript
I have to encode the string that I receive here and pass it as a URL parameter so I don't believe I can pass either / or a parenthesis ( so considering I have the following string KEY WEST /...
PHP DOMDocument loadHTML not encoding UTF-8 correctly
I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me). $profile = "<div><p>various japanese...
How to read text files with ANSI encoding and non-English letters?
I have a file that contains non-English chars and was saved in ANSI encoding using a non-English codepage. How can I read this file in C# and see the file content correctly? Not working: StreamReader...
mysqldump exporting data in a bad character set
Yesterday, for the first time, I exported my MySQL database and found some very strange characters in the dump, such as: INSERT INTO `piwik_archive_blob_2013_01` VALUES...
Flag Emojis not rendering
I have a dropdown menu on my header that I use to display phone numbers for different countries and I need to put a flag on its side, but no flags are showing, only the letter representation of the...
How to convert a file to utf-8 in Python?
I need to convert some files to UTF-8 in Python, and I am having trouble converting them. I'd like to do the equivalent of: iconv -t utf-8 $file > converted/$file (this is shell code). Thanks!
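A minimal sketch of one way to do this in Python: open the source in text mode with its known encoding and write it back out as UTF-8. Unlike the iconv command above, which takes the source encoding from the locale, this sketch requires it to be passed explicitly. The function name `convert_to_utf8`, the Latin-1 sample, and the file names are assumptions for illustration.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file to UTF-8, reading it in fixed-size chunks."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for chunk in iter(lambda: src.read(64 * 1024), ""):
            dst.write(chunk)


# Demo with a hypothetical Latin-1 input file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "input.txt")
    dst_file = os.path.join(tmp, "output.txt")
    with open(src_file, "wb") as f:
        f.write("café\r\n".encode("latin-1"))
    convert_to_utf8(src_file, dst_file, "latin-1")
    with open(dst_file, "rb") as f:
        converted = f.read()
```

If the declared source encoding is wrong, the read either raises `UnicodeDecodeError` or silently produces the wrong characters, so the source encoding has to be known rather than guessed.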
c# - How do I convert this UTF16 string to UTF8?
I have a situation where I am receiving a string that represents the data read from a UTF16 file. (I don't have control over the incoming data). The string looks...
Visual Studio Resource Editor corrupts rc files with UTF-8 encoding
The Visual Studio 2019 and 2022 Resource Editor can correctly read and display an .rc file in UTF-8 encoding if the file is saved without a UTF-8 BOM. The main requirements for it, .rc file must contain...
iconv: illegal input sequence at position
I have a bash script which downloads some files from a URL and stores them in a folder named "data1". Since these files are downloaded as .zip, the next step is to unzip them. After that, the...
translate python code to javascript
I need help translating this Python code to another language. I would like to convert this Python code to JavaScript: Original code (python): data = list("\x01\x03\x19 @...
Issues printing emojis and symbols on Windows Terminal using Java
I'm using JDK 21, which sets file.encoding to UTF-8 by default, but even when I add it explicitly as a command-line argument, nothing changes. I checked the property in the code and it is indeed UTF-8. I've...
Python - Decode UTF-16 file with BOM
I have a UTF-16LE file with BOM. I'd like to convert this file to UTF-8 without BOM so I can parse it using Python. The usual code that I use didn't do the trick; it returned unknown characters instead...
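A minimal sketch of the conversion, assuming the input really does begin with a UTF-16 BOM: Python's `utf-16` codec uses the BOM to pick the byte order and drops it from the decoded text, and the plain `utf-8` codec writes no BOM on output. The helper name `utf16_bom_to_utf8` and the sample data are illustrative only.

```python
import codecs


def utf16_bom_to_utf8(data: bytes) -> bytes:
    """Convert BOM-prefixed UTF-16 bytes to UTF-8 bytes without a BOM."""
    # "utf-16" (not "utf-16-le") consumes the BOM instead of keeping it
    # as a U+FEFF character at the start of the text.
    text = data.decode("utf-16")
    # Plain "utf-8" (not "utf-8-sig") does not emit a BOM.
    return text.encode("utf-8")


# Sample input: little-endian BOM followed by UTF-16LE text.
sample = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
result = utf16_bom_to_utf8(sample)
```

For files on disk, the same codec names can be passed as the `encoding` argument of `open()` for the input and output files respectively.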
Truncate a UTF-8 string to fit a given byte count in PHP
Say we have a UTF-8 string $s and we need to shorten it so it can be stored in N bytes. Blindly truncating it to N bytes could mess it up. But decoding it to find the character boundaries is a drag. Is...
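The idea can be sketched without a full decode, shown here in Python rather than PHP: every UTF-8 continuation byte has the bit pattern 10xxxxxx, so if the first byte that would be dropped is a continuation byte, the cut falls inside a character and can be moved back to that character's lead byte. The function name `truncate_utf8` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Shorten valid UTF-8 bytes to at most max_bytes without splitting a character."""
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. While it is a
    # continuation byte (10xxxxxx), the cut is mid-character, so step back
    # until the cut lands on a lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# "a" is 1 byte, "é" is 2 bytes, "€" is 3 bytes in UTF-8.
encoded = "aé€".encode("utf-8")
shortened = truncate_utf8(encoded, 5)
```

The loop inspects at most three bytes, since a UTF-8 character has at most three continuation bytes. Note that this keeps code points intact but can still separate a base character from a following combining mark.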
c# - How do I convert this string from a UTF16 file to match a string to UTF8...
I have a situation where I am receiving a string that represents the data read from a UTF16 file. (I don't have control over the incoming data). The string looks...
Why does R treat non-ASCII characters differently depending on the SSH...
When run over SSH, R appears to treat non-ASCII characters differently depending on the OS of the SSH client. For example, if I use a computer running macOS (14.6.1) to start an R session on an Ubuntu...
Why is the vocab size of Byte level BPE smaller than Unicode's vocab size?
I recently read GPT2 and the paper says: This would result in a base vocabulary of over 130,000 before any multi-symbol tokens are added. This is prohibitively large compared to the 32,000 to 64,000...
incompatible character encodings: UTF-8 and ASCII-8BIT in render action
ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT): app/controllers/posts_controller.rb:27:in `new' # GET /posts/new def new if params[:post] @post =...
PostgreSQL encoding problem with greek characters
I have created a database with UTF-8 encoding, and some Greek characters are not recognized in the values that I insert. I use this to create the database: CREATE DATABASE greek WITH ENCODING 'UTF-8'...
How to enforce ASCII-only identifiers in Python while allowing UTF-8 strings?
I want to configure Python so that it raises an error when encountering non-ASCII characters in identifiers (e.g., variable names, function names) but still accepts UTF-8 encoded strings (e.g.,...
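Python has no built-in switch for this as far as I know, so one possible approach is a lint-style check rather than interpreter configuration: parse the source with the standard `ast` module and report identifiers that are not pure ASCII, leaving string literals alone. The function name `non_ascii_identifiers` is hypothetical, and the sketch covers only the most common node types (names, function and class definitions, arguments, attributes), not import aliases or keyword arguments.

```python
import ast


def non_ascii_identifiers(source: str):
    """Return the non-ASCII identifiers found in Python source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        names = []
        if isinstance(node, ast.Name):
            names.append(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
        elif isinstance(node, ast.Attribute):
            names.append(node.attr)
        # String literals are ast.Constant nodes and are never inspected.
        found.extend(name for name in names if not name.isascii())
    return found


# The variable name is non-ASCII; the string literal is allowed to be.
sample = 'naïve = "déjà vu"\ndef f(x):\n    return x\n'
bad = non_ascii_identifiers(sample)
clean = non_ascii_identifiers('greeting = "héllo"\n')
```

A check like this could run in a pre-commit hook or CI step and fail when the returned list is non-empty.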