python2.7 - How to decode JSON without decoding the UTF-8 inside of it?

I need a function to decode UTF-8 encoded JSON. This function should take a UTF-8 encoded JSON string and convert it to UTF-8 encoded objects. The following code works:

# helper functiondef Obj_To_UTF8(o):    res = {}    for k, v in o.items():        k = k.encode('utf-8')        v = v.encode('utf-8') if isinstance(v, unicode) else v        res[k] = v    return res# Load UTF-8 encoded JSON to UTF-8 encoded objectsdef Load_JSON(s):    return json.loads(s, object_hook = Obj_To_UTF8)

but is rather ridiculous, as json.loads is decoding the UTF-8 just so we can encode it again. How can I decode JSON without decoding the UTF-8 inside of it?

Background

I emphasize that the JSON is already UTF-8 encoded. This is important because I want to process the decoded JSON in exactly the same character encoding as the encoded JSON, using the same byte-based functions I use to process files, sockets, etc. Therefore, for best modularity, we should not bring unicodes into the program. (If we were changing the encoding or writing a text editor or font renderer, that would be the place to bring in unicodes. If we're not doing those things, then unicodes and the associated encoding/decoding are just a headache. Hopefully the example above illustrates this pretty well.)

For those unfamiliar with UTF-8, it is designed to support matching and tokenization directly on the bytes, specifically so that it is compatible with encoding-agnostic software (some of which is over 50 years old and still kicking). Languages like C and Python2 make it easy to process strings directly in bytes, and therefore avoid constantly encoding and decoding strings.

Unfortunately, json.loads seems to stray from these principles, for no obvious good reason. I can think of only two reasons why a JSON implementation would even care about character encoding:

The \u feature, which is optional (and turned off by default in web browsers, so that they tend to emit UTF-8 sequences for non-ASCII characters).
Validating that the received string is proper UTF-8. Hopefully this pedantry would also be optional. (Apart from the UTF-8 fiat, JSON's actual delimiting scheme works perfectly well with binary data.)

So maybe the JSON implementation cares about the character encoding, but forcing all JSON applications to deal with character encoding is poor modularity: rather than separating concerns, it unnecessarily ties JSON decoding to UTF-8 decoding. So a decent JSON implementation would at least provide a way to disable the UTF-8 decoding.

python2.7 - How to decode JSON without decoding the UTF-8 inside of it?

Trending Articles

Nalgonda District Police Office Mobile Numbers List in Telangana State

Rajasthan Board 10th Result 2016 Roll No wise & Name Wise

Teen Shot In Miami Drive-By Dies From Injuries

Practice Sheet of Right form of verbs for HSC Students

VMOU RSCIT Result 2017, RSCIT Result VMOU rkcl.vmou.ac.in Name Wise

Moondru Mudichu 02-03-2017 – Polimer tv Serial

O'CONNELL MICHAEL F. 11/29/197...

[同期の失敗] について

Could Not Find the Application that Created this file

Edna Murto, 90, longtime resident of Ely, dies

GTA 5 PPSSPP Zip File Download For Android Mediafire 382 MB

Mp3 Download: Mandoza - Godoba

Arrow Flash 2 – Sinhala Dubbed – Episode 17 – 28th February 2016

Download: Bicko Bicko ft Rich Bizzy & Crew G- Wanfulanganya (Prod by: Bicko...

Arrest logs for Wednesday, March 20, 2019

Bureau of Internal Revenue: Regional Offices (Directory)

SEAGCD2 - Editorial

Not right!

Best Suvichar in Hindi |बेस्ट सुविचार |शुभ विचार हिंदी में

EXERCISE