Channel: Active questions tagged utf-8 - Stack Overflow
meltano handling of encoding

I am currently pulling my hair out trying to get meltano to correctly write my data to a target.

Step 1:

The data is extracted correctly using tap-mssql (the database has collation SQL_Latin1_General_CP1_CI_AS), and meltano displays the records in the console, e.g.:

Message: { ....., Text: "Емуподушкипоправлять, \\nПечальноподноситьлекарство", Text2: "मेरेचाचासबसेईमानदारनियम" ...}

So far so good: Python appears to handle the UTF-8 data correctly.

Step 2:

When meltano tries to write it to a target (e.g. target-jsonl, target-csv, or any other), the problems start.

I have tried every conversion I could think of, but I cannot fix it.

# tried combining all the different encodings
str(record['Text']).encode('utf8').decode('utf8')

Anybody seen this? Is meltano broken?

I always get something along the lines of:

"UnicodeEncodeError: 'utf-8' codec can't encode character '\\udc8f' in position 20: surrogates not allowed"
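For what it's worth, the same message can be reproduced in plain Python without meltano. This is only a minimal sketch. It assumes the lone surrogate comes from Python's `surrogateescape` error handler, which maps an undecodable byte 0xNN to the code point U+DCNN. Whether tap-mssql or its database driver actually decodes this way is an assumption and is not confirmed anywhere above. The byte string `raw` is made up for illustration.

```python
# Byte 0x8f is not valid UTF-8 on its own. With errors="surrogateescape",
# Python does not fail; it carries the byte along as the lone surrogate U+DC8F.
raw = b"abc\x8f"  # hypothetical input, not taken from the real table
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'abc\udc8f'

# A strict UTF-8 encode refuses lone surrogates, which gives the error above.
try:
    text.encode("utf-8")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)

# Encoding with the same handler restores the original bytes unchanged.
restored = text.encode("utf-8", errors="surrogateescape")
```

If that is what is going on, the string never was valid UTF-8 text, which would also explain why `.encode('utf8').decode('utf8')` cannot repair it: the encode step is the one that raises.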

Why surrogates all of a sudden? I really don't understand why this is such a nightmare. I am reading UTF-8 and passing those strings around. Why is this happening?
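One way to narrow this down is to check which fields of a record actually carry lone surrogates before the record reaches the target. The sketch below is hedged: `find_surrogates` and `scrub` are hypothetical helpers, not part of meltano or any tap or target, and the sample record is invented. It also assumes the surrogates fall in the U+DC80 to U+DCFF range that `surrogateescape` uses, so the original byte can be read back as `ord(ch) - 0xDC00`.

```python
def find_surrogates(record):
    """Return {field: [(index, original_byte), ...]} for str fields
    that contain surrogateescape-style lone surrogates."""
    hits = {}
    for key, value in record.items():
        if not isinstance(value, str):
            continue
        found = [
            (i, ord(ch) - 0xDC00)  # U+DC8F -> byte 0x8f
            for i, ch in enumerate(value)
            if 0xDC80 <= ord(ch) <= 0xDCFF
        ]
        if found:
            hits[key] = found
    return hits


def scrub(value):
    """Lossy fallback: replace every unencodable character with '?'."""
    return value.encode("utf-8", errors="replace").decode("utf-8")


# Hypothetical record for illustration only.
record = {"Id": 1, "Text": "ok\udc8f", "Text2": "मेरे"}
hits = find_surrogates(record)
print(hits)
cleaned = scrub(record["Text"])
```

Knowing the offending byte values (here 0x8f) helps in guessing which code page the column was really stored in. `scrub` only makes the write succeed and throws the original byte away, so it is a last resort rather than a fix.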

