I am currently pulling my hair out trying to get meltano to correctly write my data to a target.
Step 1:
The data is extracted correctly using tap-mssql (the database has collation SQL_Latin1_General_CP1_CI_AS), and Meltano displays the records in the console, e.g.:
Message: { ....., Text: "Емуподушкипоправлять, \\nПечальноподноситьлекарство", Text2: "मेरेचाचासबसेईमानदारनियम" ...}
So far so good: Python appears to handle the (presumably UTF-8) data correctly.
Step 2:
When Meltano tries to write it to a target (e.g. target-jsonl, target-csv, any of them), the problems start.
I have tried every conversion I could think of, but I cannot fix it.
# tried combining all different encodings
str(record['Text']).encode('utf8').decode('utf8')
Anybody seen this? Is meltano broken?
I always get something along the lines of:
"UnicodeEncodeError: 'utf-8' codec can't encode character '\\udc8f' in position 20: surrogates not allowed"
Why surrogates all of a sudden? I really don't understand why this is such a nightmare. I am reading UTF-8 and passing those strings around. Why is this happening?
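For reference, the same error can be reproduced outside Meltano. The following is only a minimal sketch, assuming (not confirmed for tap-mssql) that the string was produced somewhere upstream by decoding non-UTF-8 bytes with Python's `surrogateescape` error handler, which turns each undecodable byte 0xNN into the lone surrogate U+DCNN. The sample bytes and the helper `find_lone_surrogates` are hypothetical and exist only for illustration.

```python
# Hypothetical input: 0x8f on its own is not valid UTF-8.
raw = b"abc\x8f"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN,
# so 0x8f becomes '\udc8f' (the character named in the error message).
text = raw.decode("utf-8", errors="surrogateescape")


def find_lone_surrogates(s):
    """Return (index, original_byte) for every surrogate-escaped byte in s."""
    return [
        (i, ord(ch) - 0xDC00)
        for i, ch in enumerate(s)
        if 0xDC80 <= ord(ch) <= 0xDCFF
    ]


# A strict UTF-8 encode refuses lone surrogates, which is the reported failure.
error_message = None
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Diagnostic: where are the escaped bytes, and what were they originally?
hits = find_lone_surrogates(text)

# Encoding with the same handler restores the original bytes unchanged.
recovered = text.encode("utf-8", errors="surrogateescape")

print(error_message)
print(hits)
```

If this assumption holds, a character such as '\udc8f' would mean the original data contained a raw byte 0x8f that was not valid UTF-8, so running a check like `find_lone_surrogates(record['Text'])` on a failing record might show which bytes were mis-decoded before the target ever sees them.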