Channel: Active questions tagged utf-8 - Stack Overflow

Python 3.12: writing Chinese text to a CSV for Excel, UTF-8-SIG does not work


I am using Python 3.12.1 and deploying the code to AWS Lambda.

I fetch data from a MySQL database (some of it Chinese text) and export it to a CSV file for Excel.

Here is the code:

```python
import csv

# Copied from https://gist.github.com/tobywf/3773a7dc896f780c2216c8f8afbe62fc#file-unicode-csv-excel-py
with open(self.full_csv_path, 'w', encoding='utf-8-sig', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Row', 'Emoji'])
    for i, emoji in enumerate(['🎅', '🤔', '😎']):
        writer.writerow([str(i), emoji])
```

Result (I import the file in Excel via Data > From Text rather than double-clicking it):

[Screenshot of the Excel import result not preserved]
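One way to narrow this down is to inspect the raw bytes on disk instead of what Excel displays. The following is a minimal sketch, assuming a temporary file (the name `demo.csv` is hypothetical) in place of `self.full_csv_path` and a Chinese sample value instead of emoji. If the file begins with the three bytes `EF BB BF` followed by valid UTF-8, the writing step itself is behaving as intended.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path standing in for self.full_csv_path
    path = os.path.join(tmp, "demo.csv")

    # Same pattern as the question: utf-8-sig plus newline='' for the csv module
    with open(path, "w", encoding="utf-8-sig", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["Row", "Name"])
        writer.writerow(["1", "許蓋功"])

    # Read the raw bytes back, bypassing any text decoding
    with open(path, "rb") as fp:
        raw = fp.read()

print(raw[:3])  # b'\xef\xbb\xbf': the UTF-8 BOM that utf-8-sig writes first
print(raw[3:].decode("utf-8"))
```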

This also did not work:

```python
with open(self.full_csv_path, 'w', encoding='utf-8-sig') as csvfile:
    # Did not work
    csvfile.write("許蓋功")
    # Did not work, also tried 'utf-8'
    csvfile.write("許蓋功".encode('utf-8-sig').decode('utf-8-sig'))
```
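For reference, the second `write()` call above encodes and then decodes with the same codec, which returns the original `str`. It therefore writes exactly the same characters as the first call. A small sketch of that round trip:

```python
text = "許蓋功"

# Encoding with utf-8-sig prepends a BOM to the UTF-8 bytes
encoded = text.encode("utf-8-sig")
print(encoded[:3])  # b'\xef\xbb\xbf'

# Decoding with utf-8-sig strips that leading BOM again
print(encoded.decode("utf-8-sig") == text)  # True

# A plain utf-8 round trip is likewise a no-op on the str
print(text.encode("utf-8").decode("utf-8") == text)  # True
```

Any BOM handling therefore happens only in the file object's own encoder, not in such round trips.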

I also tried writing the BOM manually, which did not work either:

```python
# Write CSV BOM mark
csvfile.write('\ufeff')  # did not work
csvfile.write(u'\ufeff')  # did not work
csvfile.write(u'\ufeff'.encode('utf8').decode("utf8"))  # did not work
```

Instead of a BOM, the text above is prepended to the file as it appears in Excel.
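A possible reason the manual BOM attempts misbehave: when the file is opened with `encoding='utf-8-sig'`, the codec already emits a BOM at the start of the file, so an explicit `'\ufeff'` adds a second one. With plain `'utf-8'`, the explicit write produces exactly one. The sketch below assumes an in-memory `io.BytesIO` buffer in place of the real file; the helper name `encoded_bytes` is hypothetical.

```python
import io

BOM = b"\xef\xbb\xbf"


def encoded_bytes(encoding: str) -> bytes:
    """Hypothetical helper: write a manual BOM plus text, return the raw bytes."""
    buf = io.BytesIO()
    with io.TextIOWrapper(buf, encoding=encoding, newline="") as f:
        f.write("\ufeff")  # the manual BOM from the question
        f.write("許蓋功")
        f.flush()
        data = buf.getvalue()  # read before the wrapper closes buf
    return data


sig = encoded_bytes("utf-8-sig")
plain = encoded_bytes("utf-8")
print(sig.count(BOM))    # 2: the codec's own BOM plus the manual one
print(plain.count(BOM))  # 1: only the manual BOM
```

This only shows which bytes are produced; it does not by itself account for literal text appearing in Excel.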

It seems clear that the string is treated as UTF-8, but for some reason it is not converted to correct UTF-8.

Can anyone please help?

Thank you very much.

EDIT: What I actually want to do is attach this CSV file, which contains Chinese characters, to an email and send it from AWS Lambda.

Here is the code that sends the email via SES:

```python
import os
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

from botocore.exceptions import ClientError

# `msg` (the multipart/mixed parent container), `client`, SENDER, RECIPIENT,
# CHARSET, BODY_TEXT, BODY_HTML and ATTACHMENT are defined earlier (not shown).

# Create a multipart/alternative child container.
msg_body = MIMEMultipart('alternative')

# Encode the text and HTML content and set the character encoding. This step is
# necessary if you're sending a message with characters outside the ASCII range.
textpart = MIMEText(BODY_TEXT.encode(CHARSET), 'plain', CHARSET)
htmlpart = MIMEText(BODY_HTML.encode(CHARSET), 'html', CHARSET)

# Add the text and HTML parts to the child container.
msg_body.attach(textpart)
msg_body.attach(htmlpart)

# Define the attachment part and encode it using MIMEApplication.
att = MIMEApplication(open(ATTACHMENT, 'r', encoding='utf-8').read())

# Add a header to tell the email client to treat this part as an attachment,
# and to give the attachment a name.
att.add_header('Content-Disposition', 'attachment', filename=os.path.basename(ATTACHMENT))

# Attach the multipart/alternative child container to the multipart/mixed
# parent container.
msg.attach(msg_body)

# Add the attachment to the parent container.
msg.attach(att)

# print(msg)
response = ''
try:
    # Provide the contents of the email.
    response = client.send_raw_email(
        Source=SENDER,
        # Destinations=[ RECIPIENT ],
        Destinations=RECIPIENT,
        RawMessage={'Data': msg.as_string(),
        }
    )
# Display an error if something goes wrong.
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    print("Email sent! Message ID:"),
    print(response['MessageId'])
    print(f'Attachment: {ATTACHMENT}')
```
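The attachment line above reads the CSV in text mode, so `MIMEApplication` receives a `str` rather than `bytes`. Below is a minimal sketch comparing the two, assuming some hypothetical CSV content in place of the file at `ATTACHMENT`. In CPython's `email` package, a `str` payload containing non-ASCII characters is converted to bytes with the `raw-unicode-escape` codec before base64 encoding (an implementation detail, not documented API). As a result, the BOM and the Chinese characters would reach the recipient as literal `\uXXXX` text. That would be consistent with the text-instead-of-BOM symptom described earlier, but treat it as a hypothesis to verify.

```python
from email.mime.application import MIMEApplication

# Hypothetical CSV content standing in for the file at ATTACHMENT
csv_bytes = "Row,Name\r\n1,許蓋功\r\n".encode("utf-8-sig")

# Roughly what open(ATTACHMENT, 'r', encoding='utf-8').read() yields:
# a str whose first character is '\ufeff'
as_text = csv_bytes.decode("utf-8")

att_from_str = MIMEApplication(as_text)      # str payload
att_from_bytes = MIMEApplication(csv_bytes)  # bytes payload

# get_payload(decode=True) returns the bytes the recipient would receive
print(att_from_str.get_payload(decode=True)[:6])             # b'\\ufeff': six literal characters
print(att_from_bytes.get_payload(decode=True) == csv_bytes)  # True: bytes unchanged
```

If this turns out to be the cause, reading the file with `open(ATTACHMENT, 'rb')` so that `MIMEApplication` receives `bytes` would be one way to keep the BOM and the UTF-8 bytes intact.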

I suspect this line:

```python
RawMessage={'Data': msg.as_string(),}
```

It may be the cause of the problem, but I have no idea how it works.
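As for `msg.as_string()`: once an attachment has been base64-encoded, the serialized message is plain ASCII, so `as_string()` on its own should not alter the attachment's bytes. The minimal sketch below assumes a hypothetical attachment named `report.csv`. It builds a multipart/mixed message, serializes it the way `RawMessage={'Data': ...}` would receive it, and parses it back to compare the bytes.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Hypothetical CSV bytes, including the utf-8-sig BOM
csv_bytes = "Row,Name\r\n1,許蓋功\r\n".encode("utf-8-sig")

msg = MIMEMultipart("mixed")
att = MIMEApplication(csv_bytes)  # bytes in; base64-encoded by default
att.add_header("Content-Disposition", "attachment", filename="report.csv")
msg.attach(att)

raw = msg.as_string()  # the string that would go into RawMessage['Data']
print(raw.isascii())   # True: the attachment is carried as base64 text

# Parse the serialized message back and pull out the attachment bytes
parsed = email.message_from_string(raw)
recovered = next(
    part.get_payload(decode=True)
    for part in parsed.walk()
    if part.get_filename() == "report.csv"
)
print(recovered == csv_bytes)  # True
```

Under these assumptions, whatever reaches the recipient is decided by the payload handed to `MIMEApplication`, not by `as_string()`.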

