Channel: Active questions tagged utf-8 - Stack Overflow

Convert Unicode to ASCII without errors in Python

My code just scrapes a web page, then converts it to Unicode.

    html = urllib.urlopen(link).read()
    html.encode("utf8","ignore")
    self.response.out.write(html)

But I get a UnicodeDecodeError:


    Traceback (most recent call last):
      File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__
        handler.get(*groups)
      File "/Users/greg/clounce/main.py", line 55, in get
        html.encode("utf8","ignore")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2818: ordinal not in range(128)

I assume that means the HTML contains a malformed attempt at Unicode somewhere. Can I just drop whatever bytes are causing the problem instead of getting an error?
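
For context on the traceback: in Python 2, read() returns a byte string, and calling .encode() on a byte string first decodes it implicitly with the ascii codec. That implicit step is what raises UnicodeDecodeError, and the "ignore" argument never applies to it. A minimal sketch of dropping the offending bytes is below. It is written for Python 3, and the function name to_ascii and the source_encoding parameter are made up for illustration. The page's real encoding is not known from the question, so UTF-8 is only an assumption here.

```python
def to_ascii(raw, source_encoding="utf-8"):
    """Return an ASCII-only string built from raw bytes.

    Hypothetical helper: source_encoding is an assumption about the page.
    """
    # Step 1: bytes -> text. Bytes that are invalid in source_encoding are dropped.
    text = raw.decode(source_encoding, errors="ignore")
    # Step 2: text -> ASCII. Characters outside the ASCII range are dropped.
    return text.encode("ascii", errors="ignore").decode("ascii")


# A stray 0xa0 byte (as in the traceback) followed by a valid UTF-8 "é".
sample = b"price:\xa0100 caf\xc3\xa9"
print(to_ascii(sample))
```

In the Python 2 code from the question, the same idea would mean decoding the bytes explicitly first, for example html.decode("utf-8", "ignore"), and keeping the returned value, since neither decode nor encode modifies html in place.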

