Saturday, June 9, 2012

Python: 'utf-8' codec can't decode byte

While scripting in Python yesterday, I got the following error message in PythonWin: "Failed to run script - syntax error - (unicode error) 'utf-8' codec can't decode byte 0xe8 in position 0: unexpected end of dat". And yes, it actually says "unexpected end of dat", not "unexpected end of data".

After trying to narrow down the code to find the problem, I ended up with:
string = 'è'

I had the hardest time figuring out why the utf-8 codec couldn't decode the data. I was pretty sure that Unicode could handle most characters, especially an è. This is the sort of problem I would have expected from the ascii codec.

After quite some messing around I opened the script in Notepad++, and had a look at the Encoding.

It turns out that the script itself was saved as ANSI. When I switched Notepad++ to Encode in UTF-8, the 'è' changed to 'xE8', the raw byte 0xE8, which is exactly what the utf-8 codec had trouble decoding in the script.
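
The mismatch can be reproduced without any editor. The snippet below is only a sketch, and it assumes that "ANSI" on my machine means the Windows-1252 code page (cp1252), which may differ on other systems. It shows that the single byte 0xE8 is a valid è in cp1252 but an incomplete sequence in UTF-8, where è takes two bytes.

```python
# The single byte an ANSI editor writes for è (assumption: ANSI == cp1252).
raw = b"\xe8"

# Decoding that byte as UTF-8 fails: 0xE8 announces a multi-byte sequence,
# but no continuation bytes follow it.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding the same byte as cp1252 gives the intended character.
decoded_as_ansi = raw.decode("cp1252")

# In UTF-8, è (U+00E8) is stored as two bytes instead of one.
encoded_as_utf8 = "\u00e8".encode("utf-8")

print(utf8_error)
print(decoded_as_ansi == "\u00e8")
print(encoded_as_utf8)
```

So the codec was not failing on the character è, but on a file whose bytes were written in one encoding and read in another.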

With the encoding still set to UTF-8 in Notepad++, I removed the xE8, typed in è again, and saved the script. The script now runs fine.
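
For a script with many such characters, retyping each one by hand gets tedious. The same fix could probably be done in Python itself. This is a rough sketch, not something I ran on the original script: reencode_to_utf8 is a name I made up, and the cp1252 default again assumes that is what ANSI means on the machine that saved the file.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8.

    Hypothetical helper: it assumes the file is currently stored in
    `source_encoding` (cp1252 is a guess at what "ANSI" means here).
    """
    path = Path(path)
    # Work on raw bytes so line endings are left untouched.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "example.py"
    script.write_bytes(b"string = '\xe8'\n")  # è stored as one ANSI byte
    reencode_to_utf8(script)
    converted = script.read_bytes()

print(converted)
```

Running this on a file that is already UTF-8 would garble it, so it is worth checking the encoding in the editor first and keeping a backup.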
