You are completely right, I'm sorry about my previous comment. The strange thing...

Locke1689 · on Feb 8, 2011

No, that's the correct behavior. list only incidentally yields one character per element for ASCII strings -- it's not required to. You shouldn't be using list on raw unicode strings.

  u'\U00010000'.encode('utf-8')

should produce the same result on every Python version.
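As a quick check of that claim, here is a minimal sketch. The variable names are illustrative, and the expected bytes follow from the UTF-8 encoding rules for U+10000 rather than from anything stated in this thread:

```python
# U+10000 is the first code point outside the Basic Multilingual Plane.
s = u'\U00010000'

# Encoding goes through the abstract code point, not the internal storage,
# so the result should not depend on how the interpreter was built.
encoded = s.encode('utf-8')

# Four bytes: the UTF-8 form of U+10000.
print(repr(encoded))

# Decoding gives back an equal string.
round_trip = encoded.decode('utf-8')
print(round_trip == s)
```

The length of s itself is a different matter: it depends on the interpreter's internal representation.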

ot · on Feb 9, 2011

> You shouldn't be using list on raw unicode strings.

Why? I am using list only to show what the values of s[0] and s[1] are.

What I am saying is that it returns the list of characters of the underlying representation, so a list of wide chars (possibly surrogates) if compiled with UTF16 or a list of 32-bit characters if compiled with UTF32.
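To make the two views concrete, here is a rough sketch that reproduces both lists on any build. The helper names utf16_code_units and code_points are made up for illustration, and the 16-bit view is simulated by encoding to UTF-16 rather than by inspecting the interpreter's real internal storage:

```python
import struct

s = u'\U00010000'

def utf16_code_units(text):
    """Return the 16-bit code units of text, as a UTF-16 build would store them."""
    data = text.encode('utf-16-le')
    # Each code unit is one little-endian unsigned 16-bit integer.
    return list(struct.unpack('<%dH' % (len(data) // 2), data))

def code_points(text):
    """Return the 32-bit code points of text, bypassing the internal storage."""
    data = text.encode('utf-32-le')
    # Each code point is one little-endian unsigned 32-bit integer.
    return list(struct.unpack('<%dI' % (len(data) // 4), data))

# Two surrogates in the 16-bit view, one value in the 32-bit view.
print([hex(u) for u in utf16_code_units(s)])
print([hex(c) for c in code_points(s)])
```

Going through an explicit encoding like this gives the same answer regardless of how the interpreter stores the string, which is not true of list(s).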

Are you suggesting that all the string processing (including iteration) should be done on a str encoded in UTF8 instead of using the native unicode type?