Except that Java strings aren't in Unicode. They're in UTF-16, which is the worst-of-all-worlds encoding. (It's big and heavy, and it still has multibyte sequences. They're just rare enough that you're likely to forget about them during testing.)
Wrong. UTF-16 is an encoding; Unicode is the abstract representation. You can encode Unicode strings into UTF-16, but that doesn't make UTF-16 Unicode. Python's Unicode strings really are just Unicode, which is why you can't write them straight to a file: you have to encode them first (encode() defaults to UTF-8).
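The same distinction holds in Java, for what it's worth: a String is the abstract thing, and you only pick an encoding at the moment you turn it into bytes. A minimal sketch (class name is just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // One abstract string, two different byte representations.
        String s = "héllo"; // é is U+00E9

        byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);    // 6 bytes: é becomes 0xC3 0xA9
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // 10 bytes: 2 bytes per character here

        System.out.println(utf8.length);   // 6
        System.out.println(utf16.length);  // 10

        // Writing to a file (or any byte stream) always goes through an
        // encoding step like this; decoding with the same charset round-trips.
        String roundTrip = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(s)); // true
    }
}
```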
In Java, you end up seeing the guts of UTF-16 far more often than you should. Most notably, the String APIs often index strings by UTF-16 code units, not characters, so string "lengths" don't always correspond to the number of Unicode characters, and you can end up cutting surrogate pairs in half if you aren't careful.
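A small example of what that looks like in practice (the class name and sample string are just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 (grinning face) lies outside the Basic Multilingual Plane,
        // so it occupies two UTF-16 code units (a surrogate pair).
        String s = "a\uD83D\uDE00b"; // "a😀b"

        System.out.println(s.length());                      // 4 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 (Unicode code points)

        // charAt() and substring() index by code unit, so this slice splits
        // the pair and leaves an unpaired high surrogate at the end.
        String broken = s.substring(0, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(1))); // true

        // Encoding the malformed slice substitutes '?' for the lone surrogate.
        System.out.println(new String(broken.getBytes(StandardCharsets.UTF_8),
                                      StandardCharsets.UTF_8)); // "a?"
    }
}
```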
"UCS-2 (2-byte Universal Character Set) is a character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996". Java adopted UCS-2 which was later supplemented with UTF-16 support.
There are at least a few languages in this world that do not use Roman letters and are better represented as multibyte sequences :). That is why Java added support for supplementary characters (via UTF-16 surrogate pairs) over and above UCS-2. That said, UTF-8 would have been optimal for Western languages, but suboptimal for several other languages.
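A rough sketch of that size tradeoff (the sample strings are arbitrary):

```java
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    public static void main(String[] args) {
        // ASCII-only text: 1 byte per character in UTF-8, 2 in UTF-16.
        String english = "hello";
        // Japanese text: 3 bytes per character in UTF-8, but only 2 in UTF-16.
        String japanese = "こんにちは";

        System.out.println(english.getBytes(StandardCharsets.UTF_8).length);     // 5
        System.out.println(english.getBytes(StandardCharsets.UTF_16BE).length);  // 10

        System.out.println(japanese.getBytes(StandardCharsets.UTF_8).length);    // 15
        System.out.println(japanese.getBytes(StandardCharsets.UTF_16BE).length); // 10
    }
}
```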