The environments that jumped on Unicode early, before it was realized that 2 bytes wouldn't be enough, all chose UCS-2 for obvious reasons. In particular, that includes Windows and Java.
Probably because they figured they could just ignore endianness issues and that ASCII compatibility would be Somebody Else's Problem.
There were always problems with UCS-2. UTF-8 would have had a number of advantages over it even if Unicode had never grown beyond the BMP (Basic Multilingual Plane, the first and lowest-numbered 16-bit code space).
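To make that concrete, here's a quick sketch of two of those advantages in Python (used purely as a neutral demo language; none of the environments above are Python, obviously):

    text = "Hi"

    # UTF-8 of pure-ASCII text is byte-for-byte identical to ASCII,
    # and there is exactly one possible byte order.
    assert text.encode("utf-8") == b"Hi"

    # UCS-2/UTF-16 doubles the size of ASCII-heavy text and comes in
    # two byte orders, so you need a BOM or some out-of-band agreement
    # before you can even decode it.
    assert text.encode("utf-16-le") == b"H\x00i\x00"
    assert text.encode("utf-16-be") == b"\x00H\x00i"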
> ASCII compatibility would be Somebody Else's Problem
For many of those outside the "A" in ASCII (a euphemism for America :) there were already a ton of problems, so endianness was the least of them (I personally never hit this problem)
// disclaimer: I'm not that serious about the predominance of Latin script, this is sorta ironic
Depending on the level of abstraction you're living at - and that depends on the overall goal, performance constraints, environmental integration, OS/machine heterogeneity, etc. - it may or may not be a problem.
It's easy to dismiss if you have all the time in the world and a deep stack of abstractions.
If you're doing deep packet analysis on UTF-16 text in a router, things may be different.
Thanks, my question was exactly about the issues met by people living at other levels of abstraction.
I'm not a native English speaker and a newb to HN, so sorry that I phrased my sincere question in a way that made it look like the arrogant statement 'there are no issues, what are you talking about, I don't even know what LE and BE mean'.
> for many of those outside the "A" in ASCII (a euphemism for America :)
Abbreviation for 'American', in fact. No euphemisms needed.
(ASCII = American Standard Code for Information Interchange)
> there were already a ton of problems, so endianness was the least of them
I can appreciate this. However, UTF-8 also has desirable properties like 'dropping a single byte only means you lose one character, as opposed to potentially losing the whole file', and 'you can often tell if a multi-byte UTF-8 sequence has been corrupted without doing complex analysis'.
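A quick Python sketch of both properties, if anyone wants to play with them (any language with a strict UTF-8 decoder will behave the same way):

    data = "naïve text".encode("utf-8")  # 'ï' encodes as the two bytes C3 AF

    # Drop one byte out of the middle of the multi-byte sequence:
    # this removes the C3 lead byte, stranding the AF continuation byte.
    corrupted = data[:2] + data[3:]

    # A strict decoder spots the damage immediately, at a precise offset...
    try:
        corrupted.decode("utf-8")
    except UnicodeDecodeError as e:
        print(e)  # "can't decode byte 0xaf in position 2: invalid start byte"

    # ...and a lenient decoder loses only the damaged character;
    # everything after it resynchronizes cleanly.
    print(corrupted.decode("utf-8", errors="replace"))  # prints: na�ve text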
> I'm not that serious about the predominance of Latin script, this is sorta ironic
Heh. ASCII can't even encode the entirety of the Latin script: Ask a Frenchman how he spells 'café', or a German how he spells 'straße', and notice how important characters are missing from ASCII.
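In case anyone wants to watch it fail, a minimal demo (sticking with Python to keep all the examples in one language):

    # ASCII simply has no code for 'é' (U+00E9) or 'ß' (U+00DF)...
    for word in ("café", "straße"):
        try:
            word.encode("ascii")
        except UnicodeEncodeError as e:
            print(f"{word!r}: {e}")

    # ...while UTF-8 encodes the same words in 5 and 7 bytes respectively.
    print("café".encode("utf-8"))    # b'caf\xc3\xa9'
    print("straße".encode("utf-8"))  # b'stra\xc3\x9fe'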