No, it's not excusable - the MySQL project had a trivial alternative: don't call...

timr · on July 10, 2012

"they lied to end-users, effectively"

Yeah, that's not exaggeration at all. Because it's not as if they very clearly document exactly what they support:

http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html

pepve · on July 10, 2012

Sure, so let's call this button that erases your data "list files", and just very clearly document it. It's not lying if we change the definition.

sedev · on July 10, 2012

Bingo. That's why I used the "effectively" qualifier - if my toolchain says "oh this is UTF-8," then I should be able to trust that it's for-real, honest-to-goodness, spec-compliant UTF-8. If it's "oh this is the part of UTF-8 that was easy to implement" instead, then that tool has lied to me. I shouldn't have to read the documentation to find out that something is not what it claims to be.

Bonus points for the documentation brazenly ignoring that they're implementing something that's not spec-compliant and naming it like it is.

deafbybeheading · on July 11, 2012

Well, to be fair, MySQL has a storied history of implementing 95% of a feature, calling it good enough, and shipping it.

And while, as a Postgres user, my tone here may be a little snide, I also say this with grudging respect: I think there is a point at which implementing n% of a feature X and calling it X (rather than MaybeX or MostlyX) does give you some momentum and practical compatibility that you wouldn't have otherwise. Is it dishonest to hide the limitations regarding the edge cases in some documentation no one will read? Maybe. But will providing the feature solve more problems than it causes? Quite possibly.

I don't agree with MySQL's decision with respect to UTF-8, but I do understand it.

sedev · on July 11, 2012

That's an important piece of context, thank you for pointing it out. Engineering decisions occur in a cultural context of mere humans making decisions, and we do well to remember that.

sorbits · on July 11, 2012

"don't call it UTF8!"

While I don’t know the history of MySQL, it seems to me that when they implemented it, their implementation was indeed in compliance with the standard (Unicode 3).

The standard has since grown beyond 16-bit code points: the range now runs up to U+10FFFF, and anything above U+FFFF takes four bytes in UTF-8.

It seems strange that MySQL had to introduce a new name for the UTF-8 encoded tables that can contain those larger code points, but I assume there is a technical explanation (probably having to do with binary compatibility with existing tables, MySQL drivers, or similar).
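
To make the size difference concrete, here is a minimal Python sketch. It assumes the documented behavior that MySQL's original utf8 charset stores at most three bytes per character, which covers code points up to U+FFFF only. The helper names utf8_len and fits_three_byte_utf8 are made up for this illustration and are not part of MySQL or any driver.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """Return True if every code point in text is at or below U+FFFF.

    Assumption: a three-byte-per-character limit, as documented for
    MySQL's original utf8 charset. Hypothetical helper for illustration.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # Latin letter, accented letter, euro sign, and an emoji outside the BMP.
    samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
            f"fits three-byte limit: {fits_three_byte_utf8(ch)}"
        )
```

The first three samples encode to one, two, and three bytes; the last one needs four, so a check like this is one way to spot text that a three-byte column could not hold before it reaches the database.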