JSON is not a YAML subset

hnlmorg · on May 17, 2022

Unless I’m misunderstanding your argument (which is possible, I’ve not had my morning coffee yet), what you’re describing is really more about JSON’s strictness as a feature than about YAML not being a superset of JSON.

Your general point is correct: you shouldn’t use a YAML parser to read JSON. But you fail to raise the most important reason why. That being the strictness of JSON means if someone submits incorrect JSON then it should fail rather than silently correct it (and in ways that might even corrupt the data).

Of course, there might be some instances where someone opts to use a YAML parser as a lazy way of allowing more lenient JSON syntax but that is a risk the application authors accept and one which should be well documented to their users.

So it feels like you’re making the right conclusion while still arguing the wrong points.

As an aside, it might be worth testing your site on mobile. It doesn’t really render properly for me on my iPhone (Safari). Lots of text is out of bounds on the sides and I couldn’t scroll to read it. Plus there were (I think) images of Stack Overflow that looked like part of the article paragraphs, and it was hard to tell what was reference content and what was your content.

jmillikin · on May 17, 2022

Thanks for the comment (/ reminder) about mobile CSS -- it's on my todo queue, but there's been a lot of stuff popped on top recently. If the iPhone has a "desktop mode" feature, that might make it easier to read for now.

  > That being the strictness of JSON means if someone submits incorrect
  > JSON then it should fail rather than silently correct it (and in ways
  > that might even corrupt the data).

The JSON document {"a": 1e2} is valid JSON. If you put it into a regular JSON parser (for example browser devtools), the correct result will come out:

  >>> JSON.parse('{"a": 1e2}')
  {a: 100}

The issue is that it's also a valid YAML document, and (critically) the correct YAML parse is different than the correct JSON parse.
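To make the disagreement concrete, here is a minimal sketch that compares `JSON.parse` with a hand-written approximation of YAML 1.1 plain-scalar resolution. The two regular expressions are transcribed from the base-10 patterns in the YAML 1.1 int and float type definitions, and `resolveYaml11Scalar` is a hypothetical helper rather than any library's API. It ignores the other YAML 1.1 forms (octal, hex, sexagesimal, .inf, booleans), so treat it as an illustration and not as a parser.

```javascript
// Approximate YAML 1.1 base-10 patterns (assumption: transcribed from the
// YAML 1.1 int and float type definitions; all other forms are omitted).
const YAML11_INT = /^[-+]?(0|[1-9][0-9_]*)$/;
const YAML11_FLOAT = /^[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?$/;

// Hypothetical helper: classify an unquoted (plain) scalar the way a
// YAML 1.1 resolver would, looking only at base-10 numbers.
function resolveYaml11Scalar(text) {
  const digits = text.replace(/_/g, "");
  if (YAML11_INT.test(text)) return { type: "int", value: parseInt(digits, 10) };
  if (YAML11_FLOAT.test(text)) return { type: "float", value: parseFloat(digits) };
  return { type: "str", value: text };
}

const fromJson = JSON.parse('{"a": 1e2}').a;
const fromYaml = resolveYaml11Scalar("1e2");

console.log(typeof fromJson, fromJson);      // number 100
console.log(fromYaml.type, fromYaml.value);  // str 1e2
```

The YAML 1.1 float pattern requires a decimal point (and a sign on the exponent), so `1e2` falls through to a string while `1.0e+2` would be a float.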

hnlmorg · on May 17, 2022

Indeed but that doesn’t mean that JSON isn’t a subset of YAML. That means that the strictness of JSON removes ambiguity of what is a string and what is not while the lazy strings in YAML allow for ambiguity (and this problem comes up all the time with proper YAML documents too).

Which is why I said you had reached the right conclusion despite conflating syntax compatibility with syntax strictness.

It’s also worth noting that JSON doesn’t actually have a standard for representing complex numbers. In theory JSON should support scientific notation but given JSON’s numeric parser is basically based around what JavaScript supported, discrepancies have crept in as JSON was ported to other languages with regard to how numbers should be parsed. Some guides even (incorrectly) state that scientific notation is not supported. And frankly that’s even worse advice given how significantly JSON parsers can vary.

To cloud the issue further, plenty of other “standards” have emerged which are JSON supersets and which do describe how numeric data should be handled. Some even support date fields too. Often these parsers are just labelled “JSON” and they can corrupt your data too, but developers might think they’re using a proper JSON parser.

So the issue isn’t that YAML doesn’t support JSON. It’s that the JSON spec was basically written on the back of a napkin and doesn’t support any form of versioning (which would let us move away from this madness), whilst YAML also supports lazy strings and thus allows data to be ambiguous.

jmillikin · on May 17, 2022

One of us is extraordinarily confused and I hope it isn't me.

  > Indeed but that doesn’t mean that JSON isn’t a subset of YAML. That
  > means that the strictness of JSON removes ambiguity of what is a
  > string and what is not while the lazy strings in YAML allow for ambiguity

I ... don't think that's true. This isn't an issue of "strictness", and the document is unambiguous in both JSON and YAML. There's no question about correctness -- both parses appear to be correct for their respective languages. It's just that JSON and YAML disagree about whether the token `1e2` is an integer or a string.

If LANG-X says that `1e2` is an integer and LANG-Y says it's a string, then axiomatically LANG-X cannot be considered a subset of LANG-Y.

  > It’s also worth noting that JSON doesn’t actually have a standard for
  > representing complex numbers.

I don't see how that matters? `1e2` isn't a complex number, it's an integer.

  > In theory JSON should support scientific notation but [...]

There's no "in theory" and no "but" here. Exponents are a part of the JSON spec, have been since the beginning, and are thoroughly documented in each iteration of the JSON RFC. A parser that can't parse {"a": 1e2} is not a JSON parser.
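As a quick check, the number grammar in RFC 8259 (section 6) is `number = [ minus ] int [ frac ] [ exp ]`, which can be written as a single regular expression. The sketch below is a transcription of that grammar; `isJsonNumber` is a hypothetical name used only for illustration.

```javascript
// RFC 8259, section 6: number = [ minus ] int [ frac ] [ exp ]
//   int  = zero / ( digit1-9 *DIGIT )
//   frac = decimal-point 1*DIGIT
//   exp  = e [ minus / plus ] 1*DIGIT
const JSON_NUMBER = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?$/;

// Hypothetical helper: does this token match the JSON number grammar?
function isJsonNumber(token) {
  return JSON_NUMBER.test(token);
}

for (const token of ["1e2", "1E+2", "1.5e-3", "01", "1.", ".5"]) {
  console.log(token, isJsonNumber(token));
}
// 1e2 true, 1E+2 true, 1.5e-3 true, 01 false, 1. false, .5 false
```

The exponent needs neither a fraction nor a sign, which is why `1e2` is a plain JSON number.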

hnlmorg · on May 17, 2022

> I ... don't think that's true. This isn't an issue of "strictness", and the document is unambiguous in both JSON and YAML. There's no question about correctness -- both parses appear to be correct for their respective languages. It's just that JSON and YAML disagree about whether the token `1e2` is an integer or a string.

My point was that because YAML supports unquoted strings, there is ambiguity (ie a lack of strictness) in the values you pass in a YAML document. This is exactly the issue demonstrated by the Norway Problem you referenced. If all strings were quoted then the Norway Problem wouldn't exist. But because YAML allows people to be lazy with quoting, it introduces all kinds of edge case bugs that a stricter syntax would eradicate.

> If LANG-X says that `1e2` is an integer and LANG-Y says it's a string, then axiomatically LANG-X cannot be considered a subset of LANG-Y.

You're looking at it far too black and white. YAML does successfully parse JSON in almost all cases. YAML also supports scientific notation for numbers. The problem here is more of a bug than an intended break in compatibility, a bug that originates from its support of unquoted strings. The problem is YAML is a very complicated specification so bugs were inevitable. Thus I don't think it's unreasonable to still claim that YAML is a superset of JSON even in the presence of this bug. Plus this bug was fixed in YAML 1.2.
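For reference, a rough sketch of how the YAML 1.2 core schema resolves plain numeric scalars. The patterns are transcribed from the core schema's tag resolution table (decimal int and float only; octal, hex, .inf and .nan are left out), and `resolveYaml12Number` is a hypothetical helper, not a real parser's API.

```javascript
// Assumption: transcribed from the YAML 1.2 core schema tag resolution
// table; only the decimal int and float rows are covered here.
const YAML12_INT = /^[-+]?[0-9]+$/;
const YAML12_FLOAT = /^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$/;

// Hypothetical helper: name the type a plain scalar would resolve to.
function resolveYaml12Number(text) {
  if (YAML12_INT.test(text)) return "int";
  if (YAML12_FLOAT.test(text)) return "float";
  return "str";
}

console.log(resolveYaml12Number("1e2"));  // float
console.log(resolveYaml12Number("100"));  // int
console.log(resolveYaml12Number("abc"));  // str
```

Under this pattern the decimal point and the exponent sign are both optional, so `1e2` resolves to a number (a float) instead of a string.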

> I don't see how that matters? `1e2` isn't a complex number, it's an integer.

Sorry for the lack of clarity here. I meant "complex" as in "complicated" rather than the mathematical term "complex". However it's worth noting that you'll more often find scientific numbers used to represent floating point values rather than integers. Even in the case of very large integers, they'll end up being stored as a float in memory and output as an exponent.
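A small JavaScript illustration of that last point, as a sketch of one runtime's behaviour only (other languages' encoders format large numbers differently):

```javascript
// 1e21 is an integer value, but JavaScript stores every number as a double.
const big = 1e21;
console.log(Number.isInteger(big));            // true

// From 1e21 upwards, number-to-string conversion switches to exponent form,
// and JSON.stringify uses that same conversion.
console.log(JSON.stringify({ a: 1e20 }));      // {"a":100000000000000000000}
console.log(JSON.stringify({ a: big }));       // {"a":1e+21}

// The exponent form round-trips through a JSON parser unchanged.
console.log(JSON.parse('{"a":1e+21}').a === big);  // true
```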

> There's no "in theory" and no "but" here. Exponents are a part of the JSON spec, have been since the beginning, and are thoroughly documented in each iteration of the JSON RFC. A parser that can't parse {"a": 1e2} is not a JSON parser.

Yes, that's the "theory". But in practice (and as I'd already said) it's not always implemented correctly because:

1. JSON was based around JavaScript types and thus other languages haven't always implemented things exactly in the same way

2. JSON cannot be versioned so none of the other specifications mean jack shit since you cannot determine which standard a document was marshalled against (apparently this was an intentional decision).

3. There are plenty of other JSON parsers that intentionally parse JSON wrong because they attempt to fix (in their authors' minds) some shortcomings of JSON.

4. JSON cannot be versioned so again here, you have no idea if one JSON parser is compliant and another is silently corrupting your data.

YAML might have made a number of design errors but JSON gets a lot wrong too.

soraminazuki · on May 17, 2022

I find this claim misleading.

First, the article starts off by picking examples from YAML 1.1. But the YAML 1.2 spec officially made YAML a superset of JSON. It was published in 2009, more than a decade ago.

The article later does mention YAML 1.2, but it claims that it isn't compatible with JSON either because YAML 1.2 documents are required to have a %YAML directive stating their version. Except, that's not what the spec says. The actual spec states that documents without a %YAML directive should be treated as YAML 1.2 [1].

[1]: https://yaml.org/spec/1.2.1/#id2781553

jmillikin · on May 17, 2022

That part of the YAML 1.2 spec is in conflict with reality, though. The base of YAML 1.1 documents is large enough that a backwards-incompatible change to default behavior is for practical purposes impossible.

YAML 1.1 was released in 2005, and 1.2 in 2009 -- only four years later. But here we are, in 2022, and YAML 1.1 is still the default (in many cases, only) version supported. That's why the "Norway problem" persists -- it's not possible for the parser to know whether the author of an un-versioned YAML document containing "a: no" intended it to parse the same as {"a": false} or {"a": "no"}.
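A minimal sketch of that ambiguity, assuming the boolean spellings listed in the YAML 1.1 bool type and in the YAML 1.2 core schema. `resolvePlainScalar` is a hypothetical helper; real parsers may accept only a subset of the 1.1 spellings.

```javascript
// Assumption: spellings transcribed from the YAML 1.1 bool type definition
// and from the YAML 1.2 core schema.
const BOOL_PATTERNS = {
  "1.1": {
    isTrue: /^(y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$/,
    isFalse: /^(n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$/,
  },
  "1.2": {
    isTrue: /^(true|True|TRUE)$/,
    isFalse: /^(false|False|FALSE)$/,
  },
};

// Hypothetical helper: resolve an unquoted scalar to a boolean if the given
// YAML version says so, otherwise leave it as a string.
function resolvePlainScalar(text, version) {
  const { isTrue, isFalse } = BOOL_PATTERNS[version];
  if (isTrue.test(text)) return true;
  if (isFalse.test(text)) return false;
  return text;
}

// The same un-versioned document, "a: no", under each interpretation:
console.log(JSON.stringify({ a: resolvePlainScalar("no", "1.1") }));  // {"a":false}
console.log(JSON.stringify({ a: resolvePlainScalar("no", "1.2") }));  // {"a":"no"}
```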

Python (PyYAML) doesn't support 1.2 yet: https://github.com/yaml/pyyaml/issues/116

Ruby (Psych) ditto -- I can't even find a tracking issue to enable it.

Go (go-yaml) is a mixture of YAML 1.1 and 1.2, depending on the author's preferences.

Also, as a rough guideline, you can't have a backwards-incompatible revision of a versioned spec declare that it's the new default version, because that breaks all existing users.