It's odd how neither this post, nor the spec, nor GitHub's "Mastering Markdown" help page[1], nor the more complete "Basic writing and formatting syntax" page[2], mentions the fact that GitHub treats every newline as a hard break.
CommonMark contains this little sentence to work around its specified behavior, which is left untouched in the GFM spec:
> A renderer may also provide an option to render soft line breaks as hard line breaks.
I'd say whether or not a renderer does this is a rather important thing to mention. When I write Markdown documents for GitHub I have to change my editor settings just because of this.
GFM itself leaves that line unchanged because we don't actually change that option in our implementation — the reference implementation `cmark` (which we built upon) supplies the "hardbreaks" option, described as follows:
> --hardbreaks Treat newlines as hard line breaks
We turn this option on when rendering issues, issue comments and so on, but leave it off when rendering blobs (such as README.md). Both are GFM; one just uses this option to make it more conducive to communication.
> There is a fundamental difference between these two kinds of content: the user comments are stored in our databases, which means their Markdown syntax can be normalized (e.g. by adding or removing whitespace, fixing the indentation, or inserting missing Markdown specifiers until they render properly). The Markdown documents stored in Git repositories, however, cannot be touched at all, as their contents are hashed as part of Git’s storage model.
In general, people have come to expect that hitting return once in a comment field on GitHub produces a newline, since that has been the case for many years, so we try to preserve that expectation. Hence we keep the option turned on when rendering comments.
Conversely, they don't expect the same from Markdown files stored in their repositories (e.g. for my blog I put each sentence of a paragraph on its own line, for easier diffing and editing). Additionally, we couldn't normalise these documents even if we wanted to, so turning the option on there would break everything by over-spacing it vertically. Hence we keep the option turned off in this case!
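For anyone who hasn't run into this: the whole difference is in how a plain line ending inside a paragraph (a CommonMark "soft break") gets emitted. Here's a toy Python sketch of just that one step, assuming cmark-style HTML output (`\n` for a soft break, `<br />\n` for a hard break). `render_paragraph` and its `hardbreaks` flag are hypothetical names standing in for the `--hardbreaks` option quoted above; this is not cmark's code, and it ignores everything else about Markdown (emphasis, links, block structure).

```python
import html


def render_paragraph(text: str, hardbreaks: bool = False) -> str:
    """Toy renderer: handles ONLY line endings inside a single paragraph.

    hardbreaks=True mimics the --hardbreaks option (assumed behaviour:
    every soft break is emitted as a hard break).
    """
    lines = text.split("\n")
    parts = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1:
            # Last line of the paragraph: no line ending to render.
            parts.append(html.escape(line.strip(), quote=False))
        elif line.endswith("  "):
            # Two or more trailing spaces: a CommonMark hard line break.
            parts.append(html.escape(line.strip(), quote=False) + "<br />\n")
        elif line.endswith("\\"):
            # A trailing backslash is also a hard line break.
            parts.append(html.escape(line[:-1].strip(), quote=False) + "<br />\n")
        else:
            # A plain newline is a soft break; hardbreaks upgrades it.
            sep = "<br />\n" if hardbreaks else "\n"
            parts.append(html.escape(line.strip(), quote=False) + sep)
    return "<p>" + "".join(parts) + "</p>"


blog_style = "One sentence per line.\nEasier to diff."
print(render_paragraph(blog_style))                   # README-style rendering
print(render_paragraph(blog_style, hardbreaks=True))  # comment-style rendering
```

In the README case the browser collapses that `\n` into a space, so both sentences flow together as one paragraph; in the comment case each line gets its own `<br />`.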
> It's odd how neither this post, nor the spec, nor GitHub's "Mastering Markdown" help page[1], nor the more complete "Basic writing and formatting syntax" page[2], mentions the fact that GitHub treats every newline as a hard break.
Well, actually it does say that "Hard line breaks are for separating inline content within a block."
GitHub devs & Markdown enthusiasts at large, please consider contributing some brainpower to these last remaining issues that are blocking the v1.0 release of CommonMark:
GitHub's spec here is based on [CommonMark][1], which has been around for a while now, and [was originally authored][2] by a group of representatives from GitHub, Reddit, and Stack Exchange.
The first two. [The GFM spec][1] is literally just CommonMark with a few extra extensions added. They even highlighted the new sections green in the spec to make it clear where the GFM spec differs from CommonMark. Everything else is word-for-word identical.
A quick read of the linked article says this spec adds a few optional features on top of CommonMark that CommonMark itself does not contain (tables, etc.).
There was also CommonMark (http://commonmark.org/), which failed, IMHO mostly due to John Gruber taking offense at their first choice of name, Common Markdown. Will formalising this as 'GitHub Flavored Markdown' similarly cause offense?
You have to go look at the source code of Gruber's implementation to figure out what markdown actually is, or do some empirical studies with different inputs. That is what I mean by underspecified. His specification is not detailed enough to implement a markdown parser, so in reality it is abandonware.
Did it 'fail'? It seems to have a decent amount of use. And it fixed a bunch of problems with the original Markdown spec (or lack thereof).
I suppose that, since one of their goals was to get Github and StackExchange 'out of the markdown business' but neither use CommonMark, and Github has now put work into creating their own spec, they failed in that respect.
> one of their goals was to get Github and StackExchange 'out of the markdown business' but neither use CommonMark
FWIW, StackExchange [uses CommonMark][1] for their new StackOverflow Documentation site and has [been planning][2] to migrate Q&A to CommonMark for some time now.
Who knows, but it'd be pretty irrational if it did. Whether it was fair or not, the "Common Markdown" name caused a kerfuffle because of the implication that it was claiming to be the One True Markdown. The name "GitHub Flavored Markdown" only implies that it's base Markdown plus GitHub extras, plus it's been known by that name for a long time now.
It's a specification based on the CommonMark specification. Neither is a formal spec. They are more of an informal specification with some edge cases listed (in contrast to the original markdown specification, which leaves known edge cases unspecified).
I really wish there was a concise formal spec for markdown, rather than a multi-page essay. It makes it incredibly difficult for anyone trying to write something to parse it. There is no mechanical way to go from spec -> parser.
Is there any common spec where you can mechanically go from spec to parser? HTTP, SMTP, DNS, HTML, CSS, Javascript, Ruby, Python, C, ... I basically know of nothing in widespread use with a spec that can actually be converted directly into working code.
Is it really useful to write a formal spec for Github Markdown? The software they use to parse and render it is open source. If you want to know how exactly something works, you can read the source.
Is it really useful to write a formal spec for HTML? The software we use to parse and render it is open source. If you want to know how exactly something works, you can read the source.
Having used and maintained a Swift translation of the StackOverflow .NET markdown processor, please, when there is a proposal to use source as a spec, burn it with fire. Scatter the ashes.
Reading the sources and trying to understand them takes a lot longer than looking at a BNF grammar and cranking out a recursive descent parser from it, or feeding it to a parser generator.
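To make the "grammar in, parser out" point concrete, here's a toy Python sketch. The grammar is invented purely for illustration (emphasis with `*`, inline code with backticks, nothing else) and is not Markdown or any real spec; the point is that each nonterminal maps onto one method almost mechanically.

```python
import html

# Hypothetical toy grammar (NOT Markdown):
#   doc  ::= item*
#   item ::= emph | code | text
#   emph ::= "*" (code | text)* "*"
#   code ::= "`" [^`]* "`"
#   text ::= [^*`]+


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0

    def peek(self) -> str:
        # Empty string signals end of input.
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def doc(self) -> str:  # doc ::= item*
        parts = []
        while self.peek():
            parts.append(self.item())
        return "".join(parts)

    def item(self) -> str:  # item ::= emph | code | text
        if self.peek() == "*":
            return self.emph()
        if self.peek() == "`":
            return self.code()
        return self.text()

    def emph(self) -> str:  # emph ::= "*" (code | text)* "*"
        self.expect("*")
        parts = []
        while self.peek() not in ("*", ""):
            parts.append(self.code() if self.peek() == "`" else self.text())
        self.expect("*")
        return "<em>" + "".join(parts) + "</em>"

    def code(self) -> str:  # code ::= "`" [^`]* "`"
        self.expect("`")
        start = self.pos
        while self.peek() not in ("`", ""):
            self.pos += 1
        body = self.src[start:self.pos]
        self.expect("`")
        return "<code>" + html.escape(body, quote=False) + "</code>"

    def text(self) -> str:  # text ::= [^*`]+
        start = self.pos
        while self.peek() not in ("*", "`", ""):
            self.pos += 1
        return html.escape(self.src[start:self.pos], quote=False)


def parse(src: str) -> str:
    return Parser(src).doc()
```

Parsing `a *b* c` gives `a <em>b</em> c`, while `*oops` raises `ParseError`. That rejection is exactly what a Markdown parser isn't allowed to do: CommonMark treats any sequence of characters as a valid document and simply renders `*oops` with a literal asterisk, which is a big part of why a clean production grammar is hard to come by.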
It is not, and not just because of a few context-dependent idiosyncrasies like those in C.
Fundamentally, markdown was specified as a set of pattern-matching rules plus English descriptions. The original specification was not written with productions and grammar rules in mind, and you typically don't get there by accident.
I have sometimes pondered how to make a markdown-like language with a simple production-based grammar. I have not succeeded and would appreciate any pointers. The criterion is that it has to intrude on the prose about as little as markdown does.
If GitHub is normalizing comments anyway, I wonder whether they could have adopted a CFG (context-free grammar).
Overall this is a step in the right direction, but the whole saga is a perfect microcosm of our understructure cranking out poorly understood stuff that comes back to bite us and cannot be tamed.
Hi there! As the spec explains, this is a Markdown-specific blacklist that prevents tags that would otherwise "break" the content of the Markdown document.
A document that contains these tags will not be parsed properly by an HTML5-compliant parser; the parser will "swallow" chunks of Markdown content that come after the tags. Hence, we disable the tags altogether.
This is a UX feature, not a security feature. XSS prevention, and a plethora of other security checks, are performed by our user content stack, but this functionality is shared by all markup languages on GitHub (MD, RST, ASCIIDOC, ...), so it's not discussed in this spec.
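For reference, the GFM spec's "Disallowed Raw HTML" extension filters nine tags (`title`, `textarea`, `style`, `xmp`, `iframe`, `noembed`, `noframes`, `script`, `plaintext`) by replacing the leading `<` with `&lt;`, so the tag shows up as text instead of changing how the rest of the page is parsed. Below is a rough regex-based Python sketch of that idea. `filter_raw_html` is a hypothetical name, this is not cmark-gfm's implementation, and, per the comment above, it is a UX measure rather than a sanitizer.

```python
import re

# The nine tag names listed in the GFM spec's "Disallowed Raw HTML" extension.
DISALLOWED_TAGS = ("title", "textarea", "style", "xmp", "iframe",
                   "noembed", "noframes", "script", "plaintext")

# "<" or "</", a disallowed name (any case), then whitespace, "/" or ">".
# The lookahead keeps e.g. <scripts> or <titles> from matching.
_TAG_RE = re.compile(
    r"<(/?(?:" + "|".join(DISALLOWED_TAGS) + r"))(?=[\s/>])",
    re.IGNORECASE,
)


def filter_raw_html(raw_html: str) -> str:
    """Neutralise disallowed tags by escaping only their leading '<'."""
    return _TAG_RE.sub(r"&lt;\1", raw_html)


print(filter_raw_html("<strong> <title> <style> <em>"))
# prints: <strong> &lt;title> &lt;style> <em>
print(filter_raw_html("<XMP> is also disallowed.</xmp>"))
# prints: &lt;XMP> is also disallowed.&lt;/xmp>
```

Escaping just the `<` (rather than dropping the tag) keeps the author's text visible, which fits the "don't break the document" goal described above.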
If it disallowed certain words (i.e. treated them specially instead of just reproducing the text as written) such as "</plaintext>" it wouldn't be plain text.
This is great -- lack of a standard that was actually used (unlike CommonMark) was one of my main issues with Markdown (http://ericholscher.com/blog/2016/mar/15/dont-use-markdown-f...) -- It's really great to see GitHub leading in this department, and it gives me hope that one day we might actually have Markdown that is portable between implementations.
At the risk of starting a mini-flame war, is RST a more cohesive format? If one was to pick one of the two formats to start using for personal documentation, which format should one choose?
I do think rST is waaay more expressive, but I also recognize that in many of the instances one would want to use markup in a chat or PR situation, the expressiveness likely wouldn't be well received if the trade-off is verbosity.
This is something in life that I file away with competing regex standards: my brain just has to switch languages based on the app in which I'm typing (between markdown, pseudo-markdown (ahem, Slack), org-mode, rST, etc).
For personal documentation (assuming you mean notes): follow your heart. Personally, though, I wouldn't use RST for any non-python public documentation at this point, but for my personal notes, it's hard to beat the extensibility of RST.
HTML used to serve as a simple way to format a document. Now HTML is too complex for that purpose. Enter markdown. In ten years, markdown will be too complex for formatting documents.
I am a big fan of markdown, in case I didn't make that clear. I love it.
It wasn't an issue of simplicity; it was about HTML being extremely hard to sanitize down to something that can't execute code. Having formatting (that can still line up with your site's styles) while knowing you can't get script injection is pretty useful.
I don't think it's that HTML is too complex. It's more that Markdown allows documents to be marked up in a way that's "natural" and easy to read if you're reading the plain-text version of it.
This is great news! Does anybody have a recommendation for a Javascript parser for Formal GFM? (I know there are a million JS MD parsers; I'm looking for a good one that will let me serve GFM docs over HTTP and render them on the browser.)
I'm really happy to see this. It's actually quite frustrating that although markdown is so nice, it barely has a consistent standard. It's almost impossible to use it cross-service.
Hopefully now that Github has standardised their own flavour of it (and quite a nice flavour too), more people will start to use it.
Is it OK if I promote my domain here? I read the guidelines but they don't mention anything about self-promotion.
Sorry if this isn't appropriate, but if anyone is looking for a relevant domain (markdown.in), please get in touch, or let me know if you think it would be better to develop it myself.
[1]: https://guides.github.com/features/mastering-markdown/
[2]: https://help.github.com/articles/basic-writing-and-formattin...